Korea AI Safety Institute Publishes First Detailed AI Model Safety Assessment

2026-06-21 10:20

en.Wedoany.com Reported - The Korea Artificial Intelligence Safety Institute (AISI), established in November 2024, has begun gradually disclosing artificial intelligence (AI) model safety assessment results that it had previously kept unpublished. The institute aims to make its evaluation system more transparent by publishing more detailed safety assessment conclusions for major domestic and international AI models, including open-source models.

According to industry sources on the 19th, AISI published the "Detailed Results Report on the Joint Test for AI Agent Data Leakage Risks" on its official website on the 15th. The test was completed jointly with Singapore's AISI in the first half of this year. The report describes specific scenarios in which AI agents, while executing routine instructions, misjudge the situation and erroneously query, transmit, or leak sensitive information, resulting in critical errors.

This Korea-Singapore joint report is the first of its kind to be made public, and it contains not only an evaluation checklist but also detailed numerical results. The global models covered in the report are anonymized as A, B, C, and so on. Even so, the quantitative assessments confirmed multiple instances of "cognition-behavior inconsistency," in which an agent with excellent task execution capability still cannot be relied on to handle data securely. The report also confirmed risk factors unique to agentic AI, such as an agent claiming to have completed a task without actually running the real tools (the "false reporting" hallucination phenomenon).
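To make the "false reporting" risk concrete, the following is a minimal illustrative sketch, not the method used in the joint report. It assumes a hypothetical agent runtime that records every real tool invocation, and it flags any tool the agent claims to have used that has no successful entry in that record. The names `ToolCall` and `find_false_reports` and the sample tool names are invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToolCall:
    """One tool invocation actually recorded by the (hypothetical) runtime."""
    tool: str
    succeeded: bool


def find_false_reports(claimed_tools, executed_calls):
    """Return tools the agent claims to have used without a successful recorded call."""
    executed_ok = {call.tool for call in executed_calls if call.succeeded}
    return sorted(set(claimed_tools) - executed_ok)


# Hypothetical run: the agent reports two completed actions,
# but the runtime log shows that only one of them succeeded.
claimed = ["send_email", "update_record"]
log = [ToolCall("update_record", True), ToolCall("send_email", False)]
unsupported = find_false_reports(claimed, log)
print(unsupported)
```

A check of this kind only compares the agent's claims with the execution record; it says nothing about whether the data involved was handled securely, which is the separate concern the report raises.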

Key experimental results in the Korea AISI test environment (Photo: Screenshot of the Korea-Singapore AISI Joint Report)

In fact, this is the first time AISI has released a report containing detailed numerical values and recommendations. Until now, AISI disclosed so little of its AI model safety assessment results that it was difficult to confirm results for individual named models, let alone verify their content. The "Safety Assessment Performance of 42 AI Models," published by AISI last month, covered 42 major domestic and international models verified over approximately 16 months from January 2025 to April 2026, but it disclosed only a checklist consisting mainly of model names and assessment items, with no specific data.

With the exception of Kakao's "Kanana," the first domestic AI safety assessment case jointly published by AISI and the Korea Telecommunications Technology Association (TTA), the safety levels and detailed indicators of most models were not disclosed. External doubts about AISI's performance and role stem largely from the institute's overly cautious approach to publishing its core safety assessment results. Industry analysts attribute this caution mainly to concern that disclosure would expose the capability gap between models from global tech giants and domestic models, such as those from the "Independent AI Foundation Model" development project led by the Ministry of Science and ICT.

AISI Director Kim Myung-joo stated, "For future safety assessments, we plan to disclose as much content as possible, provided the target company does not object." However, he added, "Depending on company requests, some model names may be anonymized."

AISI, an affiliated organization of the Electronics and Telecommunications Research Institute (ETRI) under the Ministry of Science and ICT, is South Korea's representative body for cooperation with AI safety institutes and related organizations in other countries. Its recent series of partnerships with the world's top three AI developers (Google DeepMind, OpenAI, and Anthropic) is expected to become a core driving force in building a global AI safety network.

With Google DeepMind, discussions on safety framework construction and testing methodologies will continue under the memorandum of understanding (MOU) that the Ministry of Science and ICT signed in April. With OpenAI, AISI signed an MOU directly on the 17th, agreeing to share safety assessment methodologies and benchmark knowledge for high-risk areas. In particular, AISI will apply its self-built Korean benchmark data to joint hallucination and safety assessments from a Korean perspective and will collaborate on establishing international standards.

With Anthropic, in conjunction with the MOU signed by the Ministry of Science and ICT on the 18th, AISI will pursue red team evaluations of autonomous AI agents as well as model safety and misuse risk assessments in the Korean context. Information on AI vulnerabilities and cyber threats in key sectors such as finance will also be shared rapidly, fostering substantive cooperation in the cybersecurity field.

Director Kim Myung-joo emphasized, "We will continue to expand the foundation of cooperation with global tech giants like Google DeepMind, OpenAI, and Anthropic, scientifically verify the risks of the most advanced models, and lead the development of a Korean-style evaluation system that is internationally recognized."

This article is compiled by Wedoany. All AI citations must indicate the source as "Wedoany". If there is any infringement or other issues, please notify us promptly, and we will modify or delete it accordingly. Email: news@wedoany.com