South Korea's NIA Invests 2.9 Billion Won to Build Unified AI Training Data Provision System
2026-07-02 09:18

en.Wedoany.com Reported - The National Information Society Agency (NIA) of South Korea will invest approximately 2.9 billion won to build a "Unified AI Training Data Provision System," a single platform for providing artificial intelligence (AI) training data that is currently scattered across institutions nationwide.

NIA will build the "Unified AI Training Data Provision System" so that dispersed AI training data can be discovered, registered, quality-managed, opened, searched, and used on a single platform. The initiative addresses the problem that data held by government-funded projects, private companies, and research institutions is often confined to internal use because the processes and support systems for opening it are insufficient. (AI-generated image)

With the rapid spread of next-generation AI technologies such as generative AI, multimodal AI, and agentic AI, acquiring large-scale, high-quality training data has become a core challenge. However, data collected and processed in government-funded projects is often not opened for external use or reuse, and data owned by private companies and research institutions mostly stays in internal use because the processes and support systems for releasing it are insufficient. The project is designed to address this gap.

The core of the project is a system that supports the full handling of AI training data: discovery and registration, quality management and de-identification, opening and provision, and search and linked use. The project budget is 2.87918 billion won, with a construction period of 120 days from the date of contract signing.

In terms of specific functions, the system will establish a complete registration management process for training data, covering application acceptance, suitability review, approval, supplementation, and disposal. For registered data, it will also track status, version, and change history across the full lifecycle: generation, annotation, distribution, update, and disposal.
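A registration process of this kind can be pictured as a small state machine. The sketch below is illustrative only: the stage names follow the article, but the allowed transitions, the `Registration` class, and the `ds-001` identifier are assumptions, not details from NIA's specification.

```python
from enum import Enum


class Stage(Enum):
    ACCEPTED = "application accepted"
    REVIEW = "suitability review"
    SUPPLEMENT = "supplementation"
    APPROVED = "approved"
    DISPOSED = "disposed"


# Assumed transition rules; the actual review workflow is not described in detail.
ALLOWED = {
    Stage.ACCEPTED: {Stage.REVIEW},
    Stage.REVIEW: {Stage.APPROVED, Stage.SUPPLEMENT, Stage.DISPOSED},
    Stage.SUPPLEMENT: {Stage.REVIEW, Stage.DISPOSED},
    Stage.APPROVED: {Stage.DISPOSED},
    Stage.DISPOSED: set(),
}


class Registration:
    """Hypothetical registration record that keeps its own change history."""

    def __init__(self, dataset_id):
        self.dataset_id = dataset_id
        self.stage = Stage.ACCEPTED
        self.history = [Stage.ACCEPTED]

    def advance(self, target):
        # Reject any move the transition table does not allow.
        if target not in ALLOWED[self.stage]:
            raise ValueError(f"{self.stage.value} -> {target.value} is not allowed")
        self.stage = target
        self.history.append(target)


reg = Registration("ds-001")
for step in (Stage.REVIEW, Stage.SUPPLEMENT, Stage.REVIEW, Stage.APPROVED):
    reg.advance(step)
```

Keeping the history alongside the current stage is one simple way to support the status and change tracking the plan calls for.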

For data discovery, a natural language semantic search engine based on a vector database will be developed, and a single-window interface will enable integrated search across external public and private data catalogs. In addition, machine-readable service interfaces will be provided so that external portals or AI agents can query and use the metadata and usage conditions of datasets.
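The idea behind vector-based search is to turn both the query and each dataset description into vectors and rank datasets by similarity. The toy sketch below uses word counts and cosine similarity in place of a learned embedding model and a real vector database, neither of which the article names; the catalog entries and function names are invented for illustration.

```python
import math
from collections import Counter


def embed(text):
    # Stand-in for a learned embedding: a simple bag-of-words count vector.
    return Counter(text.lower().split())


def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Hypothetical catalog of dataset descriptions.
CATALOG = {
    "ds-speech": "korean speech recordings with transcripts",
    "ds-traffic": "road traffic camera images with vehicle labels",
    "ds-med": "chest x-ray images with diagnosis labels",
}


def search(query, catalog, top_k=2):
    """Return up to top_k dataset ids ranked by similarity to the query."""
    q = embed(query)
    scored = [(cosine(q, embed(desc)), ds_id) for ds_id, desc in catalog.items()]
    scored.sort(reverse=True)
    return [ds_id for score, ds_id in scored[:top_k] if score > 0]


results = search("traffic images", CATALOG)
```

A production system would swap `embed` for a semantic embedding model so that queries match on meaning rather than on shared words.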

The functions and data of the AI Hub platform currently operated by NIA will also be migrated to the new system. The migration targets include data, metadata, historical information, and statistical information managed in the AI Hub system and related projects. During the migration process, consistency verification, duplicate removal, and error correction will be performed.
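The three migration checks can be sketched in a few lines. The example below is a simplified illustration under assumed rules: the record fields (`id`, `title`), the normalization steps, and the count-based consistency check are hypothetical and do not reflect the actual AI Hub schema.

```python
def migrate(records):
    """Return (migrated, report) for a list of legacy metadata dicts."""
    seen_ids = set()
    migrated = []
    report = {"source": len(records), "duplicates": 0, "rejected": 0}
    for rec in records:
        # Error correction (assumed rules): trim and lowercase ids, collapse whitespace in titles.
        ds_id = str(rec.get("id", "")).strip().lower()
        title = " ".join(str(rec.get("title", "")).split())
        if not ds_id or not title:
            report["rejected"] += 1  # incomplete record, set aside for manual review
            continue
        if ds_id in seen_ids:
            report["duplicates"] += 1  # duplicate removal
            continue
        seen_ids.add(ds_id)
        migrated.append({"id": ds_id, "title": title})
    report["migrated"] = len(migrated)
    # Consistency verification: every source record must be accounted for.
    assert report["source"] == (
        report["migrated"] + report["duplicates"] + report["rejected"]
    )
    return migrated, report


legacy = [
    {"id": "DS-001 ", "title": "Korean  speech corpus"},
    {"id": "ds-001", "title": "Korean speech corpus"},
    {"id": "ds-002", "title": ""},
    {"id": "ds-003", "title": "Traffic images"},
]
migrated, report = migrate(legacy)
```

Normalizing before the duplicate check matters here: the first two records only collide once the stray space and capitalization are cleaned up.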

According to the plan, the system will serve not only as a data provision portal but also as a common foundational platform for a data utilization ecosystem, supporting data search, import, combination, model training, and result management. NIA's long-term goal is to develop it into core infrastructure covering the entire lifecycle of the AI industry ecosystem, with participation from private companies, research institutions, public agencies, and other stakeholders.