en.Wedoany.com Reported - The South Korean government recently announced plans to build a National Manufacturing Data Library and digitize the tacit knowledge of skilled workers. However, Physical AI companies point out that key issues such as the collection methods, standardization criteria, and usage rights for raw industrial site data need to be considered simultaneously.

The success or failure of manufacturing AI policy ultimately hinges on establishing a "data circulation structure." If Physical AI data is simply stored in a repository, it will be difficult to apply in industrial settings. What is needed is a system that collects worker motion and process data, cleans and validates it for simulation and model training, and then feeds robot validation results back in as new data. The key going forward is therefore how to connect the government-led Manufacturing AX database with the data utilization system the Physical AI industry requires. Only an ecosystem that spans data collection, model development, robot field application, and validation feedback can turn the manufacturing database into public infrastructure for the Physical AI industry rather than a mere repository.
At the "National Report on Korea's Three Major Super Projects for a Great Leap Forward" held at the Cheong Wa Dae Guest House on June 29, the government stated its intention to foster Manufacturing AI and Physical AI as national strategic industries. Core measures include building the National Manufacturing Data Library, digitizing the tacit knowledge of skilled workers, and developing a Physical AI Foundation Model. Among these, the project to convert skilled workers' tacit knowledge into data has been pre-allocated 48 billion won in the 2026 supplementary budget.
Physical AI companies generally agree that data is the primary bottleneck. While Graphics Processing Units (GPUs) and computing infrastructure are important, for robots to operate effectively in industrial settings, high-quality raw data encompassing worker motions and process conditions must first be obtained. Unlike Large Language Models (LLMs), Physical AI must handle real-world issues like force, friction, contact, failure, and safety. For robots to grasp parts, tighten screws, and move items in factories, motion data specific to different sites and industries is required.
Yeom Woon-seol, CEO of AIRobot, stated, "For startups, GPUs are important, but the biggest bottleneck is data. Without data, you can't create motion models for robots, and without motion models, robots can't act according to customer requirements." He added that this ultimately makes robots difficult to sell.
The problem is that manufacturing site work varies by industry. The motions required in the steel, automotive parts, food, logistics, and assembly industries may seem similar but are actually different. For example, the bread-making process alone involves distinct actions such as dividing dough, piping it onto baking paper, and operating machinery. It is difficult for a single robot company or AI data company to directly acquire motion data from all industries. Data collection methods also vary: some are vision-based, while others use master-slave structures or teleoperation, which tends to produce data optimized for a specific robot hand or platform.
CEO Yeom Woon-seol explained, "Data acquired with a specific robot hand is optimized for that robot. For other companies to use it, they need to re-annotate and process it, essentially doing the work twice." He suggested that an egocentric approach, where cameras are mounted on workers to capture hand motions from a first-person perspective, could be an alternative. If videos of hand motions from shoemakers, chefs, and other skilled workers are obtained, multiple robot companies could reprocess and use them for their own robots.
Jang Jun-hyun, Vice President of Tomorrow Robotics, emphasized the importance of data standards. "While various data standards exist, they are not yet unified. If companies and institutions create data in different formats, mutual compatibility will be difficult, so commonly usable data standards are needed." He explained that first-person perspective data is sometimes effective but can be costly, while third-person perspective data is sufficient for certain tasks. The key lies in determining the unit and format for combining angle, length, joint information, force information, video information, and work context.
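As a rough illustration of what "unit and format" could mean in practice, the following Python sketch defines one possible record layout that combines joint angles, force, video references, and work context, with a basic consistency check. Every field name, unit choice, and class name here is a hypothetical assumption for illustration; the article does not describe any actual standard, and none of this reflects a format adopted by the government or the companies quoted.

```python
from dataclasses import dataclass, field, asdict
from typing import List, Optional
import json


@dataclass
class MotionFrame:
    """One time step of a recorded demonstration (hypothetical layout)."""
    t_sec: float                      # time since episode start, in seconds
    joint_angles_rad: List[float]     # joint angles, in radians
    force_n: Optional[List[float]]    # contact force [fx, fy, fz] in newtons, if measured
    video_frame_index: Optional[int]  # index into the associated video, if any


@dataclass
class DemonstrationEpisode:
    """One recorded task attempt plus the context needed to reuse it."""
    task: str        # work context, e.g. "divide dough"
    industry: str    # e.g. "food"
    viewpoint: str   # "first_person" or "third_person"
    success: bool    # failed attempts are recorded too, not discarded
    frames: List[MotionFrame] = field(default_factory=list)

    def validate(self) -> List[str]:
        """Return a list of problems; an empty list means the checks passed."""
        problems = []
        if self.viewpoint not in ("first_person", "third_person"):
            problems.append(f"unknown viewpoint: {self.viewpoint}")
        if not self.frames:
            problems.append("episode has no frames")
        times = [f.t_sec for f in self.frames]
        if times != sorted(times):
            problems.append("timestamps are not in ascending order")
        widths = {len(f.joint_angles_rad) for f in self.frames}
        if len(widths) > 1:
            problems.append("joint vector length changes between frames")
        return problems


# Illustrative values only; these are not measurements from any real site.
episode = DemonstrationEpisode(
    task="divide dough", industry="food", viewpoint="first_person", success=True,
    frames=[
        MotionFrame(0.00, [0.10, 0.52, 1.03], [0.0, 0.0, 1.2], 0),
        MotionFrame(0.05, [0.12, 0.55, 1.01], [0.0, 0.1, 1.4], 1),
    ],
)
print(episode.validate())           # list of problems found, if any
print(json.dumps(asdict(episode)))  # one possible exchange format
```

Fixing units in the field names (radians, newtons, seconds) and keeping the work context alongside the raw signals is one way to let different robot companies reprocess the same recording for their own hardware.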
Wirobotics believes the core lies in data quality and design, not quantity. A company representative stated, "Manufacturing site data is very useful and essential for Physical AI development. But the important thing is not simply collecting vast amounts of data, but carefully designing the type of data collected, collection standards, and data format from the outset based on the work content, standardizing it into meaningful, high-quality data."
Accessibility is also a concern. If data is concentrated in large enterprise manufacturing sites or data factories, startups and robot-specialized companies may find it difficult to utilize due to security and intellectual property issues. A Wirobotics representative pointed out, "Data from large enterprises with manufacturing facilities or the data factories they build may be difficult for startups or robot-specialized companies to access due to security and IP issues. The National Manufacturing Data Library should be substantially open to robot-specialized companies." Tomorrow Robotics also stressed the importance of a data sharing structure. Vice President Jang Jun-hyun said, "While creating spaces or institutions capable of mass-producing data to generate high-quality data is important, it is even more critical that this data can be used collectively."
Lessons from past AI learning data construction projects are also worth noting. The government built large-scale learning datasets through platforms like AI Hub, but the industry has consistently pointed out that "even if data exists, it is difficult for actual companies to use it directly." Physical AI data is more complex than simple images or text because it must simultaneously include worker motions, robot joint values, force/contact information, work environments, and failure cases.
The industry generally views the government's goal of "developing an indigenous Physical AI Foundation Model within 3 years" positively. A Wirobotics representative stated, "I believe it is possible to develop a first-generation Physical AI model with meaningful performance in specific domains within 3 years." A MindAI representative also said, "Developing an indigenous Physical AI Foundation Model within 3 years is entirely possible, and we will see results starting this year." Vice President Jang Jun-hyun, from a Sovereign AI perspective, noted, "A robot foundation model is equivalent to the brain of a humanoid robot. The brain of humanoid robots working in Korean factories cannot rely solely on models from China or the US. If foreign 'brains' are used, current process data could leak out."
Recent controversies over access restrictions to Anthropic's Mythos5 and Fable5 have heightened this awareness. The US government restricted access to advanced AI models citing national security and export controls, and although the restrictions were later relaxed, the episode still demonstrated the risks of depending on overseas frontier models. In fields where core data flows, such as manufacturing, defense, security, and public services, AI model sovereignty is no longer just a slogan for technological self-reliance. AIRobot CEO Yeom Woon-seol also acknowledged the necessity of an indigenous foundation model, stating, "An indigenous foundation model is absolutely necessary, if only to break up the monopolized ecosystem."