en.Wedoany.com report, March 25: Speaking at the "Advancement and Leap of Humanoid Robots" forum during the Boao Forum, Wang Xiaogang, Co-founder and Executive Director of SenseTime and Chairman of Da Xiao Robot, said the humanoid robot field still needs about two years to reach its "ChatGPT Moment," a disruptive breakthrough and explosive inflection point similar to the one ChatGPT brought about.
Wang Xiaogang pointed out that the humanoid robot industry currently has about 100,000 hours of data, which is still insufficient to support a breakthrough in general capabilities. Last year, his company proposed using environmental materials to raise the data volume to 10 million hours by 2027, a hundredfold increase (two orders of magnitude) within two years. On this basis, combined with the support of world models and simulation technology, the ultimate goal is to reach the "one-hour level," meaning the robot can operate stably in complex environments for one continuous hour. He believes that only at that point might the "ChatGPT Moment" for humanoid robots arrive.
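The scale of that target can be checked with simple arithmetic. The short Python sketch below uses only the two figures cited above (about 100,000 hours today, 10 million hours targeted in two years). The function name `growth_summary` and the smooth compound-growth assumption are illustrative and do not come from Wang Xiaogang's remarks.

```python
import math


def growth_summary(current_hours: float, target_hours: float, years: float) -> dict:
    """Summarize how much a dataset must grow to hit a target (illustrative helper)."""
    growth_factor = target_hours / current_hours
    return {
        # How many times larger the target is than the current volume.
        "growth_factor": growth_factor,
        # The same ratio expressed in powers of ten.
        "orders_of_magnitude": math.log10(growth_factor),
        # Assumption: growth compounds evenly each year, which the article does not claim.
        "annual_multiplier": growth_factor ** (1 / years),
    }


# Figures as cited in the article: ~100,000 hours now, 10 million hours in two years.
summary = growth_summary(current_hours=100_000, target_hours=10_000_000, years=2)
for name, value in summary.items():
    print(f"{name}: {value:.1f}")
```

Under the even-compounding assumption, the target works out to roughly a tenfold increase in data volume each year for two consecutive years.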
The "ChatGPT Moment" refers to an inflection point of technological breakthrough and rapid industrial growth in the humanoid robot field, comparable to the one ChatGPT triggered. The birth of ChatGPT marked a qualitative leap in the general conversational capabilities of large language models. The equivalent moment for humanoid robots would mean that robots can autonomously perform complex tasks in open environments, truly moving out of the laboratory and into large-scale application.
Wang Xiaogang's analysis points to the core bottleneck in current humanoid robot development: data scarcity. Unlike language models, which can be trained on vast amounts of text from the internet, robots must be trained on real physical interaction data, which is costly to collect and limited in scale. Collecting environmental materials, generating simulation data, and building world models are all aimed at filling this data gap.
He also emphasized that increasing data volume is only the foundation; the combination of world models and simulation technology is equally crucial. Simulation environments accelerate training, and world models give robots an understanding of physical laws; both are needed to translate accumulated data into true generalization capabilities. Only when the data volume reaches the 10-million-hour level and robots can work continuously in real environments for an hour can humanoid robots potentially make the leap from "demonstration level" to "practical level."
Wang Xiaogang's judgment gives the industry a clear timeline. Over the next two years, rapidly accumulating high-quality data and closing the loop between simulation and reality will be the core focus of competition among global humanoid robot players. Once the "ChatGPT Moment" truly arrives, humanoid robots are expected to usher in a new industrial cycle, as large models did.









