NVIDIA and Beijing Academy of Artificial Intelligence Drive World Models, Data Infrastructure Competition Heats Up
2026-07-04 14:52

en.Wedoany.com Reported - NVIDIA and the Beijing Academy of Artificial Intelligence (BAAI) have recently sent the same signal: AI is moving from the digital world into the physical world, and data, world models, and simulation are becoming the core drivers of that shift. At NVIDIA's annual shareholder meeting on June 25, 2026, CEO Jensen Huang described AI data centers as "factories" that manufacture tokens, each of which can be turned into code, answers, designs, actions, and services. Customers, he said, are buying not just servers but AI factories capable of generating revenue. He emphasized that physical AI represents the next wave of growth, with robots, cars, and factories becoming intelligent agents in the real world. NVIDIA will train models in AI factories, simulate them in Omniverse, and deploy them to physical devices through platforms such as Jetson. At almost the same time, BAAI identified world models as a key consensus direction on the path to artificial general intelligence (AGI), proposing a shift from "predicting the next word" to "predicting the next state of the world."

The core of a world model is enabling AI to perceive, understand, and reason about time, space, and the physical laws of the real world, drawing on data from every modality and interacting proactively with its environment. Stanford Professor Fei-Fei Li has pointed out that spatial intelligence is the ability of machines to perceive, reason, and act in 3D space and time. Her startup, World Labs, recently completed a $1 billion funding round at a valuation of $5 billion. Li argues that large models teach machines to read and write, while spatial intelligence teaches them to observe and build.

The global world model field is quickly becoming crowded, spanning everything from outdoor autonomous driving and urban spaces to indoor scenarios.

In autonomous driving, Momenta put its R7 world model into mass production and deployment in April 2026, drawing on more than 12 billion kilometers of real driving data to let the system predict how the world around it will evolve. Li Auto released MindVLA-o1, defining autonomous driving as the starting point for physical AI. Baidu integrated world model capabilities into its ERNIE 5.0 large model and its Apollo autonomous driving system.

In indoor and home scenarios, Ezviz released its self-developed "Ezviz Star World Model," with which its AI floor-cleaning robot builds 3D semantic maps of homes and predicts the movements of pets and people. Daxiao Robotics, in collaboration with CUHK, released Kairos-HomeWorld, billed as the world's first world model capable of full-house generation and full object interaction, and at the same time open-sourced a dataset of 300,000 real Chinese residential floor plans along with 5,000 simulated scene datasets.

In architecture and BIM, global design software giant Autodesk made a strategic investment in World Labs, pushing physical AI from "understanding data" to "understanding architecture." In June 2026, Fei-Fei Li's team released World Tracing, a technology capable of recovering complete 3D geometry from a single photo of a building.

In outdoor and urban spaces, Amap released ABot-Earth0.5 in June 2026, billed as the world's first 3D-native urban world model and covering more than 190 countries and regions. From satellite imagery, it can generate kilometer-scale 3D city scenes on consumer-grade GPUs in just 10 minutes, at one percent of the cost of traditional methods. Google DeepMind connected 280 billion street view images covering 110 countries to its Genie world model, allowing users to generate interactive environments based on real locations.

In indoor spatial intelligence, international players include Mappedin, the world's largest indoor mapping platform, which uses AI and LiDAR to convert building floor plans into dynamic 3D digital maps and covers more than 10 billion square feet of indoor space across 86 countries. NavVis, a German indoor spatial intelligence solutions provider founded in 2013, serves companies such as Daimler and Huawei with mobile scanning systems and digital twin platforms. VergeSense released the Large Spatial Model (LSM), which predicts human behavior patterns from behavioral data covering more than 200 million square feet of office space, collected over eight years. Vestella Labs, a spatial intelligence company focused on physical AI, has built its core technology around automatically converting unstructured spatial information (e.g., images, PDFs, CAD drawings) into spatial data that AI can understand. In China, Shuwei Tech has built an indoor spatial information database through a decade of continuous updates, using crowdsourced field collection and automated annotation. It performs continuous, point-by-point multi-modal annotation (visual, text, wireless fingerprint, etc.) of pedestrian-accessible urban spaces, including complex indoor environments, ultimately producing large-scale multi-modal datasets.

Industry data shows that China's embodied intelligence market was worth approximately RMB 915 billion in 2025 and is expected to exceed RMB 1,090.4 billion in 2026. The global indoor positioning and navigation market was worth $16.9 billion in 2025 and is projected to reach $72.46 billion by 2032, a compound annual growth rate of 23.11%. The global BIM market was worth approximately $9.5 billion in 2025 and is expected to reach $32.5 billion by 2036. Leading industry players have realized that the ultimate barrier for world models lies in data, not algorithms. Because 90% of human life, work, and consumption takes place indoors, indoor spatial intelligence is an unavoidable core capability, whether for embodied robots entering homes, smart appliances understanding home layouts, or enterprises making decisions about their offline business. Indoor spatial data is therefore the most valuable and hardest-to-acquire part of the world model data infrastructure.
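As a quick sanity check on the growth figures above, the compound annual growth rate implied by each pair of endpoints can be recomputed. The sketch below is illustrative only: it assumes annual compounding between the stated base year and target year, which the cited industry data does not spell out, and the function and variable names are hypothetical.

```python
def cagr(start_value: float, end_value: float, years: int) -> float:
    """Return the compound annual growth rate as a fraction (0.10 means 10%).

    Assumes annual compounding over `years` whole years (an assumption;
    the article does not state the methodology behind its figures).
    """
    if start_value <= 0 or end_value <= 0 or years <= 0:
        raise ValueError("start_value, end_value, and years must be positive")
    return (end_value / start_value) ** (1 / years) - 1


# Endpoints as quoted in the article (USD billions and RMB billions).
indoor_positioning = cagr(16.9, 72.46, 2032 - 2025)   # global indoor positioning market
bim = cagr(9.5, 32.5, 2036 - 2025)                    # global BIM market
china_embodied_yoy = cagr(915.0, 1090.4, 2026 - 2025) # China embodied intelligence, one year

for label, rate in [
    ("Indoor positioning 2025-2032", indoor_positioning),
    ("BIM 2025-2036", bim),
    ("China embodied intelligence 2025-2026", china_embodied_yoy),
]:
    print(f"{label}: {rate:.2%} per year")
```

Under these assumptions the indoor positioning endpoints come out at roughly 23% a year, in line with the quoted rate, while the BIM endpoints imply a slower rate of roughly 12% a year, a figure the article itself does not state.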

The competition for world models is fundamentally a competition for data infrastructure, and the core of that infrastructure is real, granular, and commercializable indoor spatial data. When Jensen Huang declares physical AI the next wave of growth, when Amap reconstructs 3D cities, when Momenta predicts road conditions, and when Ezviz enables robots to "understand" homes, every direction calls for real, accurate, and scalable spatial data. AI is learning to "imagine" the physical world, but what keeps this imagination grounded in reality and makes world models truly usable is the real world's every brick and tile, every person and place, every entry and exit.

This bulletin is compiled and reposted from information provided by global Internet sources and strategic partners, and is intended to share information with readers. If there is any infringement or other issue, please notify us promptly and we will modify or delete the content accordingly. Unauthorized reproduction of this article is strictly prohibited. Email: news@wedoany.com