Google Releases World Model Genie, an Interactive 3D World for Robot Training
2026-05-26 16:51

en.Wedoany.com Reported - Google showcased the latest advancements in its "world model" technology at its developer conference. The model, named Project Genie, can generate interactive 3D worlds in real time. Unlike traditional video generation AI, Genie does not output a complete video in one go. Instead, it computes each frame in response to the user's keyboard input (such as left, right, or forward), much as a language model generates text one token at a time. The research team stated that the first target application for this technology is not gaming, but simulation training for robotics and the simulation of disaster scenarios.
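
The frame-by-frame behavior described above can be illustrated with a toy loop. This is only a sketch under stated assumptions, not Google's implementation: the "frame" here is just a viewer position, and `next_frame`, `rollout`, and `MOVES` are hypothetical names invented for the example. What it shows is the general pattern of producing one output at a time from the history so far plus the latest input.

```python
from typing import List, Tuple

# Hypothetical action set, taken from the article's examples (left, right, forward).
MOVES = {"left": (-1, 0), "right": (1, 0), "forward": (0, 1)}

Frame = Tuple[int, int]  # toy "frame": just the viewer's (x, y) position


def next_frame(history: List[Frame], action: str) -> Frame:
    """Toy stand-in for a world model: one new frame from past frames plus one input."""
    x, y = history[-1]
    dx, dy = MOVES[action]
    return (x + dx, y + dy)


def rollout(start: Frame, actions: List[str]) -> List[Frame]:
    """Generate frames one at a time, feeding each new frame back in as context."""
    history = [start]
    for action in actions:
        history.append(next_frame(history, action))
    return history


frames = rollout((0, 0), ["forward", "forward", "left", "right", "right"])
print(frames)
```

A real world model would replace `next_frame` with a learned network that outputs images, but the loop structure, where each step depends on what was generated before, is the point of the comparison to language models.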

A major highlight of this update is the integration of Google Street View. Users can now select a real-world location as a starting point, and the model will generate an interactive world from that position. According to Product Manager Diego Rivas, the inspiration for this feature came from users spontaneously challenging the system with prompts like "take me to New York." Currently, this feature only supports locations within the United States, with global expansion already in the planning stages.

On the technical side, the Genie 3 model runs in real time and offers long-term memory and high output resolution. However, researchers pointed out that each keystroke must be sent over the network to a computing cluster for processing, and the rendered frame then sent back, a round trip that places extremely high demands on latency control. The model still has limitations in handling character movement, environmental noise, and 4K resolution, but the team indicated it has identified directions for subsequent improvements.
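
To see why the network round trip matters, the delay can be written out as simple arithmetic. The sketch below is illustrative only: every number in it is a hypothetical placeholder, since the article gives no latency or frame-rate figures, and the function names are invented for the example.

```python
def round_trip_ms(uplink_ms: float, inference_ms: float, downlink_ms: float) -> float:
    """Delay between a key press and the matching frame arriving back at the user."""
    return uplink_ms + inference_ms + downlink_ms


def frame_interval_ms(fps: float) -> float:
    """Time between consecutive frames at a given frame rate."""
    return 1000.0 / fps


# All numbers below are hypothetical, chosen only for illustration.
latency = round_trip_ms(uplink_ms=20.0, inference_ms=40.0, downlink_ms=20.0)
interval = frame_interval_ms(fps=20.0)

# How many frames the display lags behind the user's input.
frames_behind = latency / interval
print(latency, interval, frames_behind)
```

With these placeholder values the response to a key press arrives more than one frame late, which is the kind of lag the researchers say has to be kept under control.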

In terms of application areas, Genie has already demonstrated broad potential. Waymo, a subsidiary of Google's parent company Alphabet, uses the model to simulate rare traffic scenarios, such as an elephant or a tornado appearing on the road. The model can also be used to train robots on complex tasks in simulated environments, reducing the amount of trial and error needed in real-world operation.

Regarding the long-term application for robots, the research team believes that world models are foundational to embodied intelligence technology. Robots need to be trained in realistic simulated environments to tackle real-world challenges. Currently, the team is still addressing the "control problem," which involves ensuring robots can reliably grasp objects and walk on different terrains.

On industry competition, the team summed up the current state as "compared to large language models, we are in 2021," implying that the market is still in its early stages and that participants do not yet agree on what a "world model" is. Researchers anticipate industry consolidation in the coming years, with a few large players dominating the market. In addition to Genie 3, Google also released its next-generation language models, Gemini 3.5 Flash and Gemini Omni Flash, at this conference, with the latter focusing on video generation and autonomous agent tasks.

This article is compiled by Wedoany. All AI citations must indicate the source as "Wedoany". If there is any infringement or other issues, please notify us promptly, and we will modify or delete it accordingly. Email: news@wedoany.com