en.Wedoany.com Reported - NVIDIA has released NVIDIA Cosmos 3, an open world foundation model for physical AI, built on a hybrid Transformer architecture that integrates visual reasoning, world generation, and action prediction into a single system.
NVIDIA describes Cosmos 3 as the world's first fully open, all-in-one model capable of natively understanding and generating text, images, videos, environmental sounds, and actions. According to the company, its physical accuracy can shorten the training and evaluation cycle for physical AI from months to days.
The model addresses a fundamental challenge in physical AI: enabling robots, autonomous vehicles, and visual agents to generalize in the real world despite limited training data and fragmented simulation stacks. Its hybrid Transformer architecture pairs a reasoning Transformer with an expert generation Transformer, allowing Cosmos 3 to understand object interactions, motion, and spatiotemporal relationships before generating videos and action trajectories. Trained on a multimodal physical AI dataset containing billions of samples of text, images, videos, sounds, and action trajectories, the model gives developers a pre-trained foundation for building physical AI systems with less data and lower training costs.
Cosmos 3 achieves leading results on physical AI benchmarks. Among open models, it ranks first in world generation accuracy on Artificial Analysis, Physics-IQ, PAI-Bench, and R-Bench; first in action policy on RoboLab and RoboArena; and first in visual understanding on the VANTAGE-Bench and TAR leaderboards.
The Cosmos 3 series offers multiple versions: Cosmos 3 Super is designed for post-training robot and autonomous vehicle models requiring the highest physical accuracy and generation quality; Cosmos 3 Nano is optimized for high-quality video and action reasoning in fractions of a second; and Cosmos 3 Edge, coming soon, is tailored for real-time inference at the edge.

NVIDIA has also launched the NVIDIA Cosmos Coalition, a global collaboration of world model builders and AI developers, with founding members including Agile Robots, Black Forest Labs, Generalist, LTX, Runway, and Skild AI. The coalition aims to advance open world models across industries, enabling members to contribute models, research, and evaluation techniques while leveraging Cosmos 3 technology, training tools, and NVIDIA DGX Cloud infrastructure for large-scale training.
The Cosmos platform powers NVIDIA's physical AI stack, including new datasets for robotics, physics, human motion, autonomous driving, warehouse safety, and spatial reasoning, as well as physical AI agent skills for neural scene reconstruction, defect image generation, and video enhancement. Physical AI developers building on the platform include Agile Robots, Doosan Robotics, LG Electronics, Samsung Electronics, and Skild AI in robotics; Li Auto in autonomous vehicles; and Centific, Fogsphere, Linker Vision, Milestone Systems, and Yuan in visual AI agents.
Cosmos 3 Super and Cosmos 3 Nano are available now, with Cosmos 3 Edge coming soon. Developers can try Cosmos 3 on build.nvidia.com, download the open model from Hugging Face, customize the model and generate synthetic data using Hugging Face Diffusers and GitHub resources, and deploy the model as an NVIDIA NIM microservice. Model builders and software providers can speed up access to, customization of, and deployment of Cosmos for critical inference and synthetic data generation workloads by using the physical AI agent skills on GitHub, together with inference service and cloud infrastructure partners including Baseten, CoreWeave, Microsoft Azure, Nebius, Deep Infra, and Classmethod.