China's X Square Robot Open-Sources Three Embodied Intelligence Technologies
2026-06-18 14:44

en.Wedoany.com Reported - Chinese robotics company X Square Robot is pushing humanoid robots toward more complex application scenarios, with the core goal of enabling robots to operate autonomously in real, chaotic, and unpredictable human living and working environments.

Wang Qian, founder and CEO of the company, stated that the hardware foundation of the robotics industry is largely in place, with rapid progress in humanoid locomotion, dexterous hands, and force control systems. The real bottleneck lies in intelligence. To bridge this gap, X Square Robot has open-sourced three technologies over the past few weeks: the vision-language-action model Wall-OSS-0.5, the world action model WALL-WM designed to understand physical events, and the robot-free data collection and training framework XRZero-G0.

Wall-OSS-0.5 directly addresses the question of whether pretraining alone can teach robots useful skills. Rather than evaluating fine-tuned models, the company deployed the pretrained model directly on physical robots and tested it across 17 real-world tasks. The system demonstrated zero-shot performance in object sorting, ring stacking, and deformable object manipulation. The model employs a "gradient bridging" training framework that converts robot actions into action tokens and learns them alongside language and visual representations during pretraining, so that perception, language understanding, and action generation co-evolve within a unified model. The company found that action training not only improved manipulation capabilities but also enhanced visual grounding performance, which it takes as evidence that physical interaction can strengthen the model's understanding of the world.
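As a rough illustration of what "converting robot actions into action tokens" can mean, the sketch below uses plain uniform binning, a generic technique. It is not X Square Robot's "gradient bridging" method, whose details the article does not describe. The bin count, the normalized action range, and the vocabulary offset are all hypothetical values chosen for the example.

```python
from typing import List

# All constants below are assumptions for illustration, not Wall-OSS-0.5 settings.
NUM_BINS = 256                        # hypothetical resolution per action dimension
ACTION_LOW, ACTION_HIGH = -1.0, 1.0   # assumed normalized action range
TEXT_VOCAB_SIZE = 32000               # hypothetical size of the language vocabulary


def action_to_tokens(action: List[float]) -> List[int]:
    """Map each continuous action value to a discrete token id.

    Token ids start after the text vocabulary so that action tokens and
    language tokens can share one embedding table.
    """
    tokens = []
    for value in action:
        clipped = min(max(value, ACTION_LOW), ACTION_HIGH)
        fraction = (clipped - ACTION_LOW) / (ACTION_HIGH - ACTION_LOW)
        # The upper edge of the range falls into the last bin.
        bin_index = min(int(fraction * NUM_BINS), NUM_BINS - 1)
        tokens.append(TEXT_VOCAB_SIZE + bin_index)
    return tokens


def tokens_to_action(tokens: List[int]) -> List[float]:
    """Recover approximate action values as the center of each bin."""
    width = (ACTION_HIGH - ACTION_LOW) / NUM_BINS
    return [ACTION_LOW + (t - TEXT_VOCAB_SIZE + 0.5) * width for t in tokens]


if __name__ == "__main__":
    joint_command = [0.3, -0.7, 0.05]   # made-up 3-dimensional action
    ids = action_to_tokens(joint_command)
    print(ids, tokens_to_action(ids))
```

Once actions are expressed as token ids like these, a single sequence model can in principle be trained on interleaved image, text, and action tokens. The round trip loses at most half a bin width per dimension.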

WALL-WM targets a limitation of most vision-language-action (VLA) systems: they learn action trajectories without truly understanding physical causality. The model shifts learning from fixed action sequences to meaningful physical events such as reaching, grasping, lifting, and placing. Unlike traditional architectures, WALL-WM aligns visual observations, language descriptions, and actions around real-world events, with the goal of enabling robots not only to act but also to predict outcomes, reason about physical changes, and adjust when plans fail.

To tackle the data bottleneck in embodied intelligence, X Square Robot introduced the software-hardware framework XRZero-G0. The system integrates wearable interfaces, multi-view sensing, automated quality inspection, and real-robot validation for robot-free data collection and training. In controlled experiments, the company found that combining ten robot-free demonstrations with one real-robot demonstration achieved performance comparable to datasets built entirely from real-robot data. The company also released over 2,000 hours of multimodal data covering approximately 3,000 tasks to support embodied intelligence research.
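The ten-to-one combination reported above can be pictured as a simple dataset-mixing step. The following is a minimal sketch under stated assumptions: the function name `mix_demonstrations`, its parameters, and the sampling strategy are hypothetical and are not part of XRZero-G0, whose actual pipeline the article does not detail.

```python
import random
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def mix_demonstrations(
    robot_free: Sequence[T],
    real_robot: Sequence[T],
    free_per_real: int = 10,   # ratio taken from the article; everything else is assumed
    seed: int = 0,
) -> List[T]:
    """Build a shuffled training set with `free_per_real` robot-free
    demonstrations for every real-robot demonstration."""
    if free_per_real < 1:
        raise ValueError("free_per_real must be at least 1")
    rng = random.Random(seed)
    # Use as many real demonstrations as the robot-free pool can support.
    n_real = min(len(real_robot), len(robot_free) // free_per_real)
    chosen_real = rng.sample(list(real_robot), n_real)
    chosen_free = rng.sample(list(robot_free), n_real * free_per_real)
    mixed = chosen_real + chosen_free
    rng.shuffle(mixed)
    return mixed


if __name__ == "__main__":
    # Placeholder records standing in for recorded demonstrations.
    free_pool = [("robot_free", i) for i in range(45)]
    real_pool = [("real_robot", i) for i in range(6)]
    batch = mix_demonstrations(free_pool, real_pool)
    print(len(batch), sum(1 for kind, _ in batch if kind == "real_robot"))
```

With 45 robot-free and 6 real-robot records, the sketch keeps 4 real demonstrations and 40 robot-free ones, so the ratio is preserved exactly rather than approximated.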

These three open-source technologies together form a full-stack framework spanning data, world models, and robot foundation models. Wang Qian believes that the "aha moment" for embodied intelligence may be closer than people think.

This article is compiled by Wedoany. All AI citations must indicate the source as "Wedoany". If there is any infringement or other issues, please notify us promptly, and we will modify or delete it accordingly. Email: news@wedoany.com