China's AReaL 2.0 Open-Sourced: Reinforcement Learning Infrastructure Empowers Continuous Agent Learning
2026-07-03 10:33

en.Wedoany.com reported: On July 2, the open-source reinforcement learning infrastructure project AReaL released version 2.0. The release aims to bridge the gap between foundation model training and modern agent applications by providing efficient reinforcement learning training support for agent scenarios.

AReaL 2.0 targets agents already deployed in real-world business scenarios, offering a system infrastructure that enables continuous learning during use. This version allows the interaction processes generated when agents complete real tasks to be recorded, organized, and integrated into subsequent training workflows, continuously optimizing the underlying models. This enables agents to become increasingly capable under safe and controllable conditions.

Agents are now entering real production environments to perform complex tasks such as writing code, searching for information, and calling tools. Yet although they work every day, they struggle to learn from that work. In real business operations, agents generate a wealth of valuable experience, including task completion status, reasons for tool call failures, user satisfaction, and decision-making directions. Most of this information is only saved in logs, where it is difficult to convert stably and safely into improvements for the next model iteration.

AReaL 2.0 aims to solve the problem of how agents can continue to improve after deployment. Developers do not need to redevelop the agent. To join the online reinforcement learning workflow, they only need to route the requests that the agent already sends to the large model through AReaL 2.0's unified inference entry point.

Take Hermes Agent as an example. Hermes receives tasks, plans steps, and calls models as usual. In the background, AReaL 2.0 records the key interactions during task completion and, combined with feedback or reward signals that arrive after the task is completed, uses these real trajectories for subsequent training. Developers can replace Hermes with their own agents and task environments to build an online reinforcement learning workflow in the same way. As a result, improving an agent's capabilities no longer relies solely on manually constructed data, offline training, and redeployment; multi-turn dialogues, tool calls, execution results, and feedback signals from real tasks can all become material for the model's continued learning.
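The recording flow described above can be illustrated with a minimal Python sketch. It is not AReaL 2.0's actual API: the names `RecordingEntryPoint`, `Trajectory`, `attach_reward`, `training_batch`, and `fake_model` are hypothetical, and the model call is a stub. The sketch only shows the general pattern of an entry point that forwards model requests unchanged, records each interaction, and releases a trajectory for training once a reward signal has been attached.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

Message = Dict[str, str]


@dataclass
class Trajectory:
    """Hypothetical record of one task: its model calls and a later reward."""
    task_id: str
    steps: List[Dict[str, object]] = field(default_factory=list)
    reward: Optional[float] = None


class RecordingEntryPoint:
    """Hypothetical stand-in for a unified inference entry point.

    The agent calls `complete` instead of calling the model directly;
    the reply is returned unchanged while the interaction is recorded.
    """

    def __init__(self, model_fn: Callable[[List[Message]], str]) -> None:
        self._model_fn = model_fn
        self._trajectories: Dict[str, Trajectory] = {}

    def complete(self, task_id: str, messages: List[Message]) -> str:
        reply = self._model_fn(messages)
        traj = self._trajectories.setdefault(task_id, Trajectory(task_id))
        # Store a copy of the message list so later edits by the agent
        # do not alter the recorded step.
        traj.steps.append({"messages": list(messages), "reply": reply})
        return reply

    def attach_reward(self, task_id: str, reward: float) -> None:
        """Attach feedback that arrives after the task has finished."""
        self._trajectories[task_id].reward = reward

    def training_batch(self) -> List[Trajectory]:
        """Return only trajectories that already carry a reward signal."""
        return [t for t in self._trajectories.values() if t.reward is not None]


def fake_model(messages: List[Message]) -> str:
    """Stub model used for illustration; echoes the last message."""
    return "echo: " + messages[-1]["content"]
```

In this sketch the agent's own logic stays untouched: only the function it uses to reach the model is swapped, which mirrors the article's point that developers do not need to redevelop the agent.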

This is particularly important in enterprise scenarios. Agents in enterprise workflows face real, complex, and constantly changing tasks, including codebase updates, business process adjustments, changes in user requirements, and modifications to tools and systems. If an agent's capabilities are largely fixed after deployment, it will struggle to adapt to the real environment over the long term. AReaL 2.0 aims to fill the missing link between "being able to use tools" and "being able to learn from using them."

At the same time, continuous learning in real business settings cannot be reduced to "collecting data and retraining." Agents may access code, customer information, enterprise knowledge bases, and internal systems, so the training pipeline must account for permission control, data anonymization, isolation, and auditing. AReaL 2.0's system design introduces a data proxy mechanism for agent trajectories, so that real task data is managed and used in a safer and more controllable manner when it enters the training workflow.
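As a rough illustration of what such governance could involve, the Python sketch below filters trajectory steps by an allow-list of data sources, redacts email addresses, and keeps an audit log. It is an assumption-laden example, not AReaL 2.0's data proxy: the function names `redact_text` and `govern_trajectory`, the `source` and `text` fields, and the email-only redaction rule are all hypothetical, and a production pipeline would need far broader anonymization.

```python
import re
from typing import Dict, List, Set, Tuple

# Simple email pattern for illustration only; real anonymization needs
# much wider coverage (names, IDs, secrets, and so on).
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def redact_text(text: str) -> Tuple[str, int]:
    """Replace email addresses with a placeholder; return text and count."""
    return EMAIL_RE.subn("[EMAIL]", text)


def govern_trajectory(
    steps: List[Dict[str, str]], allowed_sources: Set[str]
) -> Tuple[List[Dict[str, str]], List[str]]:
    """Hypothetical governance pass run before data reaches training.

    Drops steps from sources that are not permitted, redacts the rest,
    and records every decision in an audit log.
    """
    kept: List[Dict[str, str]] = []
    audit: List[str] = []
    for index, step in enumerate(steps):
        source = step["source"]
        if source not in allowed_sources:
            audit.append(f"step {index}: dropped (source '{source}' not permitted)")
            continue
        clean, count = redact_text(step["text"])
        kept.append({"source": source, "text": clean})
        audit.append(f"step {index}: kept, {count} redaction(s)")
    return kept, audit
```

Keeping the audit log separate from the cleaned data means reviewers can check what was dropped or redacted without seeing the sensitive content itself.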

The AReaL team pointed out in its technical report that the key bottleneck for self-evolving agents lies not only in the model itself or in reinforcement learning algorithms but also in the lack of an online reinforcement learning infrastructure capable of serving real agents. AReaL 2.0's architecture has been upgraded for next-generation agent applications: it connects agent services, real task trajectories, data governance, and online reinforcement learning training, providing a practical engineering foundation for agents to continue learning after deployment.

The AReaL project was initiated in 2024 by teams including Ant Group, Tsinghua University, and the Hong Kong University of Science and Technology. In May 2026, AReaL was spun out of Ant InclusionAI, where it had been incubated, into an independent open-source community and joined the PyTorch Foundation Ecosystem as a project, integrating into the mainstream reinforcement learning infrastructure ecosystem. As the community develops independently, AReaL continues to receive participation and support from industry and open-source ecosystem partners, including the Huawei Cloud team and MindLab. Going forward, AReaL will iterate on online reinforcement learning, automated evaluation, and multimodal agent training, working with the community to advance the self-evolving agent ecosystem. The AReaL 2.0 technical report and code have both been open-sourced.
