China's Xiaomi Launches HarnessX, Boosting AI Agent Performance by 14.5% on Average

2026-06-25 10:15

en.Wedoany.com Reported - Xiaomi researchers have introduced HarnessX, a framework aimed at an engineering bottleneck in enterprise AI: agent performance is often constrained by the "harness," the scaffolding code that surrounds the model. The framework treats AI harnesses as composable objects and autonomously improves their code, raising the performance of AI systems in areas such as software engineering and web interaction.

Most AI application harnesses today are static and handcrafted. They cannot improve automatically from execution data, which is a key factor preventing AI agents from completing complex, long-horizon tasks. Traditional harness development faces three major challenges: first, harnesses are static and must be rewritten by hand; second, their architecture is entangled, so adjusting one component may break others; and third, harnesses and foundation models are optimized in isolation, with execution trajectories often discarded.

HarnessX addresses these bottlenecks through a "unified harness foundry." Its core innovation is treating the harness as a "first-class object" (an independent, serializable, modular, and replaceable entity), which separates model configuration from harness configuration. Agent behavior is decomposed into components such as context assembly, memory management, tool ecosystem, control flow, and observability, and each behavior is attached as a "processor" to one of the harness's lifecycle hooks.
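
The idea of a serializable harness built from processors on lifecycle hooks can be illustrated with a minimal Python sketch. This is not HarnessX's actual API, which has not been released; the names `Harness`, `REGISTRY`, `register`, the hook names `before_model` and `after_step`, and the two example processors are all assumptions made for illustration.

```python
import json
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# A processor takes the running agent state and returns an updated state.
Processor = Callable[[dict], dict]

# Hypothetical registry: maps a serializable name to a processor function.
REGISTRY: Dict[str, Processor] = {}


def register(name: str):
    """Decorator that stores a processor under a string name."""
    def wrap(fn: Processor) -> Processor:
        REGISTRY[name] = fn
        return fn
    return wrap


@register("assemble_context")
def assemble_context(state: dict) -> dict:
    # Context assembly: build the prompt from the task description.
    return {**state, "prompt": f"Task: {state['task']}"}


@register("log_step")
def log_step(state: dict) -> dict:
    # Observability: append a log entry after each step.
    return {**state, "log": state.get("log", []) + ["step done"]}


@dataclass
class Harness:
    # hook name -> ordered list of processor names (hypothetical layout)
    hooks: Dict[str, List[str]] = field(default_factory=dict)

    def run_hook(self, hook: str, state: dict) -> dict:
        """Apply every processor attached to the given hook, in order."""
        for name in self.hooks.get(hook, []):
            state = REGISTRY[name](state)
        return state

    def to_json(self) -> str:
        return json.dumps({"hooks": self.hooks}, sort_keys=True)

    @classmethod
    def from_json(cls, text: str) -> "Harness":
        return cls(hooks=json.loads(text)["hooks"])


# The harness is configured independently of the model.
harness = Harness(hooks={"before_model": ["assemble_context"],
                         "after_step": ["log_step"]})
model_config = {"model": "example-model"}  # placeholder, not a real model name
```

Because the harness stores only processor names, it can be saved, diffed, and swapped without touching the model configuration, which is the separation the paragraph above describes.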

HarnessX Structure

To automate optimization of this modular structure, HarnessX introduces AEGIS, a trajectory-driven evolutionary engine. AEGIS treats harness adaptation as a reinforcement learning problem. To counter pathologies such as reward hacking, catastrophic forgetting, and insufficient exploration, it uses a four-stage pipeline: a digester, a planner, an evolution engine, and a critic with a gate. The digester compresses execution trajectories into structured summaries, the planner analyzes these summaries to explore structural changes, the evolution engine generates code-level edits and tests, and the critic with its gate guards against reward hacking and catastrophic forgetting.
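
The four stages can be pictured with a toy Python sketch. It is only a sketch under stated assumptions: the function names `digest`, `plan`, `evolve`, and `gate`, the `Trajectory` fields, and the acceptance rule are hypothetical stand-ins, and the real AEGIS stages are driven by a meta-agent rather than by these simple rules. The gate here only illustrates a per-task regression check, a rough proxy for guarding against forgetting.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Trajectory:
    task_id: str
    success: bool
    error: str = ""  # hypothetical failure label


def digest(trajectories: List[Trajectory]) -> Dict[str, int]:
    """Digester: compress trajectories into a count of failure reasons."""
    summary: Dict[str, int] = {}
    for t in trajectories:
        if not t.success:
            summary[t.error] = summary.get(t.error, 0) + 1
    return summary


def plan(summary: Dict[str, int]) -> str:
    """Planner: target the most frequent failure reason, if any."""
    return max(summary, key=summary.get) if summary else ""


def evolve(harness: Dict[str, bool], target: str) -> Dict[str, bool]:
    """Evolution engine: return a candidate harness.

    A real engine would emit code edits and tests; here an 'edit'
    is just a flag enabling a fix for the targeted failure.
    """
    candidate = dict(harness)
    if target:
        candidate[f"fix_{target}"] = True
    return candidate


def gate(old_scores: Dict[str, float],
         new_scores: Dict[str, float],
         tolerance: float = 0.0) -> bool:
    """Critic with gate: accept only if no task regresses beyond
    the tolerance and the mean score improves."""
    no_regression = all(
        new_scores[k] >= old_scores[k] - tolerance for k in old_scores
    )
    mean_old = sum(old_scores.values()) / len(old_scores)
    mean_new = sum(new_scores[k] for k in old_scores) / len(old_scores)
    return no_regression and mean_new > mean_old
```

In this sketch a candidate that raises one task's score sharply but lowers another's is rejected, so a gain on new tasks cannot be bought with a loss on old ones.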

AEGIS

HarnessX also enables co-evolution of the harness and the model. A cross-harness GRPO (Group Relative Policy Optimization) algorithm aggregates execution trajectories generated under different versions of the harness into reinforcement learning signals for the model, allowing the model to internalize advanced strategies such as the use of new tools.
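
The article does not give the details of the cross-harness variant, but the group-relative step at the heart of GRPO can be sketched as follows. The assumptions here are that rollouts for the same task form one group regardless of which harness version produced them, and that the advantage is the reward minus the group mean, divided by the group's population standard deviation. The `Rollout` layout and the function name are hypothetical.

```python
from collections import defaultdict
from statistics import mean, pstdev
from typing import Dict, List, Tuple

# (task_id, harness_version, reward) -- hypothetical record layout
Rollout = Tuple[str, str, float]


def group_relative_advantages(rollouts: List[Rollout],
                              eps: float = 1e-8) -> List[float]:
    """Normalize each reward against all rollouts for the same task.

    Grouping by task only (assumption) pools trajectories from every
    harness version into one group before normalizing.
    """
    groups: Dict[str, List[float]] = defaultdict(list)
    for task_id, _version, reward in rollouts:
        groups[task_id].append(reward)

    # Per-task mean and population standard deviation.
    stats = {t: (mean(r), pstdev(r)) for t, r in groups.items()}

    return [
        (reward - stats[task_id][0]) / (stats[task_id][1] + eps)
        for task_id, _version, reward in rollouts
    ]
```

Under this grouping, a trajectory that succeeds only because a newer harness exposed a better tool receives a positive advantage relative to failures on the same task under older harnesses, which is one plausible way the model could be pushed toward the new strategy.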

Harness-Model Co-Evolution

Experiments covered five benchmarks spanning software engineering, multi-turn customer service conversations, web navigation, open-ended multi-step reasoning, and embodied planning. A meta-agent powered by Claude Opus 4.6 analyzed logs and wrote code, while three task agents were evaluated: Claude Sonnet 4.6, GPT-5.4, and the open-weight model Qwen3.5-9B. The dynamically evolving harness improved performance in 14 of the 15 model-benchmark combinations, with an average absolute improvement of +14.5%. The weakest task model, the open-weight Qwen3.5-9B, benefited the most, gaining +44.0% on the ALFWorld embodied planning benchmark and +18.2% on the SWE-bench Verified software engineering benchmark. Training the foundation model on data generated by the evolved harness brought an additional average improvement of +4.7%.

HarnessX Performance

HarnessX currently relies on powerful closed frontier models (such as Claude Opus) as meta-agents to rewrite harness code, and whether open-weight models can fill the meta-agent role remains untested. In addition, if the underlying model cannot execute complex workflows in the first place, the framework cannot raise overall capability. The researchers plan to release the code in future updates, and HarnessX offers practitioners an approach centered on optimizing the harness rather than on model scaling alone.

This article is compiled by Wedoany. All AI citations must indicate the source as "Wedoany". If there is any infringement or other issues, please notify us promptly, and we will modify or delete it accordingly. Email: news@wedoany.com