China's Huawei Cloud Launches Agentic Infrastructure Stack Supporting 100,000-Card Clusters
2026-06-16 14:45

en.Wedoany.com Reported - Huawei Cloud has launched the "Agentic Infra" stack, a portfolio of computing, storage, and networking products designed to support large-scale AI agent operations on NPU-based cloud platforms. The move is seen as the cloud provider's most direct challenge yet to Nvidia in the AI infrastructure space.

At the Inspire event in Shanghai, Huawei Cloud unveiled AICS (AI Cluster Service), which it says can support clusters of up to 100,000 cards. The cluster runs on Huawei's proprietary UnifiedBus (UB) interconnect protocol. According to Huawei, it delivers a throughput of 5 million tokens per second across 1,000 cards, a total computing power of 200 EFLOPS (exaflops), and token generation latency below 10 milliseconds.
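For a sense of scale, the quoted figures can be reduced to per-card numbers. The sketch below is plain arithmetic on the numbers reported above. It assumes throughput is spread evenly across cards, and it assumes the 200 EFLOPS figure refers to a full 100,000-card cluster, which the announcement does not spell out. The function names are illustrative.

```python
def tokens_per_card(total_tokens_per_s: float, cards: int) -> float:
    """Average tokens per second per card, assuming an even split."""
    return total_tokens_per_s / cards


def pflops_per_card(total_eflops: float, cards: int) -> float:
    """Average PFLOPS per card (1 EFLOPS = 1,000 PFLOPS)."""
    return total_eflops * 1000 / cards


# Reported: 5 million tokens/s across 1,000 cards.
print(tokens_per_card(5_000_000, 1_000))

# Assumption: 200 EFLOPS is the total for a 100,000-card cluster.
print(pflops_per_card(200, 100_000))
```

Under those assumptions, the figures work out to 5,000 tokens per second and 2 PFLOPS per card.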

Huawei also launched a storage solution called AMS (Agentic Memory Storage), which provides memory expansion for NPU chips and reduces inference costs for long-cycle agent tasks through hierarchical key-value (KV) caching.
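Huawei has not published AMS internals in this announcement, but the general idea of hierarchical caching can be shown with a minimal two-tier sketch. Everything here is an assumption for illustration: the class name `TieredKVCache`, the two tiers (a small fast tier standing in for on-chip memory and a larger slow tier standing in for external storage), and the least-recently-used eviction policy.

```python
from collections import OrderedDict


class TieredKVCache:
    """Illustrative two-tier LRU cache; not Huawei's AMS design."""

    def __init__(self, fast_capacity: int, slow_capacity: int):
        self.fast = OrderedDict()  # small, fast tier
        self.slow = OrderedDict()  # larger, slower tier
        self.fast_capacity = fast_capacity
        self.slow_capacity = slow_capacity

    def put(self, key, value):
        # Keep each key in at most one tier.
        self.slow.pop(key, None)
        self.fast[key] = value
        self.fast.move_to_end(key)
        self._spill()

    def _spill(self):
        # Demote least-recently-used entries from fast to slow,
        # then drop the oldest slow entries if that tier overflows.
        while len(self.fast) > self.fast_capacity:
            old_key, old_value = self.fast.popitem(last=False)
            self.slow[old_key] = old_value
            self.slow.move_to_end(old_key)
            while len(self.slow) > self.slow_capacity:
                self.slow.popitem(last=False)

    def get(self, key):
        if key in self.fast:
            self.fast.move_to_end(key)
            return self.fast[key]
        if key in self.slow:
            # Promote on hit so hot entries return to the fast tier.
            value = self.slow.pop(key)
            self.put(key, value)
            return value
        return None  # miss: the caller would have to recompute
```

A hit in the slow tier avoids recomputing the cached state, which is the kind of saving the announcement attributes to caching for long-cycle agent tasks.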

Other components of the stack include the CCE Volcano Next scheduler, which Huawei claims improves resource utilization by over 30% by running training and inference workloads on shared resources instead of isolating them, and AgentSphere, a securely isolated sandbox environment in which users can launch hundreds of thousands of agent instances per minute.

The stack was unveiled during a keynote speech by Dr. Peter Zhou, Huawei Board Director and CEO of Huawei Cloud, who stated that agentic AI is driving a fundamental shift in computing paradigms. Huawei's demonstration of the infrastructure stack at Inspire comes as China pushes to build domestic alternatives, with the tech giant doubling down on computing power to capitalize on market opportunities following U.S. chip import restrictions.

Although Huawei CEO Ren Zhengfei admitted last summer that the company's chips lag behind U.S. counterparts by one generation, Huawei is seeking to close the gap rapidly. Its semiconductor design scaling principle, Tau (τ), focuses on improving designs by reducing signal propagation delays on the chip rather than further shrinking transistors. Huawei has used this concept to design approximately 381 chips and will combine it with the LogicFolding architecture, which it says has improved τ performance across multiple levels and is crucial to the development of the Kirin processor series.

In the model and agent domain, Huawei launched the ModelArts Next model platform, which adds a new Reinforcement Learning as a Service (RLaaS) offering and a model routing layer that dynamically sends each request to the most suitable of over 20 partner models for the task, including systems from DeepSeek, Zhipu AI, and MiniMax. Huawei claims the routing engine achieves scheduling accuracy exceeding 95% and reduces inference costs by approximately 20%. The partner list has been formalized as the "AI Model Partner Program." Huawei also launched the AgentArts enterprise agent platform, which targets production-grade, long-cycle agent tasks and comes with an open-source version that shares over 90% of its codebase with the commercial version, as well as the AgentArts Orchard portal for building and deploying agents via a command-line interface.
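The announcement does not describe how the routing engine chooses a model. One simple way to picture task-based routing with a cost objective is sketched below. It is an assumption throughout: the `ModelProfile` type, the skill scores, the prices, the `min_skill` threshold, and the placeholder model names are all hypothetical and do not describe ModelArts Next or any partner model.

```python
from dataclasses import dataclass, field


@dataclass
class ModelProfile:
    name: str                       # placeholder name, not a real model
    cost_per_1k_tokens: float       # hypothetical price
    skills: dict = field(default_factory=dict)  # task type -> score in [0, 1]


def route(task_type: str, models: list, min_skill: float = 0.8) -> ModelProfile:
    """Pick the cheapest model that is good enough for the task.

    If no model clears the threshold, fall back to the highest-scoring one.
    """
    capable = [m for m in models if m.skills.get(task_type, 0.0) >= min_skill]
    if capable:
        return min(capable, key=lambda m: m.cost_per_1k_tokens)
    return max(models, key=lambda m: m.skills.get(task_type, 0.0))


models = [
    ModelProfile("general-large", 2.0, {"code": 0.9, "chat": 0.9, "math": 0.6}),
    ModelProfile("code-small", 0.5, {"code": 0.85}),
    ModelProfile("chat-small", 0.4, {"chat": 0.82, "code": 0.3}),
]

print(route("code", models).name)
print(route("math", models).name)
```

Sending a request to a cheaper model whenever it is good enough is one plausible source of the kind of cost reduction Huawei cites, though the company has not said that its engine works this way.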

Huawei also introduced a dedicated security layer for the stack, including Hold Your Own Key (HYOK) hardware encryption and confidential computing support across virtual machines, training, and inference. The company claims to have gone over 1,000 days without a major service incident.

This article is compiled by Wedoany. All AI citations must indicate the source as "Wedoany". If there is any infringement or other issues, please notify us promptly, and we will modify or delete it accordingly. Email: news@wedoany.com