Huawei and China Mobile Hubei Complete Commercial Network Verification of AI Inference Acceleration

2026-06-29 10:47


en.Wedoany.com reported: Huawei, working with China Mobile Hubei, has completed the first commercial-network verification of an AI inference acceleration solution in China's telecommunications industry. The milestone was announced at the 2026 Mobile World Congress Shanghai (MWC Shanghai 2026), held June 24 to 26 in Hall N1 of the Shanghai New International Expo Centre (SNIEC).

Panoramic view of Huawei at MWC Shanghai 2026

As AI applications evolve toward agent-based workloads, long-context scenarios such as code generation and multi-turn dialogue are becoming increasingly common. However, the limited capacity of on-chip memory and DRAM lowers the KV cache (key-value cache) hit rate, which degrades inference performance.
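A rough estimate shows why capacity runs out. In a standard transformer using grouped-query attention, each cached token stores one key vector and one value vector per layer and per KV head, so the cache grows linearly with context length. The sketch below assumes that attention layout and a made-up model shape; the article does not describe the architectures of MiniMax M2.5 or GLM-5.1, which may compress or structure their KV cache differently.

```python
def kv_cache_bytes(num_layers: int, num_kv_heads: int, head_dim: int,
                   seq_len: int, bytes_per_elem: int = 2, batch: int = 1) -> int:
    """Memory for the K and V tensors of every layer, for every cached token.

    The leading 2 counts keys and values; bytes_per_elem=2 assumes FP16/BF16.
    Assumes standard multi-head / grouped-query attention (no KV compression).
    """
    return 2 * num_layers * num_kv_heads * head_dim * seq_len * bytes_per_elem * batch


if __name__ == "__main__":
    # Hypothetical model shape (NOT MiniMax M2.5 or GLM-5.1):
    # 60 layers, 8 KV heads, head_dim 128, FP16 values, one sequence.
    for tokens in (8 * 1024, 64 * 1024, 128 * 1024):
        gib = kv_cache_bytes(60, 8, 128, tokens) / 2**30
        print(f"{tokens // 1024}K tokens: {gib:.1f} GiB")
```

With these assumed numbers the script prints 1.9 GiB at 8K tokens, 15.0 GiB at 64K and 30.0 GiB at 128K for a single sequence. Concurrent sessions multiply that total, which is how long-context serving can outgrow accelerator memory and DRAM.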

Huawei's solution combines the OceanStor A800 storage system, the Ascend A3 SuperPOD, and the Unified Cache Manager (UCM), which Huawei launched in 2025. UCM uses external high-performance storage to provide petabyte-scale KV caching, overcoming the capacity limits of on-chip memory and DRAM. It manages and schedules the KV cache hierarchically across its entire lifecycle. This extends the context window available to a single dialogue and lets multi-turn dialogues reuse historical KV cache, eliminating redundant computation and lowering inference costs.
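The article does not describe UCM's internals or API. As a generic illustration of the two ideas it names, hierarchical placement and reuse of historical KV cache, the Python sketch below hashes prompt tokens into fixed-size blocks (a common prefix-caching technique), keeps blocks in LRU-ordered tiers that stand in for NPU memory, DRAM and external storage, and demotes evicted blocks to the next tier instead of discarding them. The tier names, capacities, block size and function names are all hypothetical.

```python
import hashlib
from collections import OrderedDict
from typing import Hashable, List, Optional, Tuple

# Hypothetical block size (tokens per KV block), kept tiny so the example is readable.
BLOCK_TOKENS = 4


def block_keys(tokens: List[int]) -> List[str]:
    """One key per full block; each key hashes the entire prefix up to that block,
    so a key matches only if every earlier token matches too."""
    keys = []
    h = hashlib.sha256()
    full = len(tokens) - len(tokens) % BLOCK_TOKENS
    for start in range(0, full, BLOCK_TOKENS):
        h.update(repr(tokens[start:start + BLOCK_TOKENS]).encode())
        keys.append(h.hexdigest())  # hexdigest() does not reset the running hash
    return keys


class TieredKVCache:
    """Hypothetical tiers ordered fastest to slowest; capacity None means
    unbounded (standing in for external storage)."""

    def __init__(self, tiers: List[Tuple[str, Optional[int]]]):
        self.tiers = [(name, cap, OrderedDict()) for name, cap in tiers]

    def _insert(self, level: int, key: Hashable, value: object) -> None:
        _, cap, store = self.tiers[level]
        store[key] = value
        store.move_to_end(key)
        if cap is not None and len(store) > cap:
            old_key, old_value = store.popitem(last=False)  # least recently used
            if level + 1 < len(self.tiers):
                self._insert(level + 1, old_key, old_value)  # demote, don't drop

    def put(self, key: Hashable, value: object) -> None:
        for _, _, store in self.tiers:
            store.pop(key, None)  # keep exactly one copy of each block
        self._insert(0, key, value)

    def get(self, key: Hashable) -> Tuple[Optional[str], object]:
        """Return (tier the block was found in, value), promoting it to the fastest tier."""
        for name, _, store in self.tiers:
            if key in store:
                value = store.pop(key)
                self._insert(0, key, value)
                return name, value
        return None, None


def prefill(cache: TieredKVCache, tokens: List[int]) -> int:
    """Return how many prompt tokens reuse cached KV; compute and cache the rest."""
    keys = block_keys(tokens)
    hits = 0
    for key in keys:
        tier, _ = cache.get(key)
        if tier is None:
            break  # prefix chain broken: later blocks must be recomputed
        hits += 1
    for i in range(hits, len(keys)):
        cache.put(keys[i], f"kv-block-{i}")  # placeholder for real K/V tensors
    return hits * BLOCK_TOKENS


if __name__ == "__main__":
    cache = TieredKVCache([("npu", 2), ("dram", 4), ("storage", None)])
    turn1 = list(range(12))                 # first prompt: 3 blocks
    turn2 = turn1 + list(range(100, 108))   # follow-up repeats history, adds 2 blocks
    print(prefill(cache, turn1))  # 0
    print(prefill(cache, turn2))  # 12
```

In the second turn, the first 12 tokens (three blocks) match the cached history and skip recomputation, even though one of those blocks had already been demoted to the DRAM tier; only the 8 new tokens are prefilled. A production system would move real tensors asynchronously and tune eviction per tier, which this sketch ignores.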

The verification ran in China Mobile Hubei's commercial network on the vLLM-Ascend framework, testing models including MiniMax M2.5 and GLM-5.1 with simulated long-sequence inputs of 8K to 190K tokens. For GLM-5.1, Time To First Token (TTFT) improved by 51% to 93%, and Tokens Per Second (TPS) per NPU rose by 56% to 372%; broken down by sequence length, TPS rose by 313% at 64K and 372% at 128K. For MiniMax M2.5, enabling UCM improved TTFT by 26% to 62% and raised TPS by 58% at 64K and 78% at 128K. The acceleration becomes more pronounced as context length increases.
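The percentages are easier to compare as multiples of the baseline. A TPS increase of p% means the new rate is (1 + p/100) times the old one, so +372% is about 4.72 times baseline throughput. The article does not define "TTFT improved by p%"; the sketch below assumes it means TTFT was reduced by p%, in which case a 93% improvement corresponds to roughly a 14x faster first token. The function names are illustrative only.

```python
def throughput_multiple(pct_increase: float) -> float:
    """+p% throughput means new = old * (1 + p/100)."""
    return 1 + pct_increase / 100


def ttft_speedup(pct_reduction: float) -> float:
    """Assumes 'TTFT improved by p%' means TTFT fell by p%,
    so new = old * (1 - p/100) and the speedup is old / new."""
    if not 0 <= pct_reduction < 100:
        raise ValueError("reduction must be in [0, 100)")
    return 1 / (1 - pct_reduction / 100)


if __name__ == "__main__":
    # GLM-5.1 figures as reported in the article.
    for label, pct in (("TPS +56%", 56), ("TPS +313% (64K)", 313), ("TPS +372% (128K)", 372)):
        print(f"{label}: {throughput_multiple(pct):.2f}x baseline")
    for pct in (51, 93):
        print(f"TTFT -{pct}%: {ttft_speedup(pct):.1f}x faster first token")
```

Under that assumption, the script prints multiples of 1.56x, 4.13x and 4.72x for the GLM-5.1 TPS figures, and TTFT speedups of 2.0x and 14.3x for the 51% and 93% endpoints.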

A representative from China Mobile Hubei said that Hubei's central location gives it a latency of only 10 milliseconds to the eight national computing hubs. In scenarios such as AI agent interaction and code generation, the solution can increase throughput by more than 50%, laying the foundation for large-scale deployment of AI services.

Michael Qiu, President of Huawei's Global Data Storage Marketing and Solutions Sales Department, noted that as operators launch token packages, large-scale deployment of AI agents has entered a new phase, and token consumption is expected to grow exponentially.

This article is compiled by Wedoany. All AI citations must indicate the source as "Wedoany". If there is any infringement or other issues, please notify us promptly, and we will modify or delete it accordingly. Email: news@wedoany.com