Huawei Launches Full-Stack AI DC Data Infrastructure Solution, CMS Reduces First-Token Latency by 90% to Accelerate Agent Business Inference Deployment

2026-05-22 16:07


en.Wedoany.com reported: On May 21, 2026, at the Huawei Innovative Data Infrastructure Forum in Paris, Yuan Yuan, Vice President of Huawei and President of its Data Storage Product Line, officially launched the full-stack AI DC Data Infrastructure Solution. The solution comprises five core components: OceanStor Pacific All-Flash Scale-Out Storage, Context Memory Storage (CMS), AI Data Platform, ModelEngine Nexent Agent Platform, and an end-to-end data protection solution. Together they cover the entire chain from data storage, management, and protection to agent application development.

Huawei launched the full-stack solution as AI data centers undergo a structural shift from the training era to the inference era. Yuan Yuan pointed out that in 2026, AI business is accelerating its move from "training-centric" to "inference-centric," with the bulk of token consumption shifting from large model pre-training to application deployment. As a result, AI infrastructure has to address the entire process of data acquisition, preprocessing, training, inference, and application, and build a new generation of data infrastructure capabilities geared towards AI application deployment. The five core components released this time correspond to capability enhancements in the storage, memory, knowledge, application, and security layers of AI data centers. Huawei also announced its "Source-Grid-Load-Storage AIDC" strategy for AI data centers, which covers new power supply and cooling infrastructure such as ultra-high-power UPS and full liquid cooling solutions with shared air and liquid sources, to match the power access and heat dissipation requirements of AI computing clusters.

Among the five core components, Context Memory Storage (CMS) is positioned as a key breakthrough for ultra-large-scale inference clusters. According to Huawei, it is the industry's first context memory storage product supporting heterogeneous computing power, and it is one of the core components in Huawei's "3+1" AI Data Platform architecture. In terms of technical implementation, CMS uses all-memory ultra-fast media and is built on memory semantic interconnect technology. It supports two semantic offloading paths: in the first, KV semantic direct pass, the GPU's KV Cache is offloaded to CMS for direct access; in the second, a dedicated DPU handles semantic processing, freeing up GPU inference computing power. CMS can scale linearly into a PB-level shared KV Cache pool. For multimodal inference scenarios with ultra-long contexts, Huawei states that first-token latency can be reduced by 90%. When multiple inference tasks run simultaneously, the shared KV Cache pool prevents the same context from being loaded repeatedly by different GPUs, improving memory utilization and inference throughput.
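The idea of a shared KV Cache pool can be illustrated with a minimal Python sketch. It is not Huawei's CMS interface: the class `SharedKVCachePool`, the method `get_or_compute`, the worker names, and the placeholder string standing in for real KV tensors are all hypothetical, chosen only to show how a pool keyed by context lets a second worker reuse an entry instead of recomputing it.

```python
import hashlib
from typing import Dict, List


class SharedKVCachePool:
    """Toy stand-in for a shared KV Cache pool (hypothetical, not the CMS API)."""

    def __init__(self) -> None:
        self._store: Dict[str, str] = {}  # context hash -> cached KV placeholder
        self.hits = 0
        self.misses = 0

    @staticmethod
    def _key(tokens: List[int]) -> str:
        # Identify a context by a hash of its token IDs.
        return hashlib.sha256(",".join(map(str, tokens)).encode()).hexdigest()

    def get_or_compute(self, tokens: List[int], worker: str) -> str:
        key = self._key(tokens)
        if key in self._store:
            # Another worker already produced this context's KV entry: reuse it.
            self.hits += 1
            return self._store[key]
        # Placeholder for the expensive prefill that would build real KV tensors.
        self.misses += 1
        kv = f"kv-computed-by-{worker}"
        self._store[key] = kv
        return kv


pool = SharedKVCachePool()
context = list(range(1000))  # stands in for a long shared context
first = pool.get_or_compute(context, "gpu-0")   # miss: computed once
second = pool.get_or_compute(context, "gpu-1")  # hit: reused from the pool
```

In this sketch the second worker skips the prefill step entirely, which is the general mechanism by which a shared pool can cut first-token latency for repeated contexts. How CMS actually indexes, places, and transfers KV data is not described in the announcement.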

OceanStor Pacific All-Flash Scale-Out Storage serves as the storage foundation for massive training data within the full-stack solution. According to Huawei, the product offers an industry-leading capacity density of 11PB/2U to optimize total cost of ownership, supports real-time multimodal cross-site data ingestion into the data lake, and enables retrieval within seconds across hundreds of billions of dimensions of vector data, efficiently storing the massive unstructured data required for AI applications. The AI Data Platform provides three core capabilities: knowledge generation and retrieval, KV Cache storage and optimization, and inference memory management. These offer system-level answers to long-standing challenges enterprises face with agents, such as difficulty in using knowledge, inability to retain memory, and poor inference experience.

The ModelEngine Nexent Agent Platform serves as the "last mile" hub for AI application deployment in the solution. The development and deployment of AI agents have long faced engineering challenges such as multi-tool adaptation, fragmented edge-side computing power, and fragmented heterogeneous chip ecosystems. ModelEngine Nexent integrates one-stop capabilities including multimodal parsing, retrieval-augmented generation, and agent development and deployment. It gives developers a unified platform from model access to application launch, reducing the engineering complexity enterprises face in the AI application deployment phase. The end-to-end data protection solution addresses risks to AI data assets, such as ransomware, hardware failures, and human operational errors, with full-process protection capabilities including backup, disaster recovery, and anti-ransomware, to ensure business continuity for AI data centers.

These five components work together under Huawei's "3+1" AI Data Platform architecture. "3" refers to the specialized capabilities for storing and optimizing three types of data: knowledge, KV Cache, and memory. "1" refers to intelligent scheduling and collaborative management across components through the Unified Cache Management Engine (UCM). Under this architecture, OceanStor Pacific handles the persistent storage of massive training data, CMS focuses on context caching and semantic offloading during the inference phase, the AI Data Platform is responsible for the unified management of knowledge bases and memory banks, ModelEngine Nexent provides upper-layer tools for agent development and deployment, and the end-to-end data protection solution provides security protection throughout the entire stack. Together, the five components form a closed loop around the full lifecycle of AI data, from collection and training to inference and application.

As large models penetrate agent business scenarios, inference workloads account for a growing share of network-wide computing power consumption. The bottleneck of data infrastructure is shifting from "unable to store, slow to train" to "inference memory loss, slow first-token response, and difficulty in multi-agent collaboration." Through CMS's shared KV Cache pool, the AI Data Platform's memory and knowledge management, and ModelEngine Nexent's agent development capabilities, Huawei offers a systematic response to the data bottleneck in the inference phase. Huawei also announced at the Paris forum that it will collaborate with global partners to build the AI DC data infrastructure ecosystem, accelerating the intelligent transformation of industries.

This article is compiled by Wedoany. All AI citations must indicate the source as "Wedoany". If there is any infringement or other issues, please notify us promptly, and we will modify or delete it accordingly. Email: news@wedoany.com


