China's Zhonghao Xinying Releases TPU Chip "Xuyu" with 896 TFLOPS Computing Power
2026-07-01 14:01

On June 30, China's Zhonghao Xinying released "Xuyu," its next-generation, fully self-developed high-performance TPU chip for dedicated AI computing, and simultaneously launched "Taize 2.0," an integrated software-hardware intelligent computing base. A single "Xuyu" chip delivers 896 TFLOPS of mixed-precision floating-point computing power and 1792 TOPS of 8-bit inference computing power, with a rated power consumption of 600W per card.

"Xuyu" is positioned for large model training, inference acceleration, and high-throughput AI computing. A TPU is a dedicated acceleration chip designed for tensor computation and matrix operations; its core task is to improve the computational efficiency of deep learning models in training, inference, and batch processing. Unlike GPUs, which emphasize general-purpose computing coverage, TPUs concentrate on matrix multiplication, tensor operations, operator scheduling, and data transfer efficiency in AI models. The 896 TFLOPS mixed-precision and 1792 TOPS 8-bit figures point to upgraded computing units and data pathways aimed at large language models, multimodal models, and high-concurrency inference scenarios.

"Xuyu" is Zhonghao Xinying's second-generation TPU product. Its computing power is three times that of the previous-generation "Chana," with the main gains targeted at computational throughput in model training and inference.
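
The published figures allow a few back-of-the-envelope ratios. The short Python sketch below derives them from the numbers quoted above. It rests on two assumptions that the release does not confirm: that each 600W card carries a single "Xuyu" chip, and that the "three times" comparison with "Chana" refers to the same mixed-precision metric. The function names are illustrative only.

```python
# Figures quoted in the article.
MIXED_PRECISION_TFLOPS = 896.0   # per chip
INT8_TOPS = 1792.0               # per chip
CARD_POWER_W = 600.0             # rated power per card
GENERATION_SPEEDUP = 3.0         # "Xuyu" vs. previous-generation "Chana"


def per_watt(throughput: float, power_w: float) -> float:
    """Throughput per watt (assumes one chip per card; not confirmed)."""
    return throughput / power_w


def implied_previous_gen(current: float, speedup: float) -> float:
    """Previous-generation figure implied by the stated speedup
    (assumes the 3x claim applies to the same metric; not confirmed)."""
    return current / speedup


print(f"Mixed precision: {per_watt(MIXED_PRECISION_TFLOPS, CARD_POWER_W):.2f} TFLOPS/W")
print(f"8-bit inference: {per_watt(INT8_TOPS, CARD_POWER_W):.2f} TOPS/W")
print(f"8-bit / mixed-precision ratio: {INT8_TOPS / MIXED_PRECISION_TFLOPS:.1f}x")
print(f"Implied 'Chana' figure: {implied_previous_gen(MIXED_PRECISION_TFLOPS, GENERATION_SPEEDUP):.0f} TFLOPS")
```

Under those assumptions the card works out to roughly 1.5 TFLOPS per watt in mixed precision, and the 8-bit figure is exactly double the mixed-precision figure. These are derived ratios, not vendor-published specifications.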

Running large models does not depend on peak computing power alone; it is also constrained by memory capacity, on-chip cache, chip interconnect, communication bandwidth, operator libraries, and software frameworks. Long-context inference, multi-turn dialogues, agent tasks, and batch generation produce large volumes of KV cache, parameter reads, and intermediate data transfers. If storage and interconnect capabilities fall short, computing units sit idle waiting on data movement. By launching "Taize 2.0" alongside "Xuyu," Zhonghao Xinying signals that its technical approach is not to deliver chips alone, but to build a complete intelligent computing platform that combines chips, accelerator cards, servers, system software, operator libraries, cluster scheduling, and model adaptation. Such platform capabilities directly affect whether AI models can run stably in large-scale computing clusters.
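
To make the data-movement point concrete, the sketch below estimates KV cache size for a decoder model and applies a simple roofline bound, where attainable throughput is the smaller of peak compute and memory bandwidth times arithmetic intensity. Only the 896 TFLOPS peak comes from the article. The model shape, the 2.0 TB/s memory bandwidth, and the intensity values are hypothetical placeholders, since memory specifications for "Xuyu" were not given.

```python
def kv_cache_bytes(num_layers: int, num_kv_heads: int, head_dim: int,
                   seq_len: int, batch_size: int, bytes_per_elem: int = 2) -> int:
    """Bytes needed to hold keys and values for every layer and token.

    The leading factor of 2 accounts for the separate K and V tensors.
    """
    return (2 * num_layers * num_kv_heads * head_dim
            * seq_len * batch_size * bytes_per_elem)


def attainable_tflops(peak_tflops: float, mem_bandwidth_tb_s: float,
                      flops_per_byte: float) -> float:
    """Roofline bound: limited by compute or by memory traffic, whichever is lower."""
    return min(peak_tflops, mem_bandwidth_tb_s * flops_per_byte)


# Hypothetical model shape (not from the article): 32 layers, 8 KV heads,
# head dimension 128, 32K-token context, batch of 16, 16-bit cache entries.
cache = kv_cache_bytes(32, 8, 128, 32768, 16, bytes_per_elem=2)
print(f"KV cache: {cache / 2**30:.0f} GiB")

PEAK_TFLOPS = 896.0        # from the article
ASSUMED_BW_TB_S = 2.0      # hypothetical memory bandwidth, not published

# Low arithmetic intensity (typical of decode): memory-bound.
print(attainable_tflops(PEAK_TFLOPS, ASSUMED_BW_TB_S, 100.0))
# High arithmetic intensity (large matrix multiplies): compute-bound.
print(attainable_tflops(PEAK_TFLOPS, ASSUMED_BW_TB_S, 1000.0))
```

With these placeholder values the cache alone occupies 64 GiB, and a workload at 100 FLOPs per byte would reach only 200 of the 896 peak TFLOPS. The specific numbers are illustrative; the takeaway is that memory and interconnect can cap real throughput well below the headline figure.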

"Taize 2.0" is designed for deployment in AI computing clusters and is responsible for coordinating software and hardware. The chip handles the underlying computation, while the platform manages model loading, task scheduling, resource management, and operations and maintenance.

Model ecosystem adaptation is another key point of this release. Public information shows that "Taize 2.0" is compatible with tools and distributed training and inference frameworks such as PyTorch, vLLM, SGLang, DeepSpeed, and Megatron-LM, and has been adapted to large language models and multimodal models including Qwen, DeepSeek, GLM, and MiniMax. For AI chip companies, hardware parameters are only the first layer of capability. How quickly a chip enters real projects depends on whether developers can migrate models quickly, whether operators run stably, whether inference frameworks can be invoked efficiently, and whether clusters can keep expanding. Zhonghao Xinying emphasizes that the chip IP cores, proprietary instruction set, underlying operator acceleration libraries, and complete system software are all self-developed, with the core goal of reducing adaptation costs in model migration and computing power deployment.

Industrial AI, scientific computing, government and enterprise intelligent computing centers, and industry-specific large model platforms are shifting their requirements for computing systems from "being able to run models" to "long-term stable operation." Tasks such as equipment status recognition, industrial visual inspection, knowledge base Q&A, process parameter optimization, R&D assistance, and predictive maintenance require not only high-throughput inference but also stable response times, energy consumption control, and maintainable software environments.

With the release of "Xuyu," Zhonghao Xinying's TPU roadmap has entered a higher computing power stage. The subsequent technical value will mainly depend on chip mass production capabilities, cluster interconnect efficiency, software stack maturity, model adaptation scope, and real-world operational performance.

This bulletin is compiled and reposted from information provided by global Internet sources and strategic partners, and is intended for readers' reference. If there is any infringement or other issue, please inform us promptly and we will modify or delete the content accordingly. Unauthorized reproduction of this article is strictly prohibited. Email: news@wedoany.com