en.Wedoany.com Report. On March 27, the Beijing Academy of Artificial Intelligence (BAAI) released validation results for the open-source FlagOS, which completed "full-stack" AI training verification on a unified technology stack. Based on the MTT S5000, its full-function GPU compute card for both AI training and inference, Moore Threads fully adapted the FlagOS full-stack training software suite and successfully completed stable training verification for large language models.
During the validation test, the Moore Threads MTT S5000 completed end-to-end training verification of the Qwen3-0.6B language model on 1T tokens (1 trillion tokens), running uninterrupted for more than 6 consecutive days and over 14,000 steps. The average relative error of the loss curve against the baseline was kept within 0.82%, and on standard downstream task evaluations the model scored 1.65 percentage points higher than the industry baseline (NVIDIA). This result marks a crucial step forward for domestic GPUs in software-stack adaptation and stability in the AI training domain.
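The 0.82% figure is a mean relative error between two loss curves sampled at the same training steps. The report does not specify how FlagOS computes it; a minimal Python sketch of the usual definition (the function name and the loss values below are illustrative, not from the validation data) looks like this:

```python
def mean_relative_error(candidate, baseline):
    """Average of |c - b| / |b| over loss values aligned by training step."""
    assert len(candidate) == len(baseline)
    errors = [abs(c - b) / abs(b) for c, b in zip(candidate, baseline)]
    return sum(errors) / len(errors)

# Made-up loss values for two runs sampled at the same five steps.
baseline_loss = [4.10, 3.52, 3.01, 2.78, 2.60]
candidate_loss = [4.13, 3.50, 3.03, 2.76, 2.62]
print(f"mean relative error: {mean_relative_error(candidate_loss, baseline_loss):.2%}")
```

A value within 0.82% over a 14,000-step run means the two curves track each other very closely at every sampled point, which is why the metric is used as evidence of numerical fidelity rather than a single end-of-training comparison.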
FlagOS is a unified AI training technology stack launched by BAAI, aimed at lowering the adaptation barrier for AI infrastructure and improving the usability of domestic computing platforms. Moore Threads' completion of the FlagOS "full-stack" validation signifies that the MTT S5000 has passed rigorous testing on core dimensions such as operator compatibility, distributed training, and long-term stability, demonstrating its capability to handle large-scale AI training tasks.
The MTT S5000 is a full-function GPU compute card introduced by Moore Threads for AI training and inference scenarios. It is built on the company's self-developed MUSA architecture and supports multiple numeric precisions, including FP32, FP16, and BF16. During this validation, the MTT S5000 ran stably for over 6 days of continuous trillion-token-scale training with no recorded interruptions, and the loss curve converged well, verifying the reliability of domestic GPUs under long-duration, high-load conditions.
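BF16 matters for training because it keeps float32's full 8-bit exponent range while shortening the mantissa to 7 bits, so gradients rarely overflow or underflow even at reduced precision. A pure-Python illustration of the format (this is a generic sketch of BF16 semantics, not Moore Threads or FlagOS code; it uses simple truncation, whereas real hardware typically rounds to nearest even):

```python
import struct

def to_bfloat16(x: float) -> float:
    """Simulate BF16 by zeroing the low 16 bits of the float32 encoding.

    BF16 is the top half of an IEEE 754 binary32 value: 1 sign bit,
    8 exponent bits, 7 mantissa bits. Truncation is used here for
    clarity; hardware usually applies round-to-nearest-even instead.
    """
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    return struct.unpack(">f", struct.pack(">I", bits & 0xFFFF0000))[0]

# The exponent survives intact, so even huge magnitudes stay representable;
# only mantissa precision is lost.
print(to_bfloat16(3.14159))   # coarser than float32: 3.140625
print(to_bfloat16(1.0e30))    # large values do not overflow
```

The design trade-off this shows is why BF16 is preferred over FP16 for training: FP16's 5-bit exponent overflows near 65504, while BF16 covers the same dynamic range as FP32 at half the storage.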
Moore Threads stated that it will continue to deepen adaptation with domestic AI frameworks, model libraries, and training platforms to improve the usability and ease of use of domestic computing power in large-model training scenarios. As domestic GPUs continue to mature at the software-stack level, the "last mile" of AI computing-power autonomy is rapidly being closed.