China's Moore Threads Open-Sources MusaCoder, a Large Code Model for GPU Operator Generation
2026-06-10 16:32

en.Wedoany.com Reported - On June 10, Chinese GPU company Moore Threads announced the release and open-sourcing of MusaCoder, a specialized large code model for generating low-level GPU operators. Given a PyTorch reference implementation, the model generates native GPU kernels for CUDA and MUSA. It targets high-performance computing, AI training and inference optimization, and the build-out of the domestic GPU software ecosystem.

Low-level GPU operators are the critical layer connecting AI frameworks to hardware performance. Large model training, inference, scientific computing, and graphics processing all rely heavily on matrix computation, tensor transformations, reductions, memory-access optimization, and parallel scheduling. If the underlying operators are inefficient, upper-layer models struggle to make full use of GPU computing power, even as their parameter counts grow. Historically, operator development has relied heavily on hand-written code and repeated tuning by engineers, which demands deep expertise in hardware architecture, parallel programming, memory hierarchy, and compiler toolchains. MusaCoder targets this high-barrier work, aiming to make low-level operator generation more efficient with a specialized large code model.

What sets this open-source release apart is that the entire post-training process ran on the Kuae Intelligent Computing Cluster, which is built on the MTT S5000. Moore Threads states that MusaCoder is the industry's first open-source large code model to complete end-to-end training and validation on a domestic GPU computing base.

MusaCoder uses a training framework designed for kernel generation that combines data synthesis, rejection fine-tuning, and reinforcement learning from execution feedback. Generated code is validated on three counts: whether it compiles, whether it is numerically correct, and how much speedup it actually delivers. GPU operator generation differs from ordinary code completion in that running at all is only the first step. What matters more is whether the code compiles reliably on the specified hardware backend, produces correct results without illegal fallbacks, and delivers a performance gain in real execution. By including the MUSA backend in training and validation, Moore Threads intends the model to serve not only the general CUDA ecosystem but also the domestic GPU parallel computing environment directly.
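The three validation stages described above (compilation, numerical correctness, measured speedup) can be pictured as a small scoring loop. The following is a minimal sketch under loud assumptions, not Moore Threads' actual pipeline: a plain Python function stands in for a GPU kernel, `exec` stands in for the CUDA or MUSA compiler, the banned-call check is a crude stand-in for fallback detection, and every name and reward weight (`Feedback`, `evaluate`, `reward`, `rejection_filter`, 0.1, 0.3, 0.7) is hypothetical.

```python
import math
import time
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence

Vector = Sequence[float]
Op = Callable[[Vector], float]


@dataclass
class Feedback:
    compiled: bool   # did the candidate build and expose an entry point?
    correct: bool    # did it match the reference without a banned fallback?
    speedup: float   # reference time / candidate time; 0.0 if not measured


def load_candidate(source: str, entry: str = "kernel") -> Optional[Op]:
    """Stand-in for the compile stage (hypothetical): build the source, fetch `entry`."""
    namespace: dict = {}
    try:
        exec(compile(source, "<candidate>", "exec"), namespace)
    except Exception:
        return None
    fn = namespace.get(entry)
    return fn if callable(fn) else None


def best_time(fn: Op, x: Vector, repeats: int = 5) -> float:
    """Best-of-N wall-clock time for one call, to damp timer noise."""
    best = math.inf
    for _ in range(repeats):
        start = time.perf_counter()
        fn(x)
        best = min(best, time.perf_counter() - start)
    return best


def evaluate(source: str, reference: Op, inputs: List[Vector],
             banned: Sequence[str] = ()) -> Feedback:
    candidate = load_candidate(source)
    if candidate is None:
        return Feedback(False, False, 0.0)
    # Assumed fallback check: reject code that delegates to a banned call
    # instead of implementing the computation itself.
    if any(token in source for token in banned):
        return Feedback(True, False, 0.0)
    try:
        ok = all(math.isclose(candidate(x), reference(x), rel_tol=1e-6, abs_tol=1e-9)
                 for x in inputs)
    except Exception:
        ok = False
    if not ok:
        return Feedback(True, False, 0.0)
    ref_t = sum(best_time(reference, x) for x in inputs)
    cand_t = sum(best_time(candidate, x) for x in inputs)
    return Feedback(True, True, ref_t / max(cand_t, 1e-12))


def reward(fb: Feedback) -> float:
    """Staged reward with invented weights: compile < correct < fast."""
    if not fb.compiled:
        return 0.0
    if not fb.correct:
        return 0.1
    return 0.3 + 0.7 * min(fb.speedup, 2.0) / 2.0


def rejection_filter(candidates: List[str], reference: Op,
                     inputs: List[Vector]) -> List[str]:
    """Keep only verified candidates, as rejection fine-tuning data would be."""
    return [src for src in candidates if evaluate(src, reference, inputs).correct]
```

In this sketch, `reward` would supply the scalar signal for reinforcement learning, while `rejection_filter` would select samples for supervised fine-tuning. A real setup would swap `load_candidate` for an actual compiler invocation and time the kernels on the target GPU.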

For the domestic GPU industry, MusaCoder matters as more than an open-source model. Before domestic AI computing power can reach more developer and enterprise projects, software ecosystem adaptation has to be solved across deep learning frameworks, operator libraries, compilers, communication libraries, inference engines, and the migration of application models. With better low-level operator generation, developers can convert high-level tensor programs into executable, optimizable GPU code more quickly, reducing the cost of manual migration and performance debugging. For model vendors, research institutions, and industry application teams that need to adapt to domestic GPUs, such tools can shorten the cycle from code porting to performance validation.

The impact on the industry chain will center on domestic GPU development tools, AI framework adaptation, model training optimization, and software services for intelligent computing centers. As large models move deeper into the engineering phase, competition in computing power no longer depends solely on peak single-card specifications but also on whether the software stack can support stable model training, inference deployment, and performance tuning. If MusaCoder is iterated on continuously and gains developer adoption, it will help Moore Threads' MUSA ecosystem accumulate more operators, examples, and optimization experience, and will strengthen the case for the Kuae Intelligent Computing Cluster as a validated platform for large model training and code generation tasks. Points to watch next include the download and use of the model weights, developer feedback, the expansion of MUSA backend coverage, and whether the model delivers practical results across more AI frameworks and industry operator scenarios.

This article is compiled by Wedoany. All AI citations must indicate the source as "Wedoany". If there is any infringement or other issues, please notify us promptly, and we will modify or delete it accordingly. Email: news@wedoany.com