Sugon's scaleFabric Officially Launched, "Completing the Final Piece of China's High-End Intelligent Computing Puzzle"
2026-03-13 14:59

Wedoany.com Report: On March 12, Sugon officially launched scaleFabric, its first fully self-developed 400G lossless high-speed network. The launch marks a significant breakthrough for China's domestic high-end native RDMA technology and fills a technical gap in high-speed interconnection for ultra-large-scale intelligent computing.

In his speech at the launch event, Academician Wu Hequan of the Chinese Academy of Engineering said that scaleFabric is China's first self-developed native RDMA high-speed network system. Its performance is on par with mainstream international products, it has undergone large-scale practical validation, and it breaks the foreign technological monopoly while addressing a weak spot in domestic high-speed networks.

Why is high-speed networking so crucial? In Academician Wu Hequan's view, high-speed networks are a core technology of computing power infrastructure, so whether they are autonomous and controllable directly affects the security and development quality of the nation's computing power infrastructure.

"If we compare an intelligent computing center to a giant computing factory, with GPUs as the workers on the production line, then the high-speed network is the conveyor belt connecting them," Li Bin, Senior Vice President of Sugon, told reporters. "If the conveyor belt isn't fast or stable enough, adding more workers only means more workers sitting idle."

For a long time, this "conveyor belt" was precisely the weak link in building China's intelligent computing systems.

The Dilemma of Domestic Intelligent Computing

Currently, AI large-model training has entered the era of ten-thousand-card and even hundred-thousand-card clusters. Training an ultra-large-scale model, for example, requires thousands or even tens of thousands of GPUs to work together for weeks or months. Throughout this process the volume of data exchanged between chips grows sharply, and network performance directly determines how efficiently that computing power is used.
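To see why network speed caps computing efficiency, here is a rough back-of-the-envelope sketch, not anything published by Sugon. It assumes data-parallel training that synchronizes gradients with a ring all-reduce and uses the standard alpha-beta cost model (a bandwidth term plus a fixed latency per step). The gradient size, compute time, GPU count, and per-step latency are hypothetical, the function names are illustrative, and real collective libraries such as NCCL also use tree and hierarchical algorithms.

```python
def ring_allreduce_seconds(grad_bytes: float, n_gpus: int,
                           link_gbps: float, per_step_latency_us: float) -> float:
    """Alpha-beta estimate for one ring all-reduce of grad_bytes across n_gpus.

    A ring all-reduce runs 2 * (n - 1) steps; in total each GPU sends
    2 * (n - 1) / n * grad_bytes, and every step pays a fixed latency.
    """
    if n_gpus < 2:
        return 0.0  # nothing to synchronize
    bytes_sent = 2 * (n_gpus - 1) / n_gpus * grad_bytes
    bandwidth_term = bytes_sent * 8 / (link_gbps * 1e9)  # seconds on the wire
    latency_term = 2 * (n_gpus - 1) * per_step_latency_us * 1e-6
    return bandwidth_term + latency_term


def compute_efficiency(compute_s: float, comm_s: float) -> float:
    """Share of a training step spent computing, assuming no compute/comm overlap."""
    return compute_s / (compute_s + comm_s)


if __name__ == "__main__":
    grad_bytes = 10e9  # hypothetical: 10 GB of gradients per step
    compute_s = 1.0    # hypothetical: 1 s of computation per step
    for gbps in (100, 200, 400):
        comm = ring_allreduce_seconds(grad_bytes, 1024, gbps, 1.0)
        print(f"{gbps} Gbps: comm {comm:.3f} s, "
              f"efficiency {compute_efficiency(compute_s, comm):.0%}")
```

Under these assumptions, quadrupling link bandwidth cuts the bandwidth term by a factor of four, while the latency term, which grows with the number of GPUs in the ring, is unchanged; that hints at why very large clusters care about both bandwidth and microsecond-level latency.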

However, China's high-end high-speed network market has long been dominated by foreign companies. Broadly speaking, there are two main network paradigms. The first is imported InfiniBand (IB), represented by overseas tech companies, which offers leading performance but comes with high costs and uncontrollable supply cycles. The second is the RoCE (RDMA over Converged Ethernet) route, which grafts RDMA technology onto Ethernet.

RoCE is compatible with the IP ecosystem and has played a significant role in early-stage computing power construction and small-to-medium-scale networks, laying a good foundation for the domestic network industry. However, as intelligent computing enters the era of ten-thousand-card clusters, the demands on network scale and performance rise steeply, and RoCE increasingly struggles with bandwidth utilization and latency control in large-scale networks. At the same time, its core NIC chips still rely mainly on foreign vendors, leaving a link in the industrial chain where autonomous development still needs a breakthrough.

Academician Wu Hequan stated, "For a long time, the high-speed, high-end network market has been monopolized by foreign technology, becoming one of the core bottlenecks for the autonomous development of China's computing power industry."

The "Action Plan for Computing Power Interconnection and Interoperability" previously issued by the Ministry of Industry and Information Technology calls for accelerating the construction of a computing power interconnection and interoperability system to improve the utilization of public computing resources. Meanwhile, the "15th Five-Year Plan" lists new infrastructure as a key national priority and explicitly aims to build a nationally integrated computing power network, providing solid support for industrial upgrading and digital-intelligent development during the 15th Five-Year Plan period and beyond.

Against this backdrop, the launch of Sugon's scaleFabric holds special strategic significance.

Full-Stack Self-Development: 100% Autonomous from Underlying Chips to Upper-Level Software

It is understood that after three years of dedicated research and development, Sugon has achieved 100% full-stack self-development of scaleFabric, from the underlying hardware to the upper-level software. The self-developed components include core IP, switching chips, NICs, switches, drivers, and management software.

This means that China now possesses a completely autonomous technological system and intellectual property rights in the high-end networking field, no longer subject to external constraints.

In terms of performance metrics, scaleFabric has reached internationally advanced levels:

The scaleFabric400 NIC is based on the PCIe 5.0 interface

—— Port bandwidth reaches 400Gbps, with end-to-end communication latency as low as 0.9 microseconds.

The scaleFabric400 Switch

—— Single-port bandwidth reaches 800 Gbps, with total switching capacity of up to 64 Tbps (bidirectional) and switching latency of around 260 nanoseconds; ports can be configured as 40 × 800G or 80 × 400G.

—— Easily supports clusters of up to 114,000 cards while reducing total network cost by 30%.
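The headline figures can be cross-checked with simple arithmetic. The sketch below is an illustration under stated assumptions, not Sugon's design: it assumes a non-blocking fat-tree (folded Clos) topology built from identical switches, one 400G NIC port per card, and an x16 PCIe 5.0 link for the NIC. The article states none of these, and the function names are hypothetical.

```python
def switch_capacity_tbps(port_gbps: int, ports: int, bidirectional: bool = True) -> float:
    """Aggregate switching capacity; 'bidirectional' counts send plus receive."""
    one_way_tbps = port_gbps * ports / 1000
    return 2 * one_way_tbps if bidirectional else one_way_tbps


def fat_tree_max_hosts(radix: int, tiers: int) -> int:
    """Host ports in a non-blocking fat tree of identical radix-k switches.

    Each non-top switch uses half its ports down and half up, giving
    2 * (k / 2) ** tiers host ports: k**2 / 2 for two tiers, k**3 / 4 for three.
    """
    if radix % 2 or tiers < 1:
        raise ValueError("radix must be even and tiers >= 1")
    return 2 * (radix // 2) ** tiers


def pcie_gbps_per_direction(gt_per_s: float, lanes: int) -> float:
    """PCIe throughput after 128b/130b line encoding (PCIe 3.0 through 5.0).

    Ignores TLP/DLLP protocol overhead, so usable bandwidth is somewhat lower.
    """
    return gt_per_s * lanes * 128 / 130


if __name__ == "__main__":
    print(switch_capacity_tbps(800, 40), switch_capacity_tbps(400, 80))
    for radix, label in ((40, "40 x 800G"), (80, "80 x 400G")):
        print(label, "three-tier host ports:", fat_tree_max_hosts(radix, 3))
    print("PCIe 5.0 x16:", round(pcie_gbps_per_direction(32, 16)), "Gbps per direction")
```

With these assumptions, both port configurations give 64 Tbps of bidirectional capacity. In the 80-port 400G mode, three switch tiers reach 128,000 host ports, enough for the 114,000-card figure, whereas three tiers of 40-port switches would top out at 16,000. An x16 PCIe 5.0 link carries roughly 504 Gbps per direction before protocol overhead, enough to feed a 400 Gbps port.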

"In test environments, typical AI training tasks on a 30,000-card cluster showed a significant improvement in network efficiency. Domestic network products are no longer merely usable; they are now good and durable," said Li Liu, Vice President of Dawning Information Industry (Beijing) Co., Ltd.

Technology Path: Why Choose Native InfiniBand

In high-speed networking, the choice of technology path is crucial. The industry currently follows two main paths: native InfiniBand (IB), which offers excellent performance but has long been monopolized by foreign companies, and RoCE, which grafts RDMA technology onto Ethernet. After in-depth analysis of the underlying technology, Sugon chose the former.

According to Wan Wei, Chief Engineer of Sugon's High-Speed Network Interconnection Product Department, IB is a dedicated network built for high-performance computing. Its protocol stack is designed specifically for high-speed communication, and its switches use virtual cut-through (VCT) switching, which keeps switching latency within 300 nanoseconds.

In fact, industry practice shows that the different technology paths perform increasingly differently in large-scale intelligent computing. RoCE inherits Ethernet's store-and-forward switching mechanism, in which a packet must be received in full before it is forwarded. This creates an inherent latency gap compared with the natively designed IB architecture: industry test data show that RoCE's processing latency at switching nodes is typically more than double that of IB solutions.
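The latency argument can be expressed as a simple per-hop model. This sketch isolates one effect, namely how long each switch must wait before it can begin forwarding, and assumes no contention. The 4,096-byte packet, 64-byte header, and 200 ns pipeline delay are hypothetical values, not measured figures for RoCE or scaleFabric, and the function names are illustrative.

```python
def serialization_ns(num_bytes: int, link_gbps: float) -> float:
    """Time to clock num_bytes on or off a link; 1 Gbps equals 1 bit per ns."""
    return num_bytes * 8 / link_gbps


def switch_delay_ns(packet_bytes: int, header_bytes: int, link_gbps: float,
                    hops: int, pipeline_ns: float, cut_through: bool) -> float:
    """Delay added by `hops` switches before each one can start forwarding.

    Store-and-forward waits for the whole packet at every hop; cut-through
    (as in VCT, when the output port is free) waits only for the header
    needed to make the forwarding decision.
    """
    wait_bytes = header_bytes if cut_through else packet_bytes
    return hops * (serialization_ns(wait_bytes, link_gbps) + pipeline_ns)


if __name__ == "__main__":
    args = dict(packet_bytes=4096, header_bytes=64, link_gbps=400,
                hops=3, pipeline_ns=200)  # hypothetical values
    sf = switch_delay_ns(cut_through=False, **args)
    ct = switch_delay_ns(cut_through=True, **args)
    print(f"store-and-forward {sf:.1f} ns, cut-through {ct:.1f} ns")
```

The gap grows with packet size and hop count, because store-and-forward pays the full serialization delay at every switch. Under congestion, virtual cut-through also buffers whole packets, so the model describes the uncongested case only.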

Even more significant are the differences in flow control design. IB uses credit-based flow control: the sender transmits only after confirming that the receiver has enough buffer space, which avoids packet loss by design. RoCE relies on PFC (Priority Flow Control) for congestion management, a reactive "detect-then-resolve" approach. Industry insiders note that as clusters grow, PFC can easily trigger chain reactions, leading to so-called "PFC storms" or deadlocks. Operations teams must therefore invest significant effort in tuning congestion control algorithms and configuring buffer watermarks.
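Credit-based flow control can be illustrated with a toy simulation. It is a simplified sketch, not IB's actual link protocol: credits are counted in whole packets, credit returns are instantaneous, and virtual lanes are ignored. It contrasts a sender that spends credits with one that transmits blindly into a fixed receive buffer, i.e. a lossy link with no flow control at all (PFC itself is not modeled). All names and parameters are hypothetical.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Receiver:
    buffer_slots: int    # receive buffer size, in packets
    drain_per_tick: int  # packets consumed per time step
    queue: deque = field(default_factory=deque)
    dropped: int = 0

    def accept(self, pkt: int) -> None:
        if len(self.queue) >= self.buffer_slots:
            self.dropped += 1  # buffer full: the packet is lost
        else:
            self.queue.append(pkt)

    def drain(self) -> int:
        n = min(self.drain_per_tick, len(self.queue))
        for _ in range(n):
            self.queue.popleft()
        return n  # buffer slots freed this tick


def simulate(total_pkts: int, send_per_tick: int, buffer_slots: int,
             drain_per_tick: int, use_credits: bool) -> tuple[int, int]:
    """Return (packets dropped, ticks taken) for one sender and one receiver."""
    if min(send_per_tick, buffer_slots, drain_per_tick) < 1:
        raise ValueError("rates and buffer size must be positive")
    rx = Receiver(buffer_slots, drain_per_tick)
    credits = buffer_slots  # one credit per free receive-buffer slot
    sent = ticks = 0
    while sent < total_pkts or rx.queue:
        burst = min(send_per_tick, total_pkts - sent)
        if use_credits:
            burst = min(burst, credits)  # never send without a credit
            credits -= burst
        for _ in range(burst):
            rx.accept(sent)
            sent += 1
        freed = rx.drain()
        if use_credits:
            credits += freed  # receiver returns credits as it drains
        ticks += 1
    return rx.dropped, ticks


if __name__ == "__main__":
    for mode in (True, False):
        dropped, ticks = simulate(100, 4, 8, 2, use_credits=mode)
        print(f"credits={mode}: dropped {dropped} packets in {ticks} ticks")
```

Because each packet sent consumes a credit and each drained packet returns one, queued packets plus available credits always equal the buffer size, so the credit-based receiver can never overflow. The blind sender, pushing 4 packets per tick into an 8-slot buffer that drains 2 per tick, starts dropping once the buffer fills.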

"For ten-thousand-card large-scale clusters, these differences directly determine whether the system can operate stably," said Li Bin, Senior Vice President of Sugon. "Therefore, in terms of technology path, we chose to take the most difficult but correct route."

Practical Validation: 30,000-Card Clusters at Core Nodes Running Stably

Technological innovation must ultimately withstand the test of practice.

Reportedly, scaleFabric has already been deployed and brought online simultaneously in three ten-thousand-card clusters at core nodes of the national supercomputing internet, at a scale of nearly ten thousand cards, running continuously and stably for over six months. This is the first time a domestic high-speed network has undergone real-load validation at this scale.

"In practice, network fault recovery takes less than 1 millisecond, and training tasks do not perceive any network fluctuation," said Li Liu, Vice President of Dawning Information Industry (Beijing) Co., Ltd. "This provides a reliable guarantee for large-scale AI model training."

In terms of cost, scaleFabric matches top international IB products in performance while costing about 30% less than IB solutions currently on the market. It thus addresses both the high cost of imported IB and the performance shortfalls and operational costs of RoCE networks in large clusters.

Building an Ecosystem: From Single-Point Breakthrough to Industrial Collaboration

A product launch is only the starting point; building a complete industrial ecosystem is the long-term goal.

It is understood that Sugon is working with upstream and downstream partners in the "Data Center Network Optimization Project Group" under the "AI Computing Open Architecture Joint Laboratory" of the "Photosynthesis Organization" to promote autonomous network standards and develop scenario-specific solutions.

This means that the birth of scaleFabric is not only a breakthrough for a single product but also the starting point for a domestic high-performance network ecosystem. From chip design to equipment manufacturing, from software development to system integration, a complete industrial chain is taking shape.

"In the future, as more and more enterprises adopt domestic networks, the entire industry will form a virtuous cycle," said Li Bin, Senior Vice President of Sugon. "The more application scenarios, the faster the product iteration; the better the product performance, the larger the application scale."

Completing the Final Piece of China's High-End Intelligent Computing Puzzle

"RDMA high-speed networks are the 'major arteries of computing power' for intelligent computing clusters," said Academician Wu Hequan, summing up their importance. The launch of scaleFabric also offers new solutions, across multiple dimensions, for building China's domestic intelligent computing system.

scaleFabric has achieved 100% full-stack self-development and reaches internationally advanced levels in key metrics such as latency, bandwidth, and network scale. It gives domestic users a new technology option, provides domestic network support for high-end computing scenarios such as intelligent computing centers and supercomputing centers, and helps build a more complete and autonomous computing power industrial chain.

Notably, the simultaneous deployment of three ten-thousand-card clusters based on scaleFabric at core nodes of the national supercomputing internet shows that domestic high-speed network products are already capable of large-scale commercial deployment and can meet the practical needs of scenarios such as AI large-model training and supercomputing tasks. Meanwhile, scaleFabric's native InfiniBand path complements the existing RoCE route, giving users more choices and helping to form a more diverse and healthy industrial ecosystem.

Most importantly, at the national strategic level, it responds to the 15th Five-Year Plan's focus on new infrastructure, implements the relevant "Artificial Intelligence+" deployments, and strengthens the foundation for digital economic development.

From chips to networks, from hardware to software, China's intelligent computing is building a complete, autonomous, and sustainable industrial system. "Currently, artificial intelligence is becoming ubiquitous across all domains, computing power has become the core productive force, and the competition in computing power has also escalated into a full-ecosystem game involving the synergy of computing, networking, and storage," said Academician Wu Hequan. "We hope that Sugon will take this as a new starting point, continue technological innovation, deepen industrial collaboration, and build an autonomous, advanced, and secure high-speed network technology and product system."


This bulletin is compiled and reposted from information on the global Internet and from strategic partners, to share information with readers. If there is any infringement or other issue, please let us know promptly and we will revise or delete the content accordingly. Unauthorized reproduction of this article is strictly prohibited. Email: news@wedoany.com