China's Suanmiao Technology A4E Tape-Out, 3D Stacking Bandwidth Reaches 16TB/s
2026-07-01 16:33

en.Wedoany.com Reported - AI large models are iterating at an astonishing pace, and the growth of memory capacity and bandwidth lags far behind their expansion. This is the long-standing "memory wall" problem plaguing the industry. More critically, current mainstream 2.5D packaging technologies (such as TSMC's CoWoS) expand only within a single plane: routing resources are limited, integration density is low, and under the high computing power demands of AI, chip area cannot be reduced any further.

When lateral expansion becomes unsustainable, "vertical growth" through 3D stacking technology becomes an inevitable choice. For China's domestic AI chips, given the industrial reality of limited advanced process capacity and restricted supply of high-end HBM, 3D stacking offers a viable path to "trade space for performance" and bypass some process blockades.

01 Packaging Technology: From "Flat Tiling" to "3D Building"

In the field of advanced packaging, 2.5D packaging achieves high-speed interconnection and short-distance communication between chips by integrating multiple bare dies on a silicon interposer. The silicon interposer typically uses Through-Silicon Via (TSV) technology for vertical interconnection, offering high-density, high-performance interconnect characteristics that can significantly improve overall system performance.

3D stacking technology stacks chips or packages vertically, using TSV or hybrid bonding for the connections. It adds functionality, improves integration density, reduces packaging costs, and helps raise operating speed by shortening interconnect lengths. Through 3D stacking, functional units that 2.5D packaging tiles across different chips, such as compute logic, memory arrays, and I/O interfaces, can be physically stacked and electrically interconnected in the vertical dimension, thereby breaking through the physical limits of planar integration.

3D packaging and 3.5D packaging utilize 3D stacking technology. 3D packaging technology vertically stacks multiple bare dies and uses advanced interconnect technologies like TSV and micro-bumps for inter-layer communication, breaking through the physical limitations of traditional planar integration. This architecture significantly shortens electronic transmission paths, drastically reducing transmission latency and power consumption while achieving extremely high interconnect bandwidth and packaging density. 3.5D packaging builds upon 3D vertical stacking by introducing a 2.5D silicon interposer for lateral expansion, forming a "3D + planar" composite architecture.

Currently, mainstream domestic AI chips in China, such as those from Cambricon, Kunlunxin, Biren Technology, and Enflame Technology, primarily use 2.5D packaging technology to interconnect GPU/AI compute chiplets with HBM memory side-by-side, utilizing silicon interposers and RDL (Redistribution Layer) to build high-density interconnect networks. However, the bandwidth of this external memory solution is generally only 1–4 TB/s, and because planar area is limited, integration density and interconnect bandwidth are approaching physical limits.

02 International Giants: 3D Stacking and 3.5D Entering Mass Production

International semiconductor giants have long been deploying 3D/3.5D technologies, with some products already in mass production and delivery.

In 2023, AMD released the Instinct MI300 series AI accelerators, a chip product utilizing 3.5D packaging technology that has entered mass production. AMD describes its technology as 3D stacking of GPU and I/O chips fused via hybrid bonding, combined with standard 2.5D packaging. AMD's 3.5D packaging solution integrates TSMC's CoWoS (2.5D silicon interposer) and SoIC (3D hybrid bonding) technologies, vertically stacking GPU/CPU chips on top of I/O chips via Cu-Cu hybrid bonding, and then interconnecting them side-by-side with HBM3 memory on the CoWoS silicon interposer.

In December 2024, Broadcom publicly announced the industry's first 3.5D XDSiP (eXtreme Dimension System in Package) packaging platform, which combines 2.5D technology with 3D-IC integration. The core of the platform is Face-to-Face (F2F) stacking, which uses bumpless hybrid copper bonding (HCB) to directly connect the top metal layers of the upper and lower chips. Compared to traditional Face-to-Back (F2B) technology, F2F eliminates the need for TSVs, can increase the number of signal connections by 7 times, reduce power consumption at the chip-to-chip interface by 90%, and decrease latency between compute, memory, and I/O components within the 3D stack. In 2026, the industry's first 2nm custom compute SoC based on XDSiP was delivered to Fujitsu for AI supercomputing clusters.

Intel's EMIB 3.5D packaging technology combines EMIB 2.5D (embedded silicon bridge lateral interconnect) with Foveros Direct 3D (hybrid bonding vertical stacking), supporting flexible heterogeneous integration of multiple chips and compatibility with the UCIe industry standard. Intel's Data Center GPU Max Series SoC is the most complex mass-produced heterogeneous chip ever built using EMIB 3.5D, containing over 100 billion transistors, 47 active tiles, and 5 process nodes.

Qualcomm's recent HBC technology adopts an innovative dedicated near-memory computing architecture, integrating compute with ultra-high-speed bandwidth memory through a 3D stacked silicon solution to address the data movement bottleneck in AI computing. The AI250, equipped with the first-generation HBC technology, achieves an industry-leading bandwidth rate of 133 TB/s per card, providing an 18x improvement in effective memory bandwidth compared to the AI200 using LPDDR5X. The AI300, equipped with the second-generation HBC technology, achieves a further step-function performance leap, with effective memory bandwidth 54 times higher than the AI200.
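The two multipliers quoted above share the AI200 as their baseline, so the step from the first to the second HBC generation can be read off by division. The short sketch below does only that arithmetic; the function name is illustrative, and it assumes both multipliers measure the same "effective memory bandwidth" metric.

```python
# Multipliers quoted in the text, both relative to the AI200.
AI250_VS_AI200 = 18.0   # first-generation HBC
AI300_VS_AI200 = 54.0   # second-generation HBC


def generation_gain(newer_vs_base: float, older_vs_base: float) -> float:
    """Ratio between two products whose gains are quoted against one baseline."""
    return newer_vs_base / older_vs_base


gen2_over_gen1 = generation_gain(AI300_VS_AI200, AI250_VS_AI200)
print(f"AI300 over AI250 (effective memory bandwidth): {gen2_over_gen1:.1f}x")
```

On those figures, the second generation would offer three times the effective memory bandwidth of the first.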

03 Chinese AI Chip Manufacturers Collectively Choose 3D Stacking

Facing the leading deployment of 3D stacking and 3.5D packaging by international giants, as well as constraints on domestic advanced process capacity and high-end HBM supply, Chinese AI chip manufacturers are actively exploring the vertical integration of memory and compute units through 3D stacking technology.

Unisplendour Group's Zixuan architecture, centered around 3D DRAM, pioneers a 3.5D heterogeneous integration solution, achieving a memory bandwidth of up to 30 TB/s. In its PNM near-memory computing mode, memory access latency is reduced by up to 1/18. Simulations show that under equivalent computing power, its Token throughput rate is 1.5-2 times higher than NVIDIA's B200 series, and it can be mass-produced at scale based on China's domestic supply chain.

Tsing Micro's next-generation AI chip adopts 3.5D heterogeneous stacking, achieving three-dimensional vertical stacking of reconfigurable compute chiplets and DRAM memory chiplets. Through the vertical integration of "compute chiplets + memory chiplets," it trades architectural innovation for performance leaps under the constraints of advanced process technology. Its second-generation 3D reconfigurable chip adopts 3D compute-in-memory + quad-chiplet integration technology, upgrading the traditional 2D planar single-lane transmission mode into a 3D architecture of "4 compute lanes + 4-layer memory overpass." This significantly improves data throughput efficiency and compute density, yielding advantages in performance, energy efficiency, and flexibility.

Suanmiao Technology's 3D TokenPU chip A4E, designed for large model inference, officially taped out on June 15, realizing a large model-specific processor based on China's domestic supply chain using a 3D hybrid stacking architecture. The first-generation product, A4E, vertically stacks 8 layers of memory wafers on top of a compute logic wafer, achieving micron-level interconnection through Through-Silicon Via (TSV) and bump technology. This compresses the traditional "millimeter-level" transmission distance between chips by two orders of magnitude, providing an ultra-large memory access bandwidth of 16 TB/s, effectively alleviating the data starvation problem.
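To see why memory bandwidth matters so much for inference, consider a common rule of thumb: when generating one token at a time, the chip has to stream the model weights from memory for every token, so bandwidth divided by weight size gives a ceiling on tokens per second. The sketch below applies that rule to the bandwidth figures quoted in this article. The 70-billion-parameter, 8-bit model is a hypothetical workload chosen for illustration, and the result is a theoretical ceiling, not a measured or claimed figure for the A4E or any other chip.

```python
def decode_tokens_per_second(bandwidth_tbps: float,
                             params_billion: float,
                             bytes_per_param: float) -> float:
    """Upper bound on batch-1 decode rate when weight streaming dominates."""
    bytes_per_second = bandwidth_tbps * 1e12
    bytes_per_token = params_billion * 1e9 * bytes_per_param
    return bytes_per_second / bytes_per_token


# Hypothetical workload, not from the article: 70B parameters stored at 8 bits.
PARAMS_BILLION = 70.0
BYTES_PER_PARAM = 1.0

# Bandwidth figures as quoted in the article.
for label, tbps in [("2.5D + HBM, low end", 1.0),
                    ("2.5D + HBM, high end", 4.0),
                    ("A4E, as reported", 16.0)]:
    rate = decode_tokens_per_second(tbps, PARAMS_BILLION, BYTES_PER_PARAM)
    print(f"{label:22s} {tbps:5.1f} TB/s -> {rate:7.1f} tokens/s ceiling")
```

The ceiling scales linearly with bandwidth, so under this simplified model a fourfold bandwidth increase lifts the bound fourfold. Real throughput also depends on batching, KV-cache traffic, and compute limits, none of which are modeled here.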

Intellifusion announced that its inference chip under development adopts a 3D stacked memory architecture to achieve higher bandwidth and lower access latency, breaking through the "memory wall" and improving inference efficiency.

Lingchuan Technology, formerly the Heterogeneous Computing and Chip Division of Kuaishou Technology, completed the tape-out of its next-generation chip in April this year. It utilizes China's domestic 3D stacking technology, pioneering a 3D near-memory architecture with specialized optimization designs for key industry pain points such as heat dissipation, consistency, and reliability. Its first chip, the SL200, has sold nearly 100,000 units cumulatively, deployed in internet companies like Kuaishou, Alibaba Cloud, Baidu Cloud, and Bilibili, covering 99.7% of Kuaishou's live streaming transcoding business and stably serving 700 million users.

04 3D Stacking Needs to Bridge the Gap from Lab to Mass Production

Despite the promising prospects of 3D stacking, its engineering difficulty far exceeds that of traditional packaging.

First is thermal management and heat dissipation. In traditional 2D planar architectures, heat generated by the die can be directly conducted to the top vapor chamber and heatsink. In a 3D architecture, however, heat must pass vertically through multiple layers of silicon, TSV arrays, polymer underfill, and micro-bump interfaces. For 2.5D integrated structures, traditional air cooling systems can still operate at total power levels around 300 watts. But once a true 3D vertical stack exceeds 350 watts of total package power, air cooling becomes completely ineffective, and liquid cooling systems and high-performance thermal interface materials become mandatory.
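The effect of those extra layers can be approximated with a one-dimensional series model, where each layer adds a thermal resistance R = t / (k * A). The sketch below is a rough illustration only. Every number in it is an assumption rather than a figure from this article: an 800 mm² footprint, 50 µm thinned silicon at roughly 150 W/(m·K), and a 20 µm bonding layer with an effective conductivity of 2 W/(m·K). It also assumes all heat originates in a bottom compute die and exits through the top, and it ignores lateral spreading and hotspots.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Layer:
    name: str
    thickness_m: float          # layer thickness in metres
    conductivity_w_mk: float    # thermal conductivity in W/(m*K)


def layer_resistance(layer: Layer, area_m2: float) -> float:
    """1-D conduction resistance R = t / (k * A), in K/W."""
    return layer.thickness_m / (layer.conductivity_w_mk * area_m2)


def stack_resistance(layers: list[Layer], area_m2: float) -> float:
    """Series resistance of the layers heat crosses on its way to the heatsink."""
    return sum(layer_resistance(layer, area_m2) for layer in layers)


# All values below are illustrative assumptions, not data from the article.
DIE_AREA_M2 = 8e-4                                         # 800 mm^2 footprint
THINNED_SILICON = Layer("thinned silicon", 50e-6, 150.0)
BOND_LAYER = Layer("micro-bump + underfill", 20e-6, 2.0)   # effective value

MEMORY_TIERS = 8
path = [BOND_LAYER, THINNED_SILICON] * MEMORY_TIERS

r_stack = stack_resistance(path, DIE_AREA_M2)
for power_w in (300.0, 350.0):
    print(f"{power_w:.0f} W through {MEMORY_TIERS} tiers: "
          f"+{power_w * r_stack:.1f} K across the stack")

bond_share = MEMORY_TIERS * layer_resistance(BOND_LAYER, DIE_AREA_M2) / r_stack
print(f"Bond layers account for {bond_share:.0%} of the added resistance")
```

Under these assumptions the thin bonding layers, not the silicon itself, contribute almost all of the added resistance, which is one reason interface materials receive so much attention in 3D designs.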

Second is the hybrid bonding process and yield. Bumpless hybrid copper bonding (HCB) requires interconnect pitches of <10μm or even 1μm, imposing extremely high requirements on surface planarity (CMP), bonding accuracy, and thermal expansion matching. Material differences between the silicon bridge and substrate can lead to thermal expansion mismatch, causing mechanical stress and cracks. The 3D stacking process is complex, and yield improvement relies on continuous refinement of bonding accuracy.
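Two pieces of arithmetic help put this paragraph in perspective: how quickly connection density grows as pitch shrinks, and how quickly yield falls as more dies are stacked. The sketch below assumes a square pad grid and a simple independence model in which every die and every bonding step must succeed, with no known-good-die screening or repair. The 40 µm pitch is included as a stand-in for micro-bumps, and all yield values are hypothetical.

```python
def connections_per_mm2(pitch_um: float) -> float:
    """Pads per square millimetre on a square grid with the given pitch."""
    pads_per_mm = 1000.0 / pitch_um
    return pads_per_mm ** 2


def stack_yield(die_yield: float, bond_yield: float, dies: int) -> float:
    """Yield of a stack if every die and every bonding step must succeed.

    Assumes independent failures and no screening or repair.
    """
    return (die_yield ** dies) * (bond_yield ** (dies - 1))


# 40 um is an assumed micro-bump pitch; 10 um and 1 um come from the text.
for pitch in (40.0, 10.0, 1.0):
    print(f"{pitch:5.1f} um pitch -> {connections_per_mm2(pitch):>12,.0f} pads/mm^2")

# Hypothetical yields for a stack of 1 logic die plus 8 memory dies.
for bond in (0.99, 0.95):
    print(f"bond yield {bond:.2f} -> stack yield {stack_yield(0.98, bond, 9):.1%}")
```

Density rises with the inverse square of pitch, so moving from 10 µm to 1 µm multiplies the number of connections per unit area by 100. The yield side compounds in the opposite direction, which is why small gains in bonding accuracy matter so much for tall stacks.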

Third is EDA tools and design collaboration. The data volume for 3D design is exploding, requiring deep collaboration between IC designers and packaging engineers. Existing EDA tools struggle to simultaneously handle multi-dimensional optimization of thermal, signal, and power integrity, creating an urgent need for thermal-electrical-mechanical co-design platforms. Currently, the top three international EDA companies offer some tool support for 3D stacked chip design. In contrast, China's domestic EDA companies have relatively few full-flow design tools specifically for 3D stacked chip design. Some companies can provide partial point tools for the simulation phase of 3D stacked chips, but there are still significant gaps in China for tools related to placement and routing, multi-die verification, and Multi-Die DFT testing.

Fourth is testing and reliability. The complexity and high density of 3D stacked chip packaging technology make testing and reliability a significant challenge. New testing methods and equipment need to be developed to ensure the quality and reliability of the package. Additionally, long-term reliability assessments of the package are required to ensure its stable operation under various environmental conditions.

Last is assembly complexity and the supply chain. Physical assembly involves precise alignment of dies with different thicknesses and coefficients of thermal expansion, requiring intensive thermo-mechanical certification work. The amount of design analysis data far exceeds that of standard packaging. All of this makes 3D stacked packaging relatively expensive to manufacture, so manufacturing processes must be continuously optimized and costs reduced before the technology can be applied more widely in practical products.

In the post-Moore era, the marginal benefits of transistor miniaturization are diminishing, making advanced packaging key to "More than Moore." For Chinese AI chips, given the industrial reality of restricted imports of advanced process nodes and high-end HBM, simply chasing the international giants' 2.5D+HBM route is no longer sufficient to create differentiated competitiveness. From Unisplendour's Zixuan architecture to Tsing Micro's 3.5D heterogeneous stacking, Chinese manufacturers are proving: when planar expansion hits physical limits, growing upwards and redefining chip integration methods in three dimensions may be the key to breaking the "memory wall" and "area wall," achieving a leapfrog in the global AI computing power race.
