US Tech Giants Accelerate Self-Developed AI Chips, AWS Achieves 4x Performance per Watt

2026-06-30 09:47

en.Wedoany.com Reported - With the arrival of "agentic AI" capable of autonomous judgment and action, the focus of the global AI infrastructure market is rapidly shifting from large-scale training to the inference needed to run actual services. In this shift, data center power efficiency and total cost of ownership (TCO) have come to matter more than the absolute performance of individual chips. Global hardware manufacturers, tech giants, and South Korea's domestic "K-AI" semiconductor companies are all accelerating efforts to break Nvidia's market dominance.

Surging computing demand and Nvidia's GPU monopoly have put cost pressure on enterprises, prompting global tech giants to develop their own AI semiconductors optimized for their data centers and services. These companies aim to build a full-stack infrastructure spanning chips, server architecture, networking, and software in order to maximize "token economics" and "power efficiency" in real-world operating environments.

Google Cloud has upgraded its self-developed AI semiconductor, the TPU, to the sixth-generation "Trillium," significantly improving computing performance and high-bandwidth memory (HBM) capacity over the previous generation. Trillium supports training and inference for the large model "Gemini" and is offered to external customers through the Google Cloud Platform (GCP).

Microsoft (MS) has launched the custom AI accelerator series "Maia" to improve the cost-performance ratio of its Azure cloud infrastructure. Developed through a chip design partnership with OpenAI, the chip aims to reduce the operating costs of Azure OpenAI services, including the models behind ChatGPT.

Meta is deploying its self-developed "MTIA" (Meta Training and Inference Accelerator). Optimized for ad recommendation algorithms and Feed ranking engines, the chip handles large-scale computing at low power and has been extended to serve inference for Meta's open-source "Llama" series of large language models.

Among the tech giants, AWS has adopted a dual-track strategy, expanding its self-developed chip ecosystem while maintaining cooperation with Nvidia. AWS's accelerator business has reached a multi-billion-dollar scale and become a core layer of its infrastructure. Over 50% of the tokens in its fully managed generative AI service "Amazon Bedrock" are processed on infrastructure built on its self-developed accelerator chips, "Trainium" and "Inferentia."

Trn2 instances, each equipped with 16 "Trainium2" chips and capable of handling models with up to 1 trillion parameters, offer 30-40% better cost-performance than comparable general-purpose GPU instances. Trainium2-related revenue grew 150% quarter-over-quarter, and the chip has secured production customers including Anthropic, with which AWS built the training cluster "Project Rainier," as well as Apple, Uber, and Databricks. The dedicated inference chip "Inferentia" provides up to 2.3x higher throughput and up to 70% lower inference costs compared to existing instances.

AWS has also launched "Trainium3," optimized for agentic AI and video generation workloads, which offers up to 4x better performance per watt than the previous generation. Preliminary benchmarks show it can cut costs by up to 50% compared to training on general-purpose GPUs. The "EC2 Trn3 UltraServer," which combines up to 144 Trainium3 chips, delivers 362 FP8 PFLOPs of computing performance and 20.7 TB of HBM3e memory. Paired with "EC2 UltraCluster 3.0," which uses a non-blocking, petabit-scale network based on Elastic Fabric Adapter (EFA), hundreds of thousands of chips can work together like a single accelerator. The "Neuron Agentic Development" feature, newly introduced in 2026, lets AI coding agents automatically port existing models to Trainium and validate numerical consistency, lowering the barrier to hardware migration.
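The aggregate UltraServer figures above can be broken down per chip with simple division. The Python sketch below does only that arithmetic; it assumes the 362 FP8 PFLOPs and 20.7 TB totals refer to a full 144-chip configuration, that resources are split evenly across chips, and that 1 TB = 1000 GB. The function and constant names are illustrative and not part of any AWS tool.

```python
# Aggregate figures quoted in the article for one EC2 Trn3 UltraServer.
CHIPS_PER_ULTRASERVER = 144
TOTAL_FP8_PFLOPS = 362.0
TOTAL_HBM_TB = 20.7


def per_chip(total: float, chips: int) -> float:
    """Divide an aggregate figure evenly across chips (an assumption)."""
    if chips <= 0:
        raise ValueError("chips must be positive")
    return total / chips


pflops_per_chip = per_chip(TOTAL_FP8_PFLOPS, CHIPS_PER_ULTRASERVER)
# Decimal units assumed: 1 TB = 1000 GB.
hbm_gb_per_chip = per_chip(TOTAL_HBM_TB, CHIPS_PER_ULTRASERVER) * 1000

print(f"FP8 compute per chip: {pflops_per_chip:.2f} PFLOPs")
print(f"HBM3e per chip: {hbm_gb_per_chip:.0f} GB")
```

Under these assumptions the totals work out to roughly 2.5 FP8 PFLOPs and about 144 GB of HBM3e per Trainium3 chip.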

Furthermore, the tech giants are reducing their reliance on Nvidia's "CUDA" through open-source software alliances. AWS is promoting the open-source "Neuron SDK," which is designed around the open XLA compiler framework and integrated with industry-standard frameworks and libraries such as PyTorch, JAX, vLLM, and Hugging Face, so developers can keep using them with minimal code changes. The global accelerator market is moving from a monopoly of general-purpose hardware into an era of architectural diversity. Competition among tech giants over self-developed silicon and full-stack infrastructure efficiency is expected to intensify as agentic AI and high-capacity media generation workloads surge.

Commenting on AI infrastructure strategy, AWS Solutions Architect Lee Soo-ji said that AWS's investment in self-developed AI silicon is not merely about replacing specific hardware, but about giving customers better cost-performance and broader choice, creating a virtuous cycle for accelerated computing. Only when multiple architectures coexist in the market, Lee noted, can competition drive down prices and improve performance. When evaluating AI infrastructure, the key to reducing TCO is a fully integrated full-stack system: the accelerator chip, the server architecture supporting it, the network connecting large-scale clusters, and the software and managed services that maximize the hardware's potential. In the next-generation AI environment, managing "token economics" and "power efficiency" will determine whether enterprise businesses survive. Agentic AI requires task planning, orchestration, and real-time response, so its computing characteristics change constantly. Because data center power is a finite resource, performance per watt, that is, energy efficiency, will become a core competitive advantage for enterprises.
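The link between performance per watt and token economics can be made concrete with a small calculation. The Python sketch below is illustrative only: the power budget, baseline efficiency, and electricity price are hypothetical placeholders, not figures from AWS or this article. Only the 4x multiplier echoes the "up to 4x" generational claim quoted above, and it is treated here as a best case.

```python
def tokens_per_second(power_budget_watts: float, tokens_per_joule: float) -> float:
    """Throughput under a fixed power budget.

    Watts are joules per second, so watts * tokens/joule = tokens/second.
    """
    return power_budget_watts * tokens_per_joule


def energy_cost_per_million_tokens(tokens_per_joule: float, usd_per_kwh: float) -> float:
    """Electricity cost in USD of generating one million tokens."""
    joules = 1_000_000 / tokens_per_joule
    kwh = joules / 3_600_000  # 1 kWh = 3.6e6 J
    return kwh * usd_per_kwh


# Hypothetical inputs, chosen only to make the arithmetic visible.
POWER_BUDGET_WATTS = 1_000_000.0   # a 1 MW slice of a data center
BASELINE_TOKENS_PER_JOULE = 2.0    # assumed efficiency of the older chip
EFFICIENCY_GAIN = 4.0              # best-case "up to 4x" performance per watt
USD_PER_KWH = 0.10                 # assumed electricity price

improved_tokens_per_joule = BASELINE_TOKENS_PER_JOULE * EFFICIENCY_GAIN

baseline_tps = tokens_per_second(POWER_BUDGET_WATTS, BASELINE_TOKENS_PER_JOULE)
improved_tps = tokens_per_second(POWER_BUDGET_WATTS, improved_tokens_per_joule)

baseline_cost = energy_cost_per_million_tokens(BASELINE_TOKENS_PER_JOULE, USD_PER_KWH)
improved_cost = energy_cost_per_million_tokens(improved_tokens_per_joule, USD_PER_KWH)

print(f"Throughput at fixed power: {baseline_tps:,.0f} -> {improved_tps:,.0f} tokens/s")
print(f"Energy cost per 1M tokens: ${baseline_cost:.4f} -> ${improved_cost:.4f}")
```

With power held fixed, throughput scales directly with performance per watt, and the energy cost per token falls by the same factor. This sketch covers electricity only; a full TCO estimate would also include hardware, networking, cooling, and operations.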

This bulletin is compiled and reposted from information of global Internet and strategic partners, aiming to provide communication for readers. If there is any infringement or other issues, please inform us in time. We will make modifications or deletions accordingly. Unauthorized reproduction of this article is strictly prohibited. Email: news@wedoany.com
