OpenAI and Broadcom Release LLM Inference Chip Jalapeño

2026-06-25 08:55

en.Wedoany.com Reported - On June 24, OpenAI and Broadcom jointly announced their first intelligent processor, Jalapeño. The chip is designed specifically for large language model (LLM) inference and is the first AI accelerator in a multi-generational computing platform co-developed by the two companies. The goal is to improve the speed, reliability, and accessibility of AI services while delivering advanced AI capabilities more efficiently at larger deployment scales. OpenAI described Jalapeño as its first "Intelligence Processor", with an architecture designed for future LLM inference needs.

Jalapeño is not a general-purpose AI accelerator repurposed for inference; instead, it is designed around the models, kernels, serving systems, and product requirements that OpenAI runs daily. In its announcement, OpenAI noted that the chip is tailored to the operational characteristics of ChatGPT, Codex, the API, and future agent products, with a focus on optimizing computation, memory access, network connectivity, and scheduling efficiency in large model inference. For LLM services, inference performance directly affects user wait times, response stability, and the cost per unit of compute. A chip architecture that reduces data movement and improves hardware utilization can therefore yield efficiency gains when services run at large scale.

OpenAI stated that engineering samples of Jalapeño have been running machine learning workloads, including GPT-5.3-Codex-Spark, at target frequencies and power levels in the lab. The company has not yet released final performance test results, but early tests indicate that Jalapeño's performance per watt will significantly surpass current state-of-the-art levels. OpenAI also said it will release a more detailed technical report later, further explaining the chip's performance in inference, energy efficiency, and system deployment.

The development cycle for this chip was compressed to nine months. OpenAI stated that the two companies jointly took Jalapeño from initial design to manufacturing tape-out, with OpenAI models used to accelerate parts of the design and optimization process. Chip design typically involves multiple stages, including architecture definition, verification, physical implementation, software adaptation, and manufacturing preparation, and it usually carries long cycles and high risk. This project introduced model capabilities into the chip design process, reflecting how AI tools are entering semiconductor R&D itself. Broadcom is responsible for silicon implementation and networking technology support, with its Tomahawk networking chips and other technologies helping bring the platform to large-scale production.

OpenAI is also positioning Jalapeño within its longer-term, full-stack infrastructure strategy. The company was previously seen primarily as a developer of models and AI products; the launch of its own intelligent processor signals that its capabilities are extending into chip architecture, memory systems, networking, scheduling, and deployment systems. OpenAI President and Co-founder Greg Brockman said the chip is part of that strategy, which aims to make computing resources more abundant and AI faster, more reliable, and more affordable.

According to the companies' plans, Jalapeño will be the first step in a multi-generational computing platform, with initial deployment targeted to begin by the end of 2026 and continue expanding in the following years. The platform will combine accelerators designed by OpenAI, Broadcom's silicon implementation, networking, and connectivity technologies, and Celestica's capabilities in boards, racks, and system integration. Broadcom President and CEO Hock Tan stated that the collaboration will support the deployment of gigawatt-scale data centers alongside Microsoft and other partners.

For OpenAI, the significance of an inference chip lies in bringing the cost and response speed of large model services further under its own control. Training determines the upper limit of model capabilities, while inference determines whether a model can reach users reliably and at low cost. As request volumes for ChatGPT, Codex, the API, and agent products increase, inference infrastructure must simultaneously address throughput, latency, energy consumption, and reliability. If Jalapeño meets expectations in subsequent deployment, it will give OpenAI a new hardware foundation for reducing AI service costs and making model access more stable.

This article is compiled by Wedoany. All AI citations must indicate the source as "Wedoany". If there is any infringement or other issues, please notify us promptly, and we will modify or delete it accordingly. Email: news@wedoany.com