Lenovo Launches AION Plan to Reduce Costs and GPU Dependence with CPU Inference

2026-07-02 08:46


en.Wedoany.com Reported - Lenovo announced the launch of the AION plan, which aims to improve operational efficiency, reduce operational costs, and decrease dependence on GPUs by performing inference directly on CPUs.

Ricardo Bloj, President of Lenovo Brazil, stated that against the backdrop of growing demand for computing power, insufficient GPU supply, and high costs, the AION plan clarifies the company's positioning for future development. The core idea of the plan is to optimize AI workloads based on the needs of each application, thereby improving operational efficiency and making full use of existing infrastructure. Bloj explained that AION reinforces the company's vision of a flexible hybrid architecture, aiming not only to provide infrastructure but also to help customers build efficient and scalable AI environments. He added that enterprises can leverage existing resources to accelerate AI projects, thereby shortening time-to-production, improving operational efficiency, and increasing the return on infrastructure investments.

The solution allows lighter or distributed inference workloads to be executed directly on CPUs, freeing up GPU resources for more compute-intensive critical applications. In addition to alleviating GPU cost and supply issues, AION also addresses the common challenge of underutilized CPU capacity in data centers. Enterprises can launch AI projects using existing CPU resources without relying entirely on procuring dedicated hardware.
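The split between CPU and GPU workloads described above can be pictured as a simple routing rule. The following Python sketch is illustrative only and is not Lenovo's implementation: the `InferenceRequest` fields, the `choose_backend` function, and both thresholds are hypothetical, chosen just to show how a lighter request might be kept on CPU while a heavier one is sent to a GPU.

```python
from dataclasses import dataclass


@dataclass
class InferenceRequest:
    # Both fields are hypothetical; the article does not say how AION
    # classifies a workload as "lighter" or "compute-intensive".
    model_params_billions: float  # rough size of the model being served
    latency_budget_s: float       # how long the caller can wait for a reply


def choose_backend(
    req: InferenceRequest,
    cpu_max_params_billions: float = 8.0,   # assumed cutoff, not from the source
    cpu_min_latency_budget_s: float = 1.0,  # assumed cutoff, not from the source
) -> str:
    """Return "cpu" for light, latency-tolerant requests, otherwise "gpu"."""
    small_enough = req.model_params_billions <= cpu_max_params_billions
    tolerant_enough = req.latency_budget_s >= cpu_min_latency_budget_s
    if small_enough and tolerant_enough:
        return "cpu"
    return "gpu"


if __name__ == "__main__":
    print(choose_backend(InferenceRequest(3.0, 5.0)))    # small model, relaxed deadline
    print(choose_backend(InferenceRequest(70.0, 5.0)))   # large model
```

In practice the decision would depend on measured throughput and on how much CPU capacity is idle, which the announcement does not detail.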

On the technical front, the plan uses the many cores of Intel Xeon 6 processors to execute inference in parallel, handling multiple requests simultaneously. This raises the service capacity of each server in enterprise applications, AI APIs, transaction systems, and chatbots. According to Lenovo, preliminary AION tests of CPU inference show a "First Token Time to Think" (FTTT), apparently the delay before the first token of a response is produced, of 0.3 milliseconds and a response generation speed of 11 tokens per second, all without using a GPU. Bloj stated that the project demonstrates how intelligent combinations of different technologies can expand access to artificial intelligence in a sustainable and efficient manner.
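To put the reported figures in perspective, a rough estimate of how long one response would take can be worked out from them. The Python sketch below assumes that the reported first-token figure is the delay before the first token appears and that the remaining tokens arrive at a steady 11 tokens per second for a single request; neither assumption is confirmed by the announcement, and the function name is hypothetical.

```python
def estimated_response_time_s(
    output_tokens: int,
    first_token_delay_s: float = 0.0003,  # 0.3 ms, as reported by Lenovo
    tokens_per_second: float = 11.0,      # as reported by Lenovo
) -> float:
    """Estimate seconds needed to generate a response of `output_tokens` tokens.

    Assumes the first token arrives after `first_token_delay_s` and every
    later token arrives at a constant `tokens_per_second` (a simplification).
    """
    if output_tokens <= 0:
        return 0.0
    remaining_tokens = output_tokens - 1
    return first_token_delay_s + remaining_tokens / tokens_per_second


if __name__ == "__main__":
    for n in (12, 111, 551):
        print(n, "tokens:", round(estimated_response_time_s(n), 4), "s")
```

Under these assumptions the generation rate, not the first-token delay, dominates the total: a reply of roughly a hundred tokens would take on the order of ten seconds.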

