NVIDIA Blackwell Agent Density Reaches 20x Hopper
2026-06-15 15:37

en.Wedoany.com Reported - Artificial Analysis has launched AgentPerf, the industry's first autonomous AI benchmark, providing developers, enterprises, and infrastructure providers with a standard method for comparing autonomous AI systems. Initial test results show that the NVIDIA Blackwell Ultra NVL72 platform delivers leading performance in autonomous AI workloads, supporting 20 times more agents per megawatt than NVIDIA Hopper systems.

Autonomous AI workloads are fundamentally different from conversational AI. A single chat completion is like a sprint, requiring just one large language model (LLM) call and one response. An agent, however, is more like a relay race—it breaks down a goal into multiple steps and continues iterating until the task is complete.

This pattern results in tens to hundreds of LLM calls chained together. Each call passes a growing context to the next, and tool calls such as code compilation and execution, database searches, and web browsing run at each handoff. The complexity is multiplicative, not additive.
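
One way to see why the cost grows faster than the number of calls is to count the prompt tokens a chain has to process. The sketch below is an illustration only. It assumes every call re-reads the full accumulated context with no cache reuse, and the function name and token counts are made up for the example rather than taken from AgentPerf.

```python
def total_prompt_tokens(steps: int, initial_context: int, tokens_added_per_step: int) -> int:
    """Sum the prompt tokens processed across a chain of LLM calls.

    Assumption (not from the article): each call re-reads the whole
    context accumulated so far, and each step appends a fixed number
    of tokens (model output plus tool results).
    """
    total = 0
    context = initial_context
    for _ in range(steps):
        total += context                  # this call processes the full context
        context += tokens_added_per_step  # the next call sees a longer context
    return total


# Illustrative numbers only: a single chat turn versus a 50-step agent.
single_turn = total_prompt_tokens(steps=1, initial_context=2000, tokens_added_per_step=1000)
agent_run = total_prompt_tokens(steps=50, initial_context=2000, tokens_added_per_step=1000)
print(single_turn, agent_run)
```

Under these assumptions the 50-step run processes far more than 50 times the tokens of the single turn, because the context term grows with every handoff. Real serving stacks reduce this with context caching, which the sketch deliberately ignores.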

This distinction is critical for performance measurement. Existing AI inference benchmarks measure single LLM calls: how fast an LLM responds to a single request and how many requests a system can handle simultaneously. They are not designed for autonomous workloads, because chained LLM calls, tool call latency, and growing context place different demands on accelerated computing systems than single LLM calls do.

For companies building and deploying agents at scale, it is essential to understand how quickly agents respond, how many can be deployed simultaneously, and how much useful work the AI infrastructure can accomplish per dollar invested and per watt of power consumed.

In the initial tests, AgentPerf used DeepSeek V4 Pro—a large mixture-of-experts model representing the current frontier model class driving the most powerful agents—to measure autonomous performance. Under this workload, the NVIDIA GB300 NVL72 achieved the highest performance in the benchmark, supporting 20 times more agents per megawatt than the NVIDIA HGX H200 system.
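
The agents-per-megawatt figure is a density ratio: concurrent agents divided by the power the system draws, compared across two systems. The sketch below shows only the arithmetic. The function names, the agent counts, and the power figures are placeholder assumptions, not AgentPerf measurements or published specifications for either platform.

```python
def agents_per_megawatt(concurrent_agents: float, system_power_kw: float) -> float:
    """Normalize a concurrency figure to one megawatt (1 MW = 1000 kW)."""
    return concurrent_agents * 1000.0 / system_power_kw


def density_ratio(system_a: float, system_b: float) -> float:
    """How many times more agents per megawatt system A supports than system B."""
    return system_a / system_b


# Placeholder values chosen only to make the arithmetic easy to follow.
system_a = agents_per_megawatt(concurrent_agents=400, system_power_kw=100)
system_b = agents_per_megawatt(concurrent_agents=20, system_power_kw=100)
print(system_a, system_b, density_ratio(system_a, system_b))
```

Because the metric is normalized by power, a system can come out ahead either by running more agents at the same power draw or by running the same number at a lower draw.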

This performance advantage stems from a full-stack, highly co-optimized design. The GB300 NVL72 connects 72 GPUs into a rack-scale system, enabling large mixture-of-experts (MoE) models like DeepSeek V4 Pro to be efficiently distributed and executed at scale. CUDA cores further accelerate execution by overlapping communication with computation, so the cost of cross-expert coordination is absorbed rather than added as latency. As concurrent agent sessions scale up, NVIDIA TensorRT LLM maintains efficiency by separating input processing from output generation, allowing each phase to be independently optimized. These results are based on a benchmark methodology built from the ground up to reflect how autonomous AI operates in production.

AgentPerf is built on real coding agent trajectories. Agents receive tasks, read files, write and edit code, execute commands, and iterate based on results, with all data sourced from real public code repositories across more than 12 programming languages. Long sequence lengths, tool call patterns, and latency all represent real-world coding workflows. AgentPerf measures how many such autonomous tasks a platform can support simultaneously while meeting established performance thresholds for responsiveness and output token rate. Tool calls are not actually executed; they are simulated using representative CPU processing times, so differences in results reflect only accelerated computing performance. Results translate directly into infrastructure decisions: how many concurrent autonomous tasks can run per accelerator and per megawatt of power.
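
The idea of "how many tasks a platform can support while meeting thresholds" can be sketched as a search over concurrency levels. The code below is a minimal illustration and not AgentPerf's implementation. The `Thresholds` fields, the threshold values, and the `toy_measure` performance model are all hypothetical, and the search assumes performance only degrades as concurrency rises.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass
class Thresholds:
    max_response_s: float      # hypothetical responsiveness limit per call
    min_tokens_per_s: float    # hypothetical per-session output rate floor


def max_supported_sessions(
    measure: Callable[[int], Tuple[float, float]],
    thresholds: Thresholds,
    limit: int,
) -> int:
    """Return the largest concurrency in 1..limit that meets both thresholds.

    measure(n) returns (response_time_s, tokens_per_s) per session at
    concurrency n. The search stops at the first failing level, which is
    only valid if performance degrades monotonically with n.
    """
    best = 0
    for n in range(1, limit + 1):
        response_s, rate = measure(n)
        if response_s <= thresholds.max_response_s and rate >= thresholds.min_tokens_per_s:
            best = n
        else:
            break
    return best


def toy_measure(n: int) -> Tuple[float, float]:
    # Toy model (an assumption): response time grows linearly with load,
    # and a fixed 1000 tokens/s budget is shared evenly across sessions.
    return 0.5 + 0.25 * n, 1000.0 / n


supported = max_supported_sessions(toy_measure, Thresholds(10.0, 30.0), limit=100)
print(supported)
```

In this toy model the output rate floor, not the responsiveness limit, is the binding constraint. A real harness would replace `toy_measure` with replayed agent trajectories and simulated tool call delays, as the article describes.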

Leading inference providers, including Baseten, DeepInfra, and Together AI, are already serving autonomous workloads on frontier models like DeepSeek V4 Pro on NVIDIA Blackwell. Together AI provides real-time inference on NVIDIA Blackwell for Cursor, an AI-powered autonomous coding platform. Cursor's agents debug issues, generate features, and perform refactoring while developers continue working. DeepInfra powers Pam.ai, an AI workforce platform for automotive dealerships that deploys agents entirely on NVIDIA Blackwell to book service appointments, handle phone calls, and conduct outbound sales campaigns.

As NVIDIA and the open-source ecosystem continue to optimize inference software, the performance and efficiency of autonomous workloads will keep improving. The NVIDIA Vera Rubin architecture is now in full production, bringing next-generation infrastructure capacity to meet the growing demand for scaled autonomous AI. More details on the AgentPerf methodology and full-stack optimizations can be found in the related technical blog.

This article is compiled by Wedoany. All AI citations must indicate the source as "Wedoany". If there is any infringement or other issues, please notify us promptly, and we will modify or delete it accordingly. Email: news@wedoany.com