US-Based MinIO Launches MemKV in Redwood City: Petabyte-Scale, Microsecond Context Storage Breaks the "Heavy Compute Tax" on GPU Clusters
2026-05-13 14:20

en.Wedoany.com Reported - On May 12, 2026, US-based MinIO launched MemKV, a context memory storage system purpose-built for AI inference, in Redwood City. The product brings microsecond-level context retrieval to petabyte scale for the first time, filling a critical gap between high-bandwidth memory and object storage in GPU clusters. According to the company's global press release, MemKV is the second pillar of MinIO's product portfolio, following the AIStor object storage, and is designed specifically for agentic AI inference workloads. By providing a persistent shared context across GPU clusters, it aims to eliminate the repeated computation that context loss would otherwise force.

In AI inference scenarios, the high-bandwidth memory (HBM) inside a GPU has very limited capacity. When inference requests involving long contexts and multi-agent collaboration exceed that capacity, the GPU is forced to discard computed key-value (KV) cache data and recalculate it, a waste known in the industry as the "heavy compute tax" that burns compute, time, and electricity to no purpose. MinIO co-founder and co-CEO AB Periasamy pointed out that at the scale of thousand-GPU clusters, this waste is no longer just an efficiency issue but a structural drag. MemKV targets the inference data path specifically to eliminate this persistent problem.
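
The mechanics are simple to sketch. The toy Python snippet below (illustrative only; `prefill` and `decode` are stand-ins, not MinIO or NVIDIA code) shows why losing the KV cache is so expensive: a cache miss forces a full prefill over the entire context, while a hit skips that work entirely.

```python
import time

def prefill(tokens):
    """Stand-in for full-attention prefill; cost grows with context length."""
    time.sleep(len(tokens) * 0.0005)      # simulate O(n) recompute cost
    return [(t, t) for t in tokens]       # toy key/value pairs

def decode(kv):
    """Stand-in for decode, which only needs the cached K/V pairs."""
    return len(kv)

def serve(context_tokens, kv_cache):
    prefix = tuple(context_tokens)
    if prefix in kv_cache:                # hit: no recompute needed
        return decode(kv_cache[prefix])
    kv = prefill(context_tokens)          # miss: pay the "heavy compute tax"
    kv_cache[prefix] = kv
    return decode(kv)

cache = {}
ctx = list(range(2000))
t0 = time.perf_counter(); serve(ctx, cache)   # cold: full prefill
t1 = time.perf_counter(); serve(ctx, cache)   # warm: cached K/V reused
print(f"cold {t1 - t0:.2f}s, warm {time.perf_counter() - t1:.4f}s")
```

When HBM fills up, entries like `kv_cache[prefix]` are what gets evicted; MemKV's pitch is to keep them reachable in a persistent tier instead.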

MemKV's key breakthrough is achieving speed and scale at once. The product incorporates the NVIDIA BlueField-4 STX architecture and integrates natively with the NVIDIA Dynamo and NIXL software stacks. It moves KV cache data directly between NVMe flash and GPU memory over RDMA, bypassing HTTP protocols, file systems, and standalone storage servers entirely. Data block sizes are tuned for GPU throughput, at 2 MB to 16 MB. Comparative tests on enterprise deployments show that in a typical production cluster with 128 GPUs and a 128K-token context length, MemKV raised GPU utilization from roughly 50% to over 90%, turning the reduction in wasted compute into hard savings of roughly $2 million per year. On latency, internal benchmarks showed a 75x improvement in time-to-first-token, cutting a 53-second baseline into the hundreds of milliseconds.
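
As a rough illustration of that block-oriented data path, the hedged sketch below chunks a serialized KV cache into the 2-16 MB blocks the article describes. The in-memory dict stands in for the RDMA-reachable flash tier; none of the names are MinIO's or NIXL's actual API.

```python
# Hypothetical sketch of block-sized KV offload. "store" stands in for an
# RDMA-reachable flash tier; only the 2-16 MB block range comes from the
# article, everything else is an assumption for illustration.
BLOCK_MIN = 2 * 1024 * 1024     # 2 MiB lower bound from the article
BLOCK_MAX = 16 * 1024 * 1024    # 16 MiB upper bound from the article

def split_into_blocks(kv_bytes: bytes, block_size: int = BLOCK_MAX):
    assert BLOCK_MIN <= block_size <= BLOCK_MAX
    return [kv_bytes[i:i + block_size]
            for i in range(0, len(kv_bytes), block_size)]

def offload(session_id: str, kv_bytes: bytes, store: dict) -> int:
    """Write KV blocks keyed by (session, index); returns the block count."""
    blocks = split_into_blocks(kv_bytes)
    for idx, block in enumerate(blocks):
        store[(session_id, idx)] = block
    return len(blocks)

def restore(session_id: str, store: dict) -> bytes:
    """Reassemble a session's context from its stored blocks, in order."""
    blocks, idx = [], 0
    while (session_id, idx) in store:
        blocks.append(store[(session_id, idx)])
        idx += 1
    return b"".join(blocks)

store = {}
payload = b"\x00" * (40 * 1024 * 1024)      # a 40 MiB toy KV cache
n = offload("agent-42", payload, store)     # -> 3 blocks of <= 16 MiB each
assert restore("agent-42", store) == payload
```

The point of the large fixed block sizes is that transfers stay few and sequential, which is what keeps GPUs fed rather than stalled on small random reads.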

For a long time, R&D and investment in AI infrastructure have leaned toward model training. As large models have evolved from answering simple questions to executing complex, multi-step tasks, industry focus has shifted rapidly toward inference since late 2025. ECI Research's 2025 AI Builders Summit survey shows that two-thirds of enterprise AI leaders have already deployed multi-agent collaboration, whether as pilots or as formal workflows. Multi-agent collaboration is precisely the scenario where a shared KV cache is most valuable: agents interact and share context across GPUs, and if every interaction required recalculation, latency and costs would multiply. MemKV's shared persistent context pool is a direct answer to this structural gap.
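
To make that scaling concrete, here is a back-of-the-envelope sketch; the workload shape and prefill cost are assumed numbers for illustration, not figures from the article.

```python
# Assumed numbers: how recompute cost compounds in a multi-agent workflow.
prefill_gpu_seconds = 30        # assumed cost of one long-context prefill
agents, interactions = 8, 20    # assumed workload shape

no_shared_cache = agents * interactions * prefill_gpu_seconds   # 4800
shared_cache = prefill_gpu_seconds          # prefill once, all agents reuse

print(f"{no_shared_cache} vs {shared_cache} GPU-seconds")
# 4800 vs 30: without a shared cache, cost scales with agents x interactions.
```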

From a storage-industry product perspective, MemKV does not replace the existing AIStor object storage. Instead, it adds a new cache layer, referred to in the industry as "G3.5," within the STX memory tiering architecture defined by NVIDIA. As a standalone product it sits above AIStor, the two forming a tiered stack. MinIO is headquartered in Redwood City, California, USA. Founded in 2014, it made its name with high-performance S3-compatible object storage and has been iterating on AI data infrastructure in recent years.
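
Under this tiering framing, the cache layer can be pictured as one stop in a fastest-first lookup cascade. The sketch below is purely illustrative; the tier names and dict stores are placeholders, not MinIO's or NVIDIA's interfaces.

```python
# Minimal sketch of a tiered cache lookup, assuming a cascade in which a
# MemKV-like layer sits between GPU memory and object storage ("G3.5").
def tiered_get(key, tiers):
    """Search tiers fastest-first; on a hit, promote into the faster tiers."""
    for i, (name, store) in enumerate(tiers):
        if key in store:
            value = store[key]
            for _, faster in tiers[:i]:     # warm the faster tiers
                faster[key] = value
            return name, value
    raise KeyError(key)

tiers = [("hbm", {}), ("memkv", {"ctx-7": b"kv-blocks"}), ("aistor", {})]
print(tiered_get("ctx-7", tiers))   # hit in memkv; promoted into hbm
print(tiered_get("ctx-7", tiers))   # now served straight from hbm
```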

This article is compiled by Wedoany. All AI citations must indicate the source as "Wedoany". If there is any infringement or other issue, please notify us promptly and we will modify or delete the content accordingly. Email: news@wedoany.com