en.Wedoany.com reported - On April 14, 2026, IBM Research, in collaboration with NVIDIA and Samsung, showcased a prototype content-aware storage (CAS) system. The prototype stores and retrieves hundreds of billions of vectors on a single server, achieving an average query latency of 694 milliseconds at 90% recall. The hardware configuration comprises IBM Storage Scale ESS 6000 all-flash arrays, six NVIDIA H200 GPUs, and 48 Samsung 30.72TB PCIe Gen5 NVMe solid-state drives.
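The 90% recall figure measures how many of the true nearest neighbors an approximate vector index returns compared with an exact search. A minimal sketch of how recall@k is typically computed (function and variable names are illustrative, not IBM's benchmark code):

```python
import numpy as np

def recall_at_k(approx_ids: np.ndarray, exact_ids: np.ndarray) -> float:
    """Fraction of true nearest neighbors recovered by the approximate search.

    approx_ids, exact_ids: (n_queries, k) arrays of neighbor indices.
    """
    hits = sum(len(set(a) & set(e)) for a, e in zip(approx_ids, exact_ids))
    return hits / exact_ids.size

# Toy example: 2 queries, k=2; the approximate index misses one neighbor.
exact = np.array([[0, 1], [2, 3]])
approx = np.array([[0, 1], [2, 9]])
print(recall_at_k(approx, exact))  # 0.75
```

At the scale reported here, the ground-truth set itself has to be computed offline with exact (brute-force) search over a sampled query set.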
The solution aims to reduce the infrastructure complexity and cost of deploying Retrieval-Augmented Generation (RAG) applications in the enterprise. The IBM CAS architecture offloads tasks traditionally handled by a separate compute layer, such as document vectorization and index building, directly into the storage system. Vincent Hsu, IBM Fellow and CTO of Storage, noted that as AI deployments scale exponentially, enterprises urgently need databases at this scale to organize proprietary data for effective use by AI. By contrast, current vector database products typically require horizontal scaling across dozens or even hundreds of servers to support tens of billions of vectors.
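The two stages CAS pulls into the storage layer, vectorizing documents and answering nearest-neighbor queries against the resulting index, can be sketched as follows. This is a conceptual toy, assuming a stand-in embedding function and a brute-force search in place of the GPU-built index:

```python
import numpy as np

rng = np.random.default_rng(0)

def embed(texts):
    # Stand-in for a real embedding model: deterministic random 64-d vectors.
    return rng.standard_normal((len(texts), 64)).astype(np.float32)

# Ingest stage (what CAS moves into the storage system): vectorize documents.
docs = ["contract A", "invoice B", "report C"]
vectors = embed(docs)

# Query stage: brute-force nearest-neighbor search stands in for the ANN index.
def query(qvec, k=2):
    dists = np.linalg.norm(vectors - qvec, axis=1)
    return np.argsort(dists)[:k]

top = query(vectors[1])        # querying with doc 1's own vector
print([docs[i] for i in top])  # doc 1 ranks first (distance zero to itself)
```

In a RAG application, the retrieved documents would then be passed to a language model as context; the prototype's contribution is doing the embed-and-index half inside storage rather than on a separate compute tier.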
On the hardware side, Samsung provided 48 enterprise-grade NVMe SSDs based on its latest-generation TLC V-NAND flash. Each drive offers 30.72TB of capacity, with sequential read speeds of up to 12,000 MB/s and sequential write speeds of up to 6,800 MB/s. The IBM Storage Scale ESS 6000 all-flash arrays decouple compute from storage and use the NVIDIA H200 GPUs to accelerate index construction, cutting index build times from hours on CPUs to minutes.
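A back-of-the-envelope calculation from those per-drive specs gives the array's theoretical peak aggregate bandwidth; sustained throughput in practice will be lower once the filesystem and network are in the path:

```python
# Peak aggregate bandwidth across the 48 SSDs, from the quoted per-drive specs.
drives = 48
seq_read_mb_s = 12_000   # per-drive sequential read, MB/s
seq_write_mb_s = 6_800   # per-drive sequential write, MB/s

agg_read_gb_s = drives * seq_read_mb_s / 1_000    # 576.0 GB/s
agg_write_gb_s = drives * seq_write_mb_s / 1_000  # 326.4 GB/s
print(agg_read_gb_s, agg_write_gb_s)
```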
In performance validation, the system took 13 days in total for data loading and indexing, over a total data footprint of 153 TiB. For comparison, a task of equivalent scale on a dual-socket Intel CPU setup is estimated to require 120 days. IBM and NVIDIA's next goal is to index over a hundred billion vectors within a single day, further shortening data ingestion, while continuing to optimize query latency with the NVIDIA cuVS library.
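The reported figures imply the following ingest rates and speedup (all inputs are the article's numbers; the CPU baseline is IBM's estimate, not a measured run):

```python
# Sanity-check the reported ingest figures.
data_tib = 153      # total data footprint, TiB
gpu_days = 13       # measured: load + index on the GPU-accelerated prototype
cpu_days_est = 120  # estimated: equivalent task on a dual-socket Intel CPU setup

gpu_rate = data_tib / gpu_days       # ~11.8 TiB/day
cpu_rate = data_tib / cpu_days_est   # ~1.3 TiB/day
speedup = cpu_days_est / gpu_days    # ~9.2x end-to-end
print(f"{gpu_rate:.1f} TiB/day vs {cpu_rate:.1f} TiB/day ({speedup:.1f}x)")
```

Hitting the stated one-day target would require roughly an order-of-magnitude further improvement over the measured 13-day run.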
This prototype demonstration offers a new perspective on data infrastructure for enterprise AI: moving AI processing closer to the storage layer can improve the utilization of massive unstructured data while reducing cost and management complexity.
This article is compiled by Wedoany. All AI citations must indicate the source as "Wedoany". If there is any infringement or other issues, please notify us promptly, and we will modify or delete it accordingly. Email: news@wedoany.com