Nvidia Plans GPUs to Directly Access Storage on Vera Rubin, Potentially Accelerating High Bandwidth Flash

As artificial intelligence models continue their rapid expansion in scale, high-bandwidth memory (HBM) is increasingly seen as struggling to keep pace with future capacity demands. In response, the semiconductor industry is turning its attention toward a new frontier: GPU-driven storage architectures that bring ultra-fast flash memory closer to the processor. At the center of this shift is Nvidia, which is advancing plans to introduce a fundamentally different approach to data movement starting with its next-generation Vera Rubin AI platform.

According to TrendForce, Nvidia is developing an architecture called GPU-Initiated Direct Storage Access (GIDS) and intends to deploy it beginning with the Vera Rubin platform, which is targeted for partner availability in the second half of 2026. TrendForce attributes the disclosure to Song Ki-hwan, a professor in the Department of System Semiconductor Engineering at Yonsei University, who revealed the plans on May 18 at the 2nd Semiconductor Device Frontier Summit in Seoul.

The GPU becomes the orchestrator — issuing storage commands directly, bypassing the CPU and DRAM entirely.

GIDS vs. GDS: A Fundamental Shift in Data Flow

To appreciate the significance of GIDS, it helps to understand how Nvidia’s existing GPUDirect Storage (GDS) architecture already improved upon traditional data pipelines. Under the conventional model, data traveling from storage to the GPU passes through the CPU and system DRAM, a path that becomes a bottleneck as AI workloads demand ever-larger transfers. GDS addressed this by enabling storage to stream data directly into GPU memory, eliminating the DRAM hop, although the CPU still issues the storage requests.

GIDS goes a step further. Rather than having the CPU issue storage requests on behalf of the GPU, GIDS inverts control: the GPU itself issues commands to the storage device, and the CPU is out of the loop. This distinction matters because a CPU can keep far fewer threads in flight, while a modern GPU can run tens of thousands of parallel threads, making the GPU a far more capable orchestrator of high-bandwidth storage I/O.
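One rough way to see why the number of requesters matters is Little's law, which says that sustained throughput equals the number of requests in flight divided by the latency of each request. The Python sketch below applies it to a CPU-issued and a GPU-issued case. Every number in it (the read latency and both requester counts) is an illustrative assumption rather than a figure from Nvidia or TrendForce, and the model deliberately ignores device and interconnect limits as well as asynchronous I/O, which lets one CPU thread keep several requests outstanding.

```python
# Little's law: sustained throughput = requests in flight / latency per request.
# All numbers below are illustrative assumptions, not measured or vendor figures.

def sustained_iops(requests_in_flight: int, latency_s: float) -> float:
    """Upper bound on I/O operations per second at a given concurrency."""
    return requests_in_flight / latency_s

ASSUMED_LATENCY_S = 100e-6        # assumption: 100 microseconds per flash read
ASSUMED_CPU_REQUESTERS = 64       # assumption: one outstanding request per CPU thread
ASSUMED_GPU_REQUESTERS = 32_768   # assumption: "tens of thousands" of GPU threads

cpu_iops = sustained_iops(ASSUMED_CPU_REQUESTERS, ASSUMED_LATENCY_S)
gpu_iops = sustained_iops(ASSUMED_GPU_REQUESTERS, ASSUMED_LATENCY_S)

print(f"CPU-issued upper bound: {cpu_iops:,.0f} IOPS")
print(f"GPU-issued upper bound: {gpu_iops:,.0f} IOPS")
print(f"Ratio: {gpu_iops / cpu_iops:.0f}x")
```

In this simplified model the bound scales linearly with the number of requesters, so the ratio between the two cases is simply the ratio of the assumed thread counts. In a real system the drives and the interconnect would cap throughput well before that.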

Architecture	Data Path	Who Issues Storage Commands	CPU Involvement
Traditional	Storage → DRAM → GPU	CPU	Full
GDS (GPUDirect Storage)	Storage → GPU (bypasses DRAM)	CPU	Issues requests
GIDS (GPU-Initiated Direct Storage)	Storage → GPU (direct)	GPU	Bypassed entirely
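
The comparison above can also be written down as a small data model. The Python sketch below is only a toy encoding of the table: the class name, field names, and helper function are hypothetical and do not correspond to any Nvidia API. It records the hops and the command issuer for each architecture and checks whether host resources (the CPU or system DRAM) sit on the path of every read.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StoragePath:
    """Toy description of one row of the comparison table (hypothetical model)."""
    name: str
    hops: tuple[str, ...]   # components the data passes through, in order
    issuer: str             # component that issues the storage commands


PATHS = (
    StoragePath("Traditional", ("Storage", "DRAM", "GPU"), issuer="CPU"),
    StoragePath("GDS", ("Storage", "GPU"), issuer="CPU"),
    StoragePath("GIDS", ("Storage", "GPU"), issuer="GPU"),
)


def host_on_critical_path(path: StoragePath) -> bool:
    """True if each read needs the CPU to issue it or system DRAM to stage it."""
    return path.issuer == "CPU" or "DRAM" in path.hops


for p in PATHS:
    route = " -> ".join(p.hops)
    print(f"{p.name:12} {route:24} issuer={p.issuer:4} "
          f"host_on_critical_path={host_on_critical_path(p)}")
```

Under this encoding, GIDS is the only one of the three rows where neither the CPU nor system DRAM is involved in an individual read.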

Reports also indicate that Amazon and Nvidia are both advancing similar GPU-initiated storage architectures. Wiwynn, a server manufacturer, showcased Nvidia’s Storage-Next initiative at GTC 2026 in March, describing a system in which a GPU orchestrates input and output directly across a 96-drive NVMe array — an early commercial signal that GIDS is already being built into rack-scale infrastructure.

The Case for High Bandwidth Flash

The emergence of GIDS is expected to significantly accelerate interest in High Bandwidth Flash (HBF), a new memory tier that stacks NAND flash vertically — in a structure similar to HBM — using through-silicon vias (TSVs). HBF is positioned between HBM and traditional SSDs, aiming to combine elements of both: the bandwidth profile of HBM with the vast capacity advantages of NAND.

Why NAND Flash Now?

NAND flash offers roughly 30 times the bit density of DRAM, enabling far greater memory capacity in a similar package footprint.
GPU-to-HBM data transfer currently accounts for approximately half of total AI system power consumption, creating strong motivation to offload workloads to a more efficient tier.
HBF is particularly well-suited for storing AI model parameters during inference, as those weights are essentially read-only — sidestepping NAND’s limited write endurance compared to DRAM.
Professor Song Ki-hwan estimates that combining six HBF units with two HBM units could increase GPU memory capacity more than 16 times — from 192GB to approximately 3,120GB — potentially enabling AI models roughly 16 times larger than current architectures support.
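
The capacity estimate can be checked with simple arithmetic. In the Python sketch below, only the 192GB and 3,120GB totals and the six-plus-two configuration come from the article. The per-stack capacities (24GB per HBM stack, 512GB per HBF stack) and the eight-stack baseline are assumptions chosen because they happen to reproduce the quoted totals; they are not confirmed specifications.

```python
# Figures quoted in the article.
BASELINE_GB = 192
PROJECTED_GB = 3_120

ratio = PROJECTED_GB / BASELINE_GB
print(f"Capacity ratio: {ratio:.2f}x")

# ASSUMPTIONS (not from the article): per-stack capacities and baseline stack
# count, picked so that the totals match the quoted figures.
ASSUMED_HBM_STACK_GB = 24
ASSUMED_HBF_STACK_GB = 512
ASSUMED_BASELINE_HBM_STACKS = 8

baseline_gb = ASSUMED_BASELINE_HBM_STACKS * ASSUMED_HBM_STACK_GB
hybrid_gb = 2 * ASSUMED_HBM_STACK_GB + 6 * ASSUMED_HBF_STACK_GB

print(f"Baseline under these assumptions: {baseline_gb} GB")
print(f"Two HBM + six HBF under these assumptions: {hybrid_gb} GB")
```

Under these assumptions nearly all of the added capacity comes from the six HBF stacks, while the two remaining HBM stacks contribute only a small share of the total.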

There is a critical caveat, however. NAND flash has finite write endurance, while DRAM supports virtually unlimited rewrite cycles. This means HBF is not a wholesale replacement for HBM but rather a complementary tier best suited to read-heavy workloads. In the context of AI inference, where model weights are loaded once and queried repeatedly, HBF’s read-optimized profile aligns well with the workload characteristics.

Industry Momentum: SK Hynix and SanDisk Lead Standardization

Separately, SK Hynix and SanDisk signed a Memorandum of Understanding in August 2025 to jointly develop the HBF specification. The partnership was formalized in February 2026, when the two companies held an HBF Spec Standardization Consortium kick-off event at SanDisk’s headquarters in Milpitas, California, and announced a dedicated workstream under the Open Compute Project (OCP), the world’s largest open data center technology initiative, to define industry-wide HBF specifications.

SanDisk has said it is targeting first HBF memory samples in the second half of 2026, with the first AI inference devices built around HBF expected to sample in early 2027. The effort draws on both companies’ expertise in HBM packaging and NAND design, with SanDisk contributing its BiCS advanced stacking technology and wafer-bonding process.

Earlier, Nvidia was reported to have partnered with SK Hynix and Kioxia to explore AI SSDs — custom-designed storage solutions intended to partially supplement HBM as a GPU memory expander. GIDS and HBF represent the natural architectural evolution of that same initiative: rather than bolting additional memory onto a CPU-centric pipeline, the data path itself is redesigned around the GPU as the primary orchestrator of both compute and storage.

— ◆ —

The transition to GPU-initiated storage architectures is not expected to happen overnight. NAND flash performance must continue improving to keep pace with the throughput demands of Vera Rubin-class GPUs, and the HBF specification must achieve broad industry adoption before system designers can count on it as a reliable building block. Nevertheless, the convergence of Nvidia’s GIDS architecture, the SK Hynix–SanDisk standardization effort, and growing hyperscaler interest suggests the memory landscape for AI infrastructure is entering a meaningful inflection point — one where ultra-dense flash sits alongside HBM not as a compromise, but as a complementary layer designed from the ground up for the inference era.
