Google's TurboQuant Sends Memory Stocks Into a Global Selloff

AI & Semiconductors

A single research paper — compressing AI memory by a factor of six — wiped billions from chip giants on two continents. But Wall Street is urging investors to buy the dip.

Tech & Markets Desk · March 26, 2026 · Updated: 14:30 PT

On Tuesday, March 24, Google Research quietly published a blog post about a new compression algorithm. By Wednesday morning, it had knocked billions of dollars off the market capitalisations of memory chip makers across two continents. The algorithm is called TurboQuant, and it addresses one of the most expensive and persistent bottlenecks in artificial intelligence infrastructure: the Key-Value (KV) cache.

The KV cache is the AI system’s working memory — a high-speed data store that holds context from prior tokens so a model does not have to recompute everything from scratch with every new word it generates. As models handle longer documents, conversations, and multi-modal inputs, this cache grows rapidly, consuming GPU memory that could otherwise be used to serve more users or run more powerful models. TurboQuant, according to Google, compresses that cache to just 3 bits per value — down from the standard 16 — reducing its memory footprint by at least six times without any measurable loss in accuracy.
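To put the bit-width claim in concrete terms, the sizing sketch below estimates the cache for a single long request. The model shape (32 layers, 8 key-value heads, head dimension 128) and the 128,000-token request are illustrative assumptions, not figures from Google’s post. Counting bits alone, going from 16 to 3 is a ratio of about 5.3; the sketch does not attempt to reproduce Google’s reported figure of at least six times.

```python
def kv_cache_bytes(tokens, layers=32, kv_heads=8, head_dim=128, bits=16):
    """Approximate KV cache size in bytes for one sequence.

    The factor 2 covers the separate key and value tensors. The default
    shape values are illustrative assumptions, not figures from Google.
    """
    values = 2 * layers * kv_heads * head_dim * tokens
    return values * bits / 8


GIB = 1024 ** 3
tokens = 128_000  # hypothetical long-context request

baseline = kv_cache_bytes(tokens, bits=16)
compressed = kv_cache_bytes(tokens, bits=3)

print(f"16-bit cache: {baseline / GIB:.2f} GiB")
print(f" 3-bit cache: {compressed / GIB:.2f} GiB")
print(f"bit-width ratio: {baseline / compressed:.2f}x")
```

Because the cache scales linearly with the token count, the same ratio holds at any context length; only the absolute savings grow.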

6× · KV cache compression
8× · Speed gain on H100
3-bit · Quantization, no retraining
Zero · Accuracy loss (benchmarks)

The Market Reaction: A Two-Day Global Selloff

The immediate market response was swift and, in the view of several analysts, disproportionate. During Wednesday’s U.S. trading session, memory and storage stocks fell sharply — even as the broader Nasdaq 100 advanced. The declines extended into Thursday, rippling across Asian markets.

Closing Declines — March 25–26, 2026

SNDK · SanDisk Corp. · ▼ 5.7%
WDC · Western Digital · ▼ 4.7%
STX · Seagate Technology · ▼ 4.0%
MU · Micron Technology · ▼ 3.4%
000660.KS · SK Hynix (Korea) · ▼ 5.9%
005930.KS · Samsung Electronics · ▼ 4.8%
Kioxia · Kioxia Holdings (Japan) · ▼ ~6%

The declines dragged South Korea’s benchmark KOSPI index down by as much as 3% on Thursday, with SK Hynix and Samsung together among its largest weights. Japanese flash memory maker Kioxia Holdings fell by a similar margin in Tokyo. It was a rare moment of synchronised pressure across the global memory supply chain — caused not by an earnings miss or a supply disruption, but by a mathematics paper.

“This is Google’s DeepSeek moment.”

— Matthew Prince, CEO, Cloudflare

What TurboQuant Actually Does

The algorithm is the culmination of a multi-year research effort at Google. It builds on two earlier papers from the same team: QJL (Quantized Johnson-Lindenstrauss Transform), published at AAAI 2025, and PolarQuant, which will appear at AISTATS 2026 in Tangier, Morocco. TurboQuant itself is scheduled for presentation at ICLR 2026 in Rio de Janeiro, Brazil, in April. The paper was authored by Amir Zandieh, a research scientist at Google, and Vahab Mirrokni, a vice president and Google Fellow.

How TurboQuant Works: A Two-Stage Pipeline

PolarQuant (Stage 1): Instead of storing data vectors in standard Cartesian coordinates (X, Y, Z), it converts them to polar coordinates — separating each vector into a magnitude and a set of angles. Google’s team found that these angular distributions are highly concentrated and predictable, eliminating the need to store the normalisation constants that traditional quantisation methods require. Most of the bit budget is spent capturing the primary signal.
QJL (Stage 2): A quantised Johnson-Lindenstrauss transform then reduces the small residual error left by Stage 1 to a single sign bit (+1 or −1) per dimension. This step stores no extra quantisation constants, bringing the total to 3 bits per value.
The result: No training or fine-tuning required. On NVIDIA H100 GPUs, 4-bit TurboQuant computes attention scores up to 8× faster than the unquantised 32-bit baseline. On long-context benchmarks including Needle in a Haystack, LongBench, ZeroSCROLLS, RULER, and L-Eval, the algorithm achieved perfect or near-perfect scores across open-source models including Llama-3.1-8B and Mistral-7B.
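The two stages can be illustrated with a deliberately simplified toy. It is not Google’s implementation: `polar_quantize`, `sign_sketch` and `estimate_dot` are hypothetical names, the angle quantiser is a plain uniform grid, and radii and the residual norm are kept in full precision, so the toy does not reproduce the 3-bit budget. It only shows the shape of the idea: a coarse polar encoding, followed by a one-bit-per-projection correction for the inner product that attention needs.

```python
import math
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def polar_quantize(vec, angle_bits=4):
    """Toy Stage 1: encode coordinate pairs as (radius, quantised angle)."""
    levels = 2 ** angle_bits
    step = 2 * math.pi / levels
    encoded = []
    for x, y in zip(vec[0::2], vec[1::2]):
        angle = math.atan2(y, x) % (2 * math.pi)  # map to [0, 2*pi)
        encoded.append((math.hypot(x, y), int(angle / step) % levels))
    return encoded


def polar_dequantize(encoded, angle_bits=4):
    """Rebuild Cartesian pairs using the centre of each angle bin."""
    step = 2 * math.pi / 2 ** angle_bits
    out = []
    for radius, code in encoded:
        angle = (code + 0.5) * step
        out.extend((radius * math.cos(angle), radius * math.sin(angle)))
    return out


def gaussian_projections(count, dim, rng):
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(count)]


def sign_sketch(residual, projections):
    """Toy Stage 2: keep one sign bit per random projection of the residual."""
    return [1 if dot(g, residual) >= 0 else -1 for g in projections]


def estimate_dot(query, signs, projections, residual_norm):
    """Estimate <query, residual> from sign bits (unbiased for Gaussian rows)."""
    scale = math.sqrt(math.pi / 2) * residual_norm / len(signs)
    return scale * sum(dot(g, query) * s for g, s in zip(projections, signs))


rng = random.Random(0)
dim = 64  # must be even so coordinates pair up
key = [rng.gauss(0, 1) for _ in range(dim)]
query = [rng.gauss(0, 1) for _ in range(dim)]

coarse = polar_dequantize(polar_quantize(key))
residual = [k - c for k, c in zip(key, coarse)]
residual_norm = math.sqrt(dot(residual, residual))

projections = gaussian_projections(dim, dim, rng)  # one sign bit per dimension
signs = sign_sketch(residual, projections)
correction = estimate_dot(query, signs, projections, residual_norm)

print("exact score      :", dot(query, key))
print("stage 1 only     :", dot(query, coarse))
print("stage 1 + stage 2:", dot(query, coarse) + correction)
```

The sign-bit correction is accurate on average rather than for every individual vector, so a single run may or may not land closer to the exact score than Stage 1 alone.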

The key innovation is the elimination of “quantisation overhead.” Traditional compression methods reduce the size of data but must store additional constants — normalisation values needed to decompress accurately. These constants typically add one to two extra bits per number, partially undermining the headline compression ratio. TurboQuant avoids this entirely through its two-stage architecture, achieving its 3-bit target with no such overhead.
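The “one to two extra bits” figure follows from simple arithmetic. The sketch below assumes a conventional block-wise scheme in which each block of values shares a 16-bit scale and a 16-bit zero point; the block sizes are illustrative assumptions, not parameters taken from the paper.

```python
def effective_bits(payload_bits, block_size, constants_per_block=2, constant_bits=16):
    """Average bits stored per value once per-block constants are included."""
    overhead = constants_per_block * constant_bits / block_size
    return payload_bits + overhead


print(effective_bits(4, block_size=32))  # 5.0: one extra bit per value
print(effective_bits(4, block_size=16))  # 6.0: two extra bits per value
print(effective_bits(3, block_size=32, constants_per_block=0))  # 3.0: no constants stored
```

Under these assumptions a nominal 4-bit format really costs 5 to 6 bits per value, which is why removing the constants matters as much as lowering the headline bit width.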

Beyond Language Models: Vector Search

Google emphasises that TurboQuant has a direct commercial application beyond language model inference. The algorithm improves vector search — the technology that powers semantic similarity lookups across billions of items. Modern search engines, recommendation systems, and advertising targeting increasingly rely on comparing the meanings of billions of high-dimensional vectors rather than just matching keywords.

Tested against existing state-of-the-art methods such as RaBitQ and Product Quantization on the GloVe benchmark dataset, TurboQuant achieved superior recall ratios without requiring the large codebooks or dataset-specific tuning that competing approaches demand. This has direct relevance to Google Search, YouTube recommendations, and Google’s advertising infrastructure: vector search underpins Google’s primary revenue streams.
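Recall in this setting measures how many of the true nearest neighbours survive compression. The sketch below shows one common way to compute it, using random data and a plain uniform scalar quantiser as a stand-in compressor. It is not TurboQuant, RaBitQ or Product Quantization, and the function names are hypothetical.

```python
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def top_k(query, vectors, k):
    """Indices of the k vectors with the largest inner product to query."""
    ranked = sorted(range(len(vectors)),
                    key=lambda i: dot(query, vectors[i]), reverse=True)
    return set(ranked[:k])


def scalar_quantize(vec, bits=3, lo=-3.0, hi=3.0):
    """Stand-in compressor: clip to [lo, hi] and round to 2**bits levels."""
    step = (hi - lo) / (2 ** bits - 1)
    return [lo + round((min(max(x, lo), hi) - lo) / step) * step for x in vec]


def recall_at_k(query, exact_vectors, approx_vectors, k=10):
    """Fraction of the true top-k that the compressed index also returns."""
    truth = top_k(query, exact_vectors, k)
    found = top_k(query, approx_vectors, k)
    return len(truth & found) / k


rng = random.Random(0)
database = [[rng.gauss(0, 1) for _ in range(32)] for _ in range(500)]
compressed = [scalar_quantize(v) for v in database]
query = [rng.gauss(0, 1) for _ in range(32)]

recall = recall_at_k(query, database, compressed)
print(f"recall@10 with 3-bit scalar quantisation: {recall:.2f}")
```

A better quantiser pushes this number closer to 1.0 at the same bit budget, which is the comparison the GloVe benchmark results describe.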

Wall Street’s Verdict: An Overreaction?

Several prominent analysts quickly pushed back on the severity of the market reaction, arguing that investors are misreading the technology’s scope.

“As context windows get bigger and bigger, the data storage in KV cache explodes higher, causing the need for more memory. TurboQuant is directly attacking the cost curve here. Bullish for the cost curve, again IF this gets adopted broadly.”

Andrew Rocha · TMT Analyst, Wells Fargo

“Current inference models have long adopted 4-bit quantised data. Google’s claimed 8× performance boost is relative to older 32-bit models. These compression technologies are workarounds for compute bottlenecks and will not undermine resilient memory and flash demand over the next three to five years.”

KC Rajkumar · Analyst, Lynx Equity Strategies — $700 PT on MU, Reiterated Buy

“It’s like saying Aramco should crash because Toyota came out with a next-generation hybrid engine.”

Anonymous Analyst · Citrini Research, via X

Morgan Stanley analyst Shawn Kim invoked the Jevons Paradox — the economic principle that efficiency improvements often increase overall resource consumption, not decrease it. He argued that lower cost per AI inference token could unlock vast new tiers of AI deployment that were previously too expensive, ultimately expanding total memory demand rather than contracting it.

“A technology that reduces memory requirements by six times does not reduce spending by six times, because memory is only one component of a data centre.”

— The Next Web analysis
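The arithmetic behind that point is short. In the sketch below, the share of total cost attributed to the compressible memory is a hypothetical input, not a figure reported by any analyst quoted here.

```python
def relative_cost(memory_share, compression=6.0):
    """Total cost after compression, as a fraction of the original cost.

    Only the memory share shrinks; everything else is left unchanged.
    """
    return (1.0 - memory_share) + memory_share / compression


for share in (0.1, 0.3, 0.5):  # hypothetical memory shares of total cost
    saving = 1.0 - relative_cost(share)
    print(f"memory share {share:.0%}: total spending falls by {saving:.1%}")
```

Even if half of all spending were on memory that TurboQuant could compress, a sixfold compression would cut the total by well under half, and any extra usage of the kind the Jevons argument predicts would offset part of that.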

Important Caveats: What TurboQuant Does Not Do

Several critical limitations temper the more alarming market interpretations:

⚠️ Inference only, not training. TurboQuant compresses the dynamic KV cache during inference. It has no effect on model weights or on the substantially larger memory requirements of AI training runs. Demand for the High-Bandwidth Memory (HBM) that suppliers such as SK Hynix and Micron sell for training is largely unaffected.

🔬 Still in the laboratory. As of publication, TurboQuant has not been deployed at production scale. It is a research result, not a shipping product. Broad commercial adoption remains contingent on integration into inference frameworks and validation by hyperscalers.

📈 Model parameters are growing exponentially. The number of parameters in frontier AI models continues to expand. A 6× compression of KV cache may be partially or wholly offset by the increased memory requirements of next-generation, larger models.

✓ Benchmarks are strong. The technical results (perfect “Needle in a Haystack” recall, an 8× H100 speedup, zero accuracy degradation) are significant in their own right and represent a genuine advance over the existing state of the art in KV cache quantisation.

The Broader Context: A Pattern Investors Recognise

The episode echoes January 2025’s DeepSeek shock, when a Chinese AI lab released a highly efficient open-source model, briefly sending Nvidia and other AI hardware names sharply lower before the market recalibrated. In that case, as in this one, the initial sell-off reflected a genuine insight — that AI efficiency improvements are accelerating — but overstated the near-term demand destruction.

AI infrastructure spending remains at extraordinary levels. Meta alone committed up to $27 billion in a recent deal with Nebius for dedicated compute capacity. Google, Microsoft, and Amazon collectively plan hundreds of billions in data centre capital expenditure through 2026. A compression algorithm that reduces KV cache memory does not reduce the physical footprint of training clusters, networking, or storage for model weights — the components that dominate capex.

In the five days before the TurboQuant announcement, and despite strong earnings, Micron’s stock had already underperformed the Philadelphia Semiconductor Index by nearly 20%, its largest short-term relative underperformance since 2011. Analysts note that high valuations in the sector made it unusually vulnerable to any demand-softening narrative.

What TurboQuant undeniably represents is a signal: the next chapter of AI efficiency will be won as much through mathematical elegance as through brute-force hardware scaling. For memory chipmakers, the implication is not necessarily less demand, but a shift in the composition of that demand — away from raw capacity toward high-bandwidth, high-performance products that can support faster, more intelligent inference at scale.

Google Research blog post:

https://research.google/blog/turboquant-redefining-ai-efficiency-with-extreme-compression/
