NVIDIA’s Neural Texture Compression Benchmarked: VRAM Use Falls 85%
Tom’s Hardware has published the first independent multi-GPU test of RTX NTC, confirming dramatic memory savings across the RTX 50 and 40 series — though no shipping game supports the technology yet.
On April 11, 2026, Tom’s Hardware published a dedicated benchmark of NVIDIA’s RTX Neural Texture Compression — the first rigorous, multi-GPU independent test of a technology that has been demoed publicly since GTC 2026 but has yet to appear in a single shipping title. The results are striking: in controlled scenes, texture memory use fell by as much as 85 percent while image quality actually improved over the traditional BCn block compression that games have relied on for more than a decade.
The timing of the test matters. NVIDIA first demonstrated NTC at its GTC 2026 conference earlier this year, showing a Tuscan villa render scene dropping from 6.5 GB of VRAM under BCn compression to just 970 MB under NTC. The company also released the RTXNTC SDK on GitHub in early 2026. Tom’s Hardware’s independent benchmark now quantifies that promise across real hardware, from the flagship RTX 5090 down to the RTX 4060 laptop GPU — the kind of 8 GB card that has become a flashpoint in the ongoing VRAM capacity debate.
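As a quick check on the Tuscan villa figures, the short sketch below works out the percentage reduction and the compression ratio. It assumes "6.5 GB" means 6,500 MB (decimal units); the function name is illustrative only.

```python
# Worked arithmetic for the Tuscan villa figures quoted above.
# Assumption: "6.5 GB" is read as 6,500 MB (decimal units).

def percent_reduction(before_mb: float, after_mb: float) -> float:
    """Return the percentage by which memory use fell."""
    return (before_mb - after_mb) / before_mb * 100.0

bcn_mb = 6.5 * 1000  # BCn footprint, decimal GB converted to MB (assumed)
ntc_mb = 970.0       # NTC footprint as quoted

reduction = percent_reduction(bcn_mb, ntc_mb)
ratio = bcn_mb / ntc_mb
print(f"reduction: {reduction:.1f}%  ratio: {ratio:.1f}x")
# prints "reduction: 85.1%  ratio: 6.7x"
```

Reading the gigabyte figure in binary units instead (6,656 MB) shifts the result only slightly, so the roughly 85 percent figure holds either way.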
How NTC Works
Neural Texture Compression is a machine learning-based method that replaces the fixed 4×4 pixel block structure of BCn compression with a small neural network decoder. During the compression phase, a material’s original texture data — up to 16 channels simultaneously — is transformed into a compact set of neural network weights and latent feature vectors. At render time, a tiny Multi-Layer Perceptron (MLP) reconstructs the required pixel data on demand, running on the GPU’s dedicated AI acceleration hardware.
NVIDIA is explicit that NTC is a deterministic decoding technology. Unlike generative AI, it does not invent or hallucinate texture detail; it faithfully reconstructs what was encoded. The process is accelerated through Cooperative Vector extensions for both Direct3D 12 and Vulkan, meaning it is not limited to NVIDIA hardware.
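The decode step described above can be illustrated with a toy sketch: a small latent grid plus a tiny MLP that reconstructs all channels of one texel on demand. This is not the SDK's implementation. The layer sizes, the nearest-neighbour latent lookup, the use of (u, v) as extra inputs, and the random stand-in weights are all assumptions made for illustration.

```python
import random

CHANNELS = 16    # up to 16 material channels, per the text
LATENT_DIM = 4   # hypothetical latent vector size
HIDDEN = 8       # hypothetical hidden-layer width
GRID = 4         # hypothetical latent grid resolution (GRID x GRID)

# Fixed seed: a stand-in for the weights a real compression pass would produce.
rng = random.Random(0)

def rand_matrix(rows, cols):
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]

# The "compressed" material: a low-resolution latent grid plus MLP weights.
latents = [[[rng.uniform(-1.0, 1.0) for _ in range(LATENT_DIM)]
            for _ in range(GRID)] for _ in range(GRID)]
w1 = rand_matrix(HIDDEN, LATENT_DIM + 2)  # +2 inputs for the (u, v) coordinate
w2 = rand_matrix(CHANNELS, HIDDEN)

def matvec(matrix, vector):
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]

def decode_texel(u, v):
    """Reconstruct all channels at texture coordinate (u, v) in [0, 1)."""
    x = min(int(u * GRID), GRID - 1)
    y = min(int(v * GRID), GRID - 1)
    features = latents[y][x] + [u, v]                     # nearest latent + coords
    hidden = [max(0.0, h) for h in matvec(w1, features)]  # ReLU activation
    return matvec(w2, hidden)                             # one value per channel

texel = decode_texel(0.3, 0.7)
print(len(texel), "channels decoded")
```

Decoding the same coordinate twice returns identical values, which mirrors the deterministic behaviour NVIDIA describes: the network only evaluates fixed weights against stored latents and samples nothing at random.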
Inference on Sample mode is more suitable for high-performance graphics cards, while Inference on Load can cover all platform hardware.
— Alexey Panteleev, Distinguished DevTech Engineer, NVIDIA
Three Operating Modes
NTC exposes three distinct modes under DirectX 12, allowing developers — or players — to balance memory savings against performance overhead. Vulkan supports only the first two.
Mode 01: Inference on Load (zero runtime overhead)
NTC textures are decompressed entirely within the GPU during the game or map loading phase and simultaneously transcoded to standard BCn format. Rendering performance is identical to native BCn textures, so there is no runtime overhead. VRAM usage during gameplay is not reduced, but disk footprint and PCIe bus transfer pressure both fall significantly.
Mode 02: Inference on Sample (up to 85% VRAM savings)
The highest-compression mode. A pre-trained MLP decodes required pixel data in real time during texture sampling, keeping textures in their compressed neural form in VRAM at all times. Tom’s Hardware measured texture memory of just 303 MB in the Intel Sponza scene: an 85% drop vs. Inference on Load, and a reduction of more than 95% vs. the 6,830 MB figure, which by the arithmetic (303 / 6,830 ≈ 4.4%) corresponds to the uncompressed lossless reference. Image quality surpassed the BCn baseline. This mode requires Stochastic Texture Filtering (STF) and works best paired with DLSS.
Mode 03: Inference on Feedback (balanced compromise)
Uses DirectX 12’s Sampler Feedback feature to identify exactly which texture tiles are needed for the current rendered view, decompressing only those tiles into a sparse BCn structure. Memory savings are less extreme than Mode 02, but performance overhead is lower. It acts as a middle ground between the two modes above.
Benchmark Results Across GPUs
Tom’s Hardware tested NTC’s Inference on Sample mode — the most demanding but most impactful configuration — using frame-time overhead (milliseconds added per rendered frame) as the primary metric. The Intel Sponza scene was rendered with Temporal Anti-aliasing (TAA) enabled.
| GPU | Resolution | Frame-time overhead | VRAM capacity |
|---|---|---|---|
| RTX 5090 | 4K | +0.09 ms | 32 GB |
| RTX 5070 | 1440p | +0.50 – 0.70 ms | 12 GB |
| RTX 5060 | 1080p | +0.60 – 0.70 ms | 8 GB |
| RTX 4060 (Laptop) | 1080p | +0.70 – 0.85 ms | 8 GB |
The benchmark team noted that the test scenes included only basic forward rendering and anti-aliasing pipelines. In a real AAA game, many rendering stages are unaffected by NTC, meaning the relative overhead in actual gameplay would be proportionally smaller than these figures suggest.
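The proportionality argument above can be made concrete with a rough sketch. It assumes the NTC cost stays fixed in milliseconds while the rest of the frame grows, which is a simplification; the two base frame times are hypothetical values chosen only to contrast a light sample scene with a heavier game frame.

```python
def overhead_share(base_frame_ms: float, overhead_ms: float) -> float:
    """Fraction of the final frame time spent on the added overhead."""
    return overhead_ms / (base_frame_ms + overhead_ms)

# Upper bound measured on the RTX 4060 laptop GPU at 1080p (from the table).
overhead_ms = 0.85

# Hypothetical base frame times, not measurements from the benchmark.
scenarios = [("light sample scene", 2.0), ("heavier game frame", 16.0)]

for label, base_ms in scenarios:
    share = overhead_share(base_ms, overhead_ms)
    print(f"{label}: {share:.1%} of frame time")
# prints "light sample scene: 29.8% of frame time"
# prints "heavier game frame: 5.0% of frame time"
```

Under these assumptions the same 0.85 ms goes from nearly a third of a very short frame to about a twentieth of a 16 ms frame.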
Cross-Vendor Compatibility
A critical aspect of NTC that separates it from proprietary techniques like DLSS is its hardware-agnostic design. The Cooperative Vector extensions used by NTC’s shader-side decoder are supported by NVIDIA Tensor Cores, AMD AI Accelerators, and Intel XMX engines alike. This means game developers who integrate NTC can ship a single code path that runs on all three GPU vendors’ hardware, lowering the integration barrier considerably.
According to prominent leaker Kepler_L2, Sony may incorporate a similar or directly equivalent technology into the PlayStation 6 console, potentially using NTC to reduce game install sizes while keeping storage costs down with a 1 TB SSD. Neither Sony nor NVIDIA has officially confirmed this.
Where Things Actually Stand: No Games Yet
Despite the compelling benchmark results and the SDK’s availability on GitHub since early 2026, no commercially released game currently supports NTC. The Tom’s Hardware test was conducted against NVIDIA’s own sample application using the Intel Sponza reference scene — a developer showcase, not a retail title.
NVIDIA first showcased the Tuscan villa demo at GTC 2026, where the 6.5 GB to 970 MB reduction was presented publicly. Developer interest appears genuine; industry-wide SDK adoption is described as underway. Analysts expect major game engines such as Unreal and Unity to offer formal NTC support in the second half of 2026, which would be the prerequisite for NTC shipping inside actual titles.
Alexey Panteleev, the NVIDIA engineer who developed NTC, confirmed in the Tom’s Hardware interview that game developers can implement NTC on a per-texture basis or expose a player-facing toggle, giving users the ability to opt in based on their hardware capabilities.
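A per-texture choice combined with a player-facing toggle might look like the sketch below. Every name here is hypothetical and none of it comes from the RTXNTC SDK. The only facts taken from the article are that Vulkan supports the first two modes and that the Sampler Feedback-based mode is limited to DirectX 12. Falling back to Inference on Load when the player opts out is an assumed policy.

```python
from enum import Enum

class NtcMode(Enum):
    ON_LOAD = "inference_on_load"
    ON_SAMPLE = "inference_on_sample"
    SAMPLER_FEEDBACK = "sampler_feedback"  # the DirectX 12-only third mode

def supported_modes(api: str) -> list:
    """Modes available per graphics API, following the article's description."""
    modes = [NtcMode.ON_LOAD, NtcMode.ON_SAMPLE]
    if api == "d3d12":
        modes.append(NtcMode.SAMPLER_FEEDBACK)
    return modes

def choose_mode(api: str, player_opt_in: bool,
                preferred: NtcMode = NtcMode.ON_SAMPLE) -> NtcMode:
    """Pick a mode for one texture (assumed policy, for illustration only)."""
    if not player_opt_in:
        return NtcMode.ON_LOAD       # no runtime overhead when the toggle is off
    if preferred in supported_modes(api):
        return preferred
    return NtcMode.ON_SAMPLE         # fallback when the preferred mode is missing

# Hypothetical per-texture preferences set by the developer.
preferences = {"terrain_albedo": NtcMode.SAMPLER_FEEDBACK,
               "hero_armor": NtcMode.ON_SAMPLE}

for name, pref in preferences.items():
    print(name, "->", choose_mode("vulkan", True, pref).value)
```

On Vulkan, both textures in this example resolve to Inference on Sample, since the feedback-based mode is unavailable there.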
What It Means for 8 GB Cards
The practical promise of NTC for existing mid-range hardware is real, but contingent. An RTX 4060 or RTX 5060 running Inference on Sample mode could, in principle, play a high-fidelity AAA game that currently exceeds its VRAM budget — provided the game has been built with NTC integration. The 0.70–0.85 ms overhead recorded on an 8 GB RTX 4060 laptop GPU is measurable but modest; whether it represents a worthwhile trade-off depends entirely on whether a player’s frame rate headroom can absorb it.
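Whether a given frame rate can absorb the overhead is simple arithmetic, sketched below. The base frame rates and the 60 fps target are hypothetical examples; only the 0.85 ms figure comes from the benchmark, and the sketch assumes the overhead adds directly to frame time.

```python
def fps_after_overhead(base_fps: float, overhead_ms: float) -> float:
    """Frame rate once a fixed per-frame cost is added to the frame time."""
    return 1000.0 / (1000.0 / base_fps + overhead_ms)

def absorbs_overhead(base_fps: float, overhead_ms: float,
                     target_fps: float) -> bool:
    """True if the frame rate stays at or above the target after the overhead."""
    return fps_after_overhead(base_fps, overhead_ms) >= target_fps

overhead_ms = 0.85  # upper bound recorded on the RTX 4060 laptop GPU

for base_fps in (63.0, 70.0):  # hypothetical starting frame rates
    result = fps_after_overhead(base_fps, overhead_ms)
    ok = absorbs_overhead(base_fps, overhead_ms, target_fps=60.0)
    print(f"{base_fps:.0f} fps -> {result:.1f} fps, holds 60 fps: {ok}")
# prints "63 fps -> 59.8 fps, holds 60 fps: False"
# prints "70 fps -> 66.1 fps, holds 60 fps: True"
```

A player sitting just above a 60 fps target would dip below it, while one with a few extra frames of headroom would not notice the difference in practice.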
The technology does not retroactively improve existing games. It requires active development effort from studios to compress assets in NTC format and integrate the decoding path. Until that ecosystem matures through major engine support, the benchmark results — however impressive — will remain largely theoretical for most players.
The potential, however, is significant enough that Tom’s Hardware’s headline called it a technology that could let 8 GB cards last another decade. Based on the data, that headline is defensible. Whether it becomes reality rests with game developers, not GPU silicon.
