NVIDIA's Neural Texture Compression Benchmarked: VRAM Use Falls 85%


GPU Technology Report · Independent Analysis

Saturday, April 12, 2026 · Graphics & Memory


Tom’s Hardware has published the first independent multi-GPU test of RTX NTC, confirming dramatic memory savings across the RTX 50 and 40 series — though no shipping game supports the technology yet.

By Tech Desk · April 12, 2026
Tags: GPU AI Rendering VRAM RTX 50 Series

85%: Max VRAM reduction (vs. BCn baseline)
6.5 GB → 970 MB: Tuscan scene (BCn vs. NTC)
0.09 ms: RTX 5090 overhead at 4K / TAA

On April 11, 2026, Tom’s Hardware published a dedicated benchmark of NVIDIA’s RTX Neural Texture Compression — the first rigorous, multi-GPU independent test of a technology that has been demoed publicly since GTC 2026 but has yet to appear in a single shipping title. The results are striking: in controlled scenes, texture memory use fell by as much as 85 percent while image quality actually improved over the traditional BCn block compression that games have relied on for more than a decade.

The timing of the test matters. NVIDIA first demonstrated NTC at its GTC 2026 conference earlier this year, showing a Tuscan villa render scene dropping from 6.5 GB of VRAM under BCn compression to just 970 MB under NTC. The company also released the RTXNTC SDK on GitHub in early 2026. Tom’s Hardware’s independent benchmark now quantifies that promise across real hardware, from the flagship RTX 5090 down to the RTX 4060 laptop GPU — the kind of 8 GB card that has become a flashpoint in the ongoing VRAM capacity debate.

▸ Intel Sponza Scene — Texture Memory Usage Comparison

Original Lossless Reference: 6,830 MB
NTC, Inference on Load (BCn): 2,041 MB
NTC, Inference on Sample: 303 MB

Source: Tom’s Hardware / NVIDIA RTXNTC SDK sample (Intel Sponza scene)
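The headline percentages can be checked directly from the figures quoted in this article. The short sketch below does only that arithmetic; the helper name `reduction_percent` is made up for illustration, and the Tuscan figure assumes 1 GB = 1,000 MB.

```python
def reduction_percent(before_mb: float, after_mb: float) -> float:
    """Percentage of memory saved when going from before_mb to after_mb."""
    return (1.0 - after_mb / before_mb) * 100.0

# Figures quoted in the article, in MB.
SPONZA_REFERENCE = 6830      # original lossless reference
SPONZA_ON_LOAD_BCN = 2041    # NTC Inference on Load (transcoded to BCn)
SPONZA_ON_SAMPLE = 303       # NTC Inference on Sample
TUSCAN_BCN = 6500            # 6.5 GB, assuming 1 GB = 1,000 MB
TUSCAN_NTC = 970

sponza_vs_bcn = reduction_percent(SPONZA_ON_LOAD_BCN, SPONZA_ON_SAMPLE)
sponza_vs_reference = reduction_percent(SPONZA_REFERENCE, SPONZA_ON_SAMPLE)
tuscan_vs_bcn = reduction_percent(TUSCAN_BCN, TUSCAN_NTC)

print(f"Sponza, on-sample vs. BCn:       {sponza_vs_bcn:.1f}%")
print(f"Sponza, on-sample vs. reference: {sponza_vs_reference:.1f}%")
print(f"Tuscan, NTC vs. BCn:             {tuscan_vs_bcn:.1f}%")
```

Both BCn comparisons land at roughly 85 percent, and the comparison against the lossless reference comes out above 95 percent, which matches the numbers reported here.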

How NTC Works

Neural Texture Compression is a machine learning-based method that replaces the fixed 4×4 pixel block structure of BCn compression with a small neural network decoder. During the compression phase, a material’s original texture data — up to 16 channels simultaneously — is transformed into a compact set of neural network weights and latent feature vectors. At render time, a tiny Multi-Layer Perceptron (MLP) reconstructs the required pixel data on demand, running on the GPU’s dedicated AI acceleration hardware.

NVIDIA is explicit that NTC is a deterministic decoding technology. Unlike generative AI, it does not invent or hallucinate texture detail; it faithfully reconstructs what was encoded. The process is accelerated through Cooperative Vector extensions for both Direct3D 12 and Vulkan, meaning it is not limited to NVIDIA hardware.
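To make the decode step concrete, here is a minimal sketch of the general idea: a stored latent vector per texel goes through a tiny MLP and comes out as a full set of material channels. This is not NVIDIA's network or the RTXNTC SDK API. The layer sizes, the per-texel latent grid, and the random weights are all assumptions chosen for illustration; in a real encoder the weights and latents would be trained for each material.

```python
import random

LATENT_DIM = 4      # features stored per latent grid cell (hypothetical size)
HIDDEN_DIM = 8      # width of the single hidden layer (hypothetical size)
OUT_CHANNELS = 16   # the article's "up to 16 channels" per material

def make_weights(rows, cols, rng):
    """Random stand-ins for weights that a real encoder would train."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]

def dense(weights, vec):
    """One fully connected layer without bias: a matrix-vector product."""
    return [sum(w * v for w, v in zip(row, vec)) for row in weights]

def relu(vec):
    return [max(0.0, v) for v in vec]

def decode_texel(latents, w_hidden, w_out, x, y):
    """Reconstruct every material channel of one texel from its latent vector."""
    feature = latents[y][x]
    return dense(w_out, relu(dense(w_hidden, feature)))

rng = random.Random(0)
# A 4x4 grid of latent vectors standing in for the compressed texture.
latents = [[[rng.uniform(-1.0, 1.0) for _ in range(LATENT_DIM)]
            for _ in range(4)] for _ in range(4)]
w_hidden = make_weights(HIDDEN_DIM, LATENT_DIM, rng)
w_out = make_weights(OUT_CHANNELS, HIDDEN_DIM, rng)

texel = decode_texel(latents, w_hidden, w_out, x=2, y=1)
print(len(texel), "channels decoded")
```

Decoding the same texel twice gives the same values, because the function contains no randomness once the weights and latents are fixed. That is the sense in which this kind of decoder is deterministic rather than generative.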

Inference on Sample mode is more suitable for high-performance graphics cards, while Inference on Load can cover all platform hardware.
— Alexey Panteleev, Distinguished DevTech Engineer, NVIDIA

Three Operating Modes

NTC exposes three distinct modes under DirectX 12, allowing developers — or players — to balance memory savings against performance overhead. Vulkan supports only the first two.

Mode 01 / DirectX 12 + Vulkan

Inference on Load

NTC textures are decompressed entirely within the GPU during the game or map loading phase and simultaneously transcoded to standard BCn format. Rendering performance is identical to native BCn textures — zero runtime overhead. VRAM usage during gameplay is not reduced, but disk footprint and PCIe bus transfer pressure both fall significantly.

Zero runtime overhead

Mode 02 / DirectX 12 + Vulkan

Inference on Sample

The highest-compression mode. A pre-trained MLP decodes the required pixel data in real time during texture sampling, so textures stay in their compressed neural form in VRAM at all times. In the Intel Sponza scene, Tom’s Hardware measured 303 MB in this mode, against 2,041 MB for Inference on Load (an 85% drop) and 6,830 MB for the uncompressed lossless reference (a reduction of more than 95%). Image quality surpassed the BCn baseline. The mode requires Stochastic Texture Filtering (STF) and works best paired with DLSS.

Up to 85% VRAM savings

Mode 03 / DirectX 12 Only

Inference on Feedback

Uses DirectX 12’s Sampler Feedback feature to identify exactly which texture tiles are needed for the current rendered view, decompressing only those tiles into a sparse BCn structure. Memory savings are less extreme than in Mode 02, but performance overhead is lower, which makes it a middle ground between the other two modes.

Balanced compromise
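The mode and API matrix above is small enough to write down as a lookup table. The sketch below is illustrative only: the names `NtcMode`, `SUPPORTED_MODES`, and `resolve_mode` are invented here and are not part of the RTXNTC SDK, and the fallback to Inference on Load is this sketch's own policy, chosen because the article describes that mode as covering all hardware.

```python
from enum import Enum

class NtcMode(Enum):
    ON_LOAD = "Inference on Load"
    ON_SAMPLE = "Inference on Sample"
    ON_FEEDBACK = "Inference on Feedback"

# Mode availability per graphics API, as listed in the article.
SUPPORTED_MODES = {
    "directx12": (NtcMode.ON_LOAD, NtcMode.ON_SAMPLE, NtcMode.ON_FEEDBACK),
    "vulkan": (NtcMode.ON_LOAD, NtcMode.ON_SAMPLE),
}

def resolve_mode(api: str, requested: NtcMode) -> NtcMode:
    """Return the requested mode if the API supports it, else fall back.

    The fallback target (Inference on Load) is an assumption of this sketch.
    """
    supported = SUPPORTED_MODES[api.lower()]
    return requested if requested in supported else NtcMode.ON_LOAD

print(resolve_mode("Vulkan", NtcMode.ON_FEEDBACK).value)
print(resolve_mode("DirectX12", NtcMode.ON_FEEDBACK).value)
```

A player-facing toggle of the kind mentioned later in the article could sit on top of a table like this, although how a real engine would expose it is not described in the source.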

Benchmark Results Across GPUs

Tom’s Hardware tested NTC’s Inference on Sample mode — the most demanding but most impactful configuration — using frame-time overhead (milliseconds added per rendered frame) as the primary metric. The Intel Sponza scene was rendered with Temporal Anti-aliasing (TAA) enabled.

▸ NTC Inference on Sample — Frame Time Overhead (TAA enabled)

GPU	Resolution	Frame-time overhead	Card VRAM
RTX 5090	4K	+0.09 ms	32 GB
RTX 5070	1440p	+0.50–0.70 ms	12 GB
RTX 5060	1080p	+0.60–0.70 ms	8 GB
RTX 4060 (Laptop)	1080p	+0.70–0.85 ms	8 GB

The benchmark team noted that the test scenes included only basic forward rendering and anti-aliasing pipelines. In a real AAA game, many rendering stages are unaffected by NTC, meaning the relative overhead in actual gameplay would be proportionally smaller than these figures suggest.

Important caveat on image noise: Inference on Sample mode relies on Stochastic Texture Filtering and can introduce visible image noise when anti-aliasing is disabled. DLSS eliminates the noise entirely; TAA reduces but does not fully remove it. NVIDIA and the Tom’s Hardware team both recommend using Inference on Sample with DLSS enabled for the cleanest output.
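Why stochastic filtering produces noise, and why accumulating samples removes it, can be shown with a toy example. The sketch below follows the general idea behind stochastic texture filtering as commonly described: rather than blending the four nearest texels, fetch one of them, chosen with probability equal to its bilinear weight. It is not the SDK's shader code. The 2×2 texture, the sample position, and the sample count are arbitrary, and averaging many samples is only a rough stand-in for the temporal accumulation that TAA or DLSS performs.

```python
import math
import random

def bilinear(texture, u, v):
    """Classic bilinear filter: weighted blend of the four nearest texels."""
    x0, y0 = int(math.floor(u)), int(math.floor(v))
    fx, fy = u - x0, v - y0
    top = texture[y0][x0] * (1 - fx) + texture[y0][x0 + 1] * fx
    bottom = texture[y0 + 1][x0] * (1 - fx) + texture[y0 + 1][x0 + 1] * fx
    return top * (1 - fy) + bottom * fy

def stochastic_sample(texture, u, v, rng):
    """Fetch ONE texel, chosen with probability equal to its bilinear weight."""
    x0, y0 = int(math.floor(u)), int(math.floor(v))
    fx, fy = u - x0, v - y0
    x = x0 + (1 if rng.random() < fx else 0)
    y = y0 + (1 if rng.random() < fy else 0)
    return texture[y][x]

texture = [[0.0, 1.0],
           [0.5, 0.25]]          # arbitrary single-channel texels
u, v = 0.3, 0.6                  # sample position in texel units
rng = random.Random(42)

exact = bilinear(texture, u, v)
one_sample = stochastic_sample(texture, u, v, rng)   # noisy: a raw texel value
samples = [stochastic_sample(texture, u, v, rng) for _ in range(20000)]
average = sum(samples) / len(samples)

print(f"bilinear: {exact:.3f}, accumulated stochastic: {average:.3f}")
```

A single stochastic fetch returns one raw texel value, which is where the visible noise comes from when nothing accumulates results across frames. The average of many fetches converges on the bilinear result, which is consistent with the observation that temporal techniques clean up the image.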

Cross-Vendor Compatibility

A critical aspect of NTC that separates it from proprietary techniques like DLSS is its hardware-agnostic design. The Cooperative Vector extensions used by NTC’s shader-side decoder are supported by NVIDIA Tensor Cores, AMD AI Accelerators, and Intel XMX engines alike. This means game developers who integrate NTC can ship a single code path that runs on all three GPU vendors’ hardware, lowering the integration barrier considerably.

According to prominent leaker Kepler_L2, Sony may incorporate a similar or directly equivalent technology into the PlayStation 6 console, potentially using NTC to reduce game install sizes while keeping storage costs down with a 1 TB SSD. Neither Sony nor NVIDIA has officially confirmed this.

Where Things Actually Stand: No Games Yet

Despite the compelling benchmark results and the SDK’s availability on GitHub since early 2026, no commercially released game currently supports NTC. The Tom’s Hardware test was conducted against NVIDIA’s own sample application using the Intel Sponza reference scene — a developer showcase, not a retail title.

NVIDIA first showcased the Tuscan villa demo at GTC 2026, where the 6.5 GB to 970 MB reduction was presented publicly. Developer interest appears genuine; industry-wide SDK adoption is described as underway. Analysts expect major game engines such as Unreal and Unity to offer formal NTC support in the second half of 2026, which would be the prerequisite for NTC shipping inside actual titles.

Alexey Panteleev, the NVIDIA engineer who developed NTC, confirmed in the Tom’s Hardware interview that game developers can implement NTC on a per-texture basis or expose a player-facing toggle, giving users the ability to opt in based on their hardware capabilities.

What It Means for 8 GB Cards

The practical promise of NTC for existing mid-range hardware is real, but contingent. An RTX 4060 or RTX 5060 running Inference on Sample mode could, in principle, play a high-fidelity AAA game that currently exceeds its VRAM budget — provided the game has been built with NTC integration. The 0.70–0.85 ms overhead recorded on an 8 GB RTX 4060 laptop GPU is measurable but modest; whether it represents a worthwhile trade-off depends entirely on whether a player’s frame rate headroom can absorb it.
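A fixed millisecond overhead costs more frames per second the faster a game already runs. The sketch below converts the RTX 4060 laptop overhead range quoted above into frame rates. The 60 and 120 fps starting points are hypothetical, and the calculation assumes the overhead simply adds to the frame time.

```python
def fps_with_overhead(base_fps: float, overhead_ms: float) -> float:
    """Frame rate after adding a fixed per-frame cost in milliseconds."""
    base_frame_ms = 1000.0 / base_fps
    return 1000.0 / (base_frame_ms + overhead_ms)

OVERHEAD_RANGE_MS = (0.70, 0.85)  # RTX 4060 laptop range quoted in the article

results = {}
for base_fps in (60.0, 120.0):    # hypothetical starting frame rates
    best = fps_with_overhead(base_fps, OVERHEAD_RANGE_MS[0])
    worst = fps_with_overhead(base_fps, OVERHEAD_RANGE_MS[1])
    results[base_fps] = (worst, best)
    print(f"{base_fps:.0f} fps -> {worst:.1f} to {best:.1f} fps with NTC overhead")
```

Under these assumptions, a game running at 60 fps would lose roughly 2 to 3 fps, while one running at 120 fps would lose roughly 9 to 11 fps.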

The technology does not retroactively improve existing games. It requires active development effort from studios to compress assets in NTC format and integrate the decoding path. Until that ecosystem matures through major engine support, the benchmark results — however impressive — will remain largely theoretical for most players.

The potential, however, is significant enough that Tom’s Hardware’s headline called it a technology that could let 8 GB cards last another decade. Based on the data, that headline is defensible. Whether it becomes reality rests with game developers, not GPU silicon.
