AMD & Intel Jointly Release ACE White Paper: Promising a 16× AI Performance Leap on x86 CPUs
The AI Compute Extensions (ACE) specification—co-authored by engineers from both companies under the x86 Ecosystem Advisory Group—brings standardized, outer-product-based matrix acceleration to every x86 chip, from laptops to supercomputers.
AMD and Intel, longtime rivals at the heart of the x86 ecosystem, have jointly published the official white paper for ACE — AI Compute Extensions for the x86 instruction set architecture (ISA). Released on April 15, 2026 under the banner of the x86 Ecosystem Advisory Group (EAG), the specification represents the most ambitious collaborative standardization effort between the two companies in recent memory, targeting a fundamental bottleneck in on-CPU AI inference: matrix multiplication throughput.
The ACE white paper was co-authored by engineers from both AMD (Stuart Biles, Brian Thompto, Michael Estlick, Eric Schwarz, Thomas Fox, Gabriel Loh, Marius Evers, and Michael Clark) and Intel (Alexander Heinecke, Pradeep Dubey, and Ido Ouziel), underscoring the depth of the collaboration.
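At a glance: 16× compute density over an equivalent AVX10 operation; white paper published Apr 15, 2026; INT8, OCP FP8, OCP MXFP8, OCP MXINT8, and BF16 natively supported.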
Background: The x86 Ecosystem Advisory Group
The seeds of ACE were planted in October 2024 when AMD and Intel co-founded the x86 Ecosystem Advisory Group (EAG), a consortium designed to coordinate the future evolution of the x86 ISA in the face of growing competitive pressure from Arm and RISC-V. At its one-year anniversary in October 2025, the EAG announced tangible progress on four key initiatives:
- FRED — Flexible Return and Event Delivery: a modernized interrupt model to reduce latency and improve software reliability.
- AVX10 — Standardization of 512-bit SIMD extensions across both vendors, ending the fragmentation that plagued AVX-512.
- ChkTag — A unified x86 memory-tagging specification to combat buffer overflows and use-after-free vulnerabilities at the hardware level.
- ACE — AI Compute Extensions for matrix multiplication, the subject of this white paper.
“ACE standardizes matrix multiplication capabilities, enabling seamless developer experiences across devices ranging from laptops to data center servers.”
— AMD x86 Ecosystem Advisory Group Blog, October 2025
What ACE Does: Outer Products and Tile Registers
Matrix multiplication is the foundational computational primitive behind neural networks and large language models. Existing SIMD approaches — such as AVX10 — can perform matrix operations but operate on one-dimensional data, making them architecturally ill-suited to the two-dimensional nature of matrix workloads. The result is limited computational density and poor scalability for AI inference tasks.
ACE addresses this by introducing a matrix acceleration mechanism based on outer-product operations, paired with two-dimensional tile registers. The approach is comparable in spirit to the Tensor Core design that has driven NVIDIA GPU performance in AI workloads, except that it is built into the x86 CPU core's instruction set rather than into a separate accelerator.
The headline result: ACE achieves 16× the computational density of an equivalent AVX10 multiply-accumulate operation while consuming the same input vectors. This is not a theoretical peak — it reflects the structural efficiency gain from switching to an outer-product, tiled computation model.
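To see where a figure like 16× can come from, consider a rough sketch of the arithmetic. An element-wise multiply-accumulate over two n-element vectors performs n multiply-accumulates (MACs), while an outer product of the same two vectors fills an n × n tile and performs n² MACs, so the ratio is n. The 16-lane width below is an assumption chosen for illustration, not a tile geometry taken from the white paper, and all function names are hypothetical.

```python
# Illustrative only: LANES and the function names are assumptions,
# not details from the ACE white paper.

def fma_step(acc, a, b):
    """SIMD-style multiply-accumulate: one MAC per lane."""
    for i in range(len(a)):
        acc[i] += a[i] * b[i]
    return len(a)  # number of MACs performed


def outer_product_step(tile, a, b):
    """Accumulate the outer product of a and b into a 2D tile:
    one MAC per (row, column) pair."""
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            tile[i][j] += ai * bj
    return len(a) * len(b)  # number of MACs performed


def matmul_by_outer_products(A, B):
    """C = A @ B computed as a sum of rank-1 updates, one per inner index."""
    m, k, n = len(A), len(B), len(B[0])
    C = [[0] * n for _ in range(m)]
    for p in range(k):
        col = [A[i][p] for i in range(m)]  # column p of A
        row = B[p]                         # row p of B
        outer_product_step(C, col, row)
    return C


LANES = 16  # assumed vector length, for illustration
a = list(range(LANES))
b = list(range(LANES))
simd_macs = fma_step([0] * LANES, a, b)
tile_macs = outer_product_step([[0] * LANES for _ in range(LANES)], a, b)
density_ratio = tile_macs // simd_macs
```

Both steps read the same two input vectors; only the amount of work done per pair differs. `matmul_by_outer_products` shows why this maps onto matrix multiplication: a full product is just a sequence of such rank-1 updates accumulated into one tile.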
| Format | Standard | Primary Use Case |
|---|---|---|
| INT8 | Legacy / Widely adopted | Quantized inference, edge AI |
| OCP FP8 | Open Compute Project | LLM inference, mixed precision training |
| OCP MXFP8 | Open Compute Project (microscaling) | Efficient LLM inference with fine-grained scaling |
| OCP MXINT8 | Open Compute Project (microscaling) | Quantized inference with per-block scales |
| BF16 | Brain Float 16 (Google / industry) | Training and high-quality inference |
The selection of OCP MXFP8 and OCP MXINT8 — the microscaling formats pioneered by the Open Compute Project — is particularly significant. These formats, which apply scaling factors at a per-block granularity rather than per-tensor, are increasingly considered the optimal balance between accuracy and compute efficiency for LLM inference, and their inclusion signals ACE’s positioning as a forward-looking, production-grade specification.
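The benefit of per-block scaling can be sketched in a few lines. The example below is a simplified model, not a bit-accurate implementation of the OCP formats: it keeps only the idea of one shared power-of-two scale per 32-element block and compares it with a single scale for the whole tensor. Helper names and the test data are invented for illustration.

```python
import math

BLOCK = 32  # block size used by the OCP microscaling formats


def pow2_scale(values, qmax=127):
    """Smallest power-of-two scale s such that max|x| / s fits in [-qmax, qmax]."""
    amax = max(abs(v) for v in values)
    if amax == 0.0:
        return 1.0
    return 2.0 ** math.ceil(math.log2(amax / qmax))


def quantize(values, scale, qmax=127):
    """Round to the nearest integer step and clamp to the signed 8-bit range."""
    return [max(-qmax, min(qmax, round(v / scale))) for v in values]


def roundtrip_per_tensor(values):
    """Quantize and dequantize with one scale for the whole tensor."""
    s = pow2_scale(values)
    return [q * s for q in quantize(values, s)]


def roundtrip_per_block(values, block=BLOCK):
    """Quantize and dequantize with a separate scale per block."""
    out = []
    for start in range(0, len(values), block):
        chunk = values[start:start + block]
        s = pow2_scale(chunk)
        out.extend(q * s for q in quantize(chunk, s))
    return out


def max_abs_error(xs, ys):
    return max(abs(x - y) for x, y in zip(xs, ys))


# One block of small values followed by one block containing a large outlier.
small = [0.001 * (i + 1) for i in range(BLOCK)]
large = [100.0] + [0.5] * (BLOCK - 1)
data = small + large

# Error measured on the small-valued block only.
small_err_tensor = max_abs_error(small, roundtrip_per_tensor(data)[:BLOCK])
small_err_block = max_abs_error(small, roundtrip_per_block(data)[:BLOCK])
```

With a single tensor-wide scale, the outlier forces a coarse step size and the small values all collapse to zero. With per-block scales, the first block gets its own finer step and survives the round trip with far smaller error.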
Integration with AVX10: A Seamless Extension
Rather than standing alone as a separate instruction set, ACE is designed as a seamless extension of AVX10. This architectural choice is deliberate and consequential. One of the biggest lessons from the AVX-512 era was the danger of instruction fragmentation: AVX-512 was not available on all CPUs, and certain sub-extensions existed only on specific Intel server parts. Software developers were reluctant to target it, fearing they’d exclude large portions of their user base.
With ACE co-authored and co-standardized by both AMD and Intel, and layered on top of the already-unified AVX10 baseline, the ecosystem should be able to adopt ACE instructions with confidence that support will be universal across both vendors’ hardware going forward.
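The practical cost of fragmentation shows up in feature checks. A minimal sketch, assuming a Linux system: the AVX-512 flag names below follow the spelling Linux uses in `/proc/cpuinfo`, while `"ace"` is a hypothetical placeholder, since no ACE-capable CPU or operating-system flag has been announced. The specific set of flags an INT8 kernel would require is also an illustrative assumption.

```python
# AVX-512 subset names as they appear in Linux /proc/cpuinfo.
# Which subsets a given kernel needs is an illustrative assumption.
AVX512_INT8_KERNEL_NEEDS = frozenset(
    {"avx512f", "avx512bw", "avx512vl", "avx512_vnni"}
)

# Hypothetical flag name: no ACE hardware or OS flag exists yet.
ACE_KERNEL_NEEDS = frozenset({"ace"})


def can_run(required, cpu_flags):
    """True only if every required feature flag is present."""
    return required <= set(cpu_flags)


def read_cpu_flags(path="/proc/cpuinfo"):
    """Return the CPU feature flags on Linux, or an empty set elsewhere."""
    try:
        with open(path) as f:
            for line in f:
                if line.startswith("flags"):
                    return set(line.split(":", 1)[1].split())
    except OSError:
        pass
    return set()


# A CPU with the AVX-512 foundation but without the VNNI subset.
partial_cpu = {"avx512f", "avx512bw", "avx512vl"}
partial_ok = can_run(AVX512_INT8_KERNEL_NEEDS, partial_cpu)

# A hypothetical future CPU exposing a single ACE flag.
future_cpu = {"avx512f", "ace"}
future_ok = can_run(ACE_KERNEL_NEEDS, future_cpu)
```

The first check fails because one of four required subsets is missing, which is the situation that made developers wary of AVX-512. A single, vendor-neutral capability would reduce the check to one flag.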
“By co-authoring ACE, Intel and AMD have established a consistent standard across the entire x86 stack — from laptops to supercomputers — closing the chapter on instruction fragmentation.”
— Tweaktown Analysis, April 2026
Software Ecosystem: Integration Already Underway
The white paper confirms that software enablement for ACE is actively in progress. The integration roadmap spans the full computational stack:
Low-Level Libraries
Deep learning and HPC kernel libraries are being updated to incorporate ACE-accelerated primitives, including lower-precision GEMM (General Matrix Multiply) routines and LLM-specific compute kernels. These libraries form the backbone of nearly all production AI inference pipelines.
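For readers unfamiliar with lower-precision GEMM, the following is a minimal sketch of the general pattern such routines follow: quantize the floating-point inputs to 8-bit integers, multiply and accumulate in a wider integer, then rescale the result. It uses plain Python and symmetric per-tensor scaling for clarity; it is not code from any of the libraries mentioned, and the function names are invented.

```python
def quantize_int8(matrix):
    """Symmetric per-tensor quantization. Returns (integer matrix, scale)."""
    amax = max(abs(v) for row in matrix for v in row)
    scale = amax / 127 if amax else 1.0
    q = [[max(-127, min(127, round(v / scale))) for v in row] for row in matrix]
    return q, scale


def int8_gemm(A, B):
    """Approximate A @ B using 8-bit inputs and an integer accumulator."""
    qa, sa = quantize_int8(A)
    qb, sb = quantize_int8(B)
    rows, inner, cols = len(A), len(B), len(B[0])
    C = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            acc = 0  # wide integer accumulator (typically 32-bit in hardware)
            for p in range(inner):
                acc += qa[i][p] * qb[p][j]
            C[i][j] = acc * sa * sb  # rescale back to floating point
    return C


A = [[1.0, -2.0], [0.5, 4.0]]
B = [[2.0, 0.0], [1.0, -1.0]]
approx = int8_gemm(A, B)
exact = [[0.0, 2.0], [5.0, -4.0]]
```

The inner loop is entirely integer arithmetic, which is the part a matrix extension would accelerate; the two scale factors are applied once per output element.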
Scientific Python Ecosystem
Integration work has begun with NumPy and SciPy, the foundational Python libraries for numerical computation. Once that work lands, scientists and data analysts working in Python should be able to benefit from ACE acceleration without changing their code.
Machine Learning Frameworks
Support is being added to PyTorch and TensorFlow, the two dominant frameworks for training and deploying neural networks. Once integrated, models running on ACE-capable hardware should be able to dispatch matrix operations to ACE automatically through existing APIs.
Key Timeline
- October 2024: AMD and Intel co-found the EAG alongside major industry partners to jointly govern the future of x86, announcing FRED, AVX10, ChkTag, and ACE as its four core initiatives.
- October 2025: At the group’s one-year milestone, AMD and Intel confirm ACE has been accepted and is being implemented. AVX10 fragmentation concerns are resolved with Intel re-committing to 512-bit SIMD width.
- April 15, 2026: The formal ACE specification white paper is released, co-authored by engineers from AMD and Intel. The paper details the outer-product architecture, data format support, and 16× compute density claim over AVX10.
- In progress: Library and framework support is underway for NumPy, SciPy, PyTorch, TensorFlow, and deep learning and HPC kernel libraries. No ACE-capable CPUs have been announced yet; hardware support is expected in future CPU generations.
Implications: CPU vs. Dedicated Accelerator
One of ACE’s most strategically interesting aspects is its positioning relative to dedicated AI accelerators. The conventional wisdom of the past several years has been that serious AI compute — particularly inference of large language models — must migrate to discrete GPUs or NPUs, requiring developers to master CUDA, ROCm, or proprietary accelerator stacks. This migration carries significant software engineering costs.
ACE takes a different position. By integrating high-density matrix acceleration directly into the CPU’s ISA — available wherever the CPU is — it enables lightweight AI workloads to run on the CPU itself, using standard software libraries, without engaging a separate power-hungry accelerator. The x86 ABI remains intact; developers do not need to rewrite code for different hardware platforms.
This does not mean ACE competes with high-end discrete GPU accelerators for large model training or frontier inference. Rather, it targets the vast middle ground: on-device AI inference for applications running on laptops, workstations, edge servers, and CPUs in data centers where a GPU is either unavailable or inefficient to power up for a small task.
What’s Next
No CPUs with hardware ACE support have been announced as of the white paper’s publication date. The release of the specification to the developer community is the first step: establishing the standard before silicon arrives, so that software ecosystems are ready the moment hardware ships. Given that AMD’s next-generation Zen 6-based EPYC “Venice” CPUs are expected in the second half of 2026, and Intel continues developing its roadmap, ACE support could conceivably appear in silicon within the next one to two years.
The ChkTag memory tagging specification — the remaining EAG initiative without a published full spec — is also expected to be released later in 2026, rounding out the EAG’s foundational four-feature platform for the next generation of x86 computing.
Sources: AMD x86 EAG Blog (Oct. 2025), ACE White Paper v1.0 — x86ecosystem.org (Apr. 15, 2026), Tweaktown (Apr. 30, 2026), Wccftech (Apr. 29, 2026), TechPowerUp (Oct. 2025), HWCooling.net (Oct. 2025).
