March 7, 2026

PBX Science

Google’s New AI Breaks a 56-Year-Old Unsolved Math Problem

Last year, AI stunned the world by solving PhD-level mathematics problems. This year, it’s moved even further—cracking unsolved mathematical mysteries that have baffled experts for decades.

On May 15, Google DeepMind unveiled AlphaEvolve, a groundbreaking programming agent unlike any before. Instead of focusing on specific tasks, AlphaEvolve is built to autonomously discover and continuously refine general-purpose algorithms.

As its name suggests, AlphaEvolve thrives on evolution. It mimics natural selection, iterating through code to evolve increasingly optimized and innovative algorithms.

If a problem can be expressed as code and its outcomes measured through objective functions, AlphaEvolve can iteratively improve upon it. According to Google, after testing the system on more than 50 open problems in areas like combinatorics, geometry, and number theory, AlphaEvolve outperformed existing human-devised solutions in around 20% of cases.

DeepMind researcher Matej Balog highlighted a major result: AlphaEvolve achieved the first improvement in 56 years on the 4×4 complex matrix multiplication problem, beating the 49-multiplication record implied by Volker Strassen’s 1969 algorithm. The advance emerged from a search technique the AI developed on its own.

But the system’s value goes far beyond mathematics—it represents a general-purpose capability for autonomous algorithm discovery. “We’ve only just scratched the surface of what AlphaEvolve can do,” Balog said.

 


1. If a Problem Can Be Programmed and Evaluated, AlphaEvolve Can Improve It

At its core, AlphaEvolve is driven by a self-improving evolutionary mechanism that automatically generates and refines algorithms to maximize performance.

Fundamentally, it addresses a black-box optimization problem: maximize h(f), where f is a program generated by a large language model, and h is an evaluation function that measures the program’s quality.

AlphaEvolve starts by using a prompt sampler to construct inputs that guide language models to generate code. Two versions of Google’s Gemini models work in tandem:

  • Gemini Flash quickly generates a wide range of program candidates.

  • Gemini Pro provides deeper, more structured algorithmic suggestions.

This combination allows AlphaEvolve to produce not only feasible programs but also ones with meaningful algorithmic depth. The generated code is fed into an automated evaluation system, which verifies, runs, scores, and stores the results in a program database.

From there, an evolutionary algorithm selects the top-performing programs to guide the next generation of prompts—allowing the system to steadily evolve better solutions.
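The generate–evaluate–select loop described above can be sketched in miniature. The toy below is purely illustrative: a random numeric mutator stands in for the Gemini models, a simple fitness function stands in for the real evaluator, and a plain list stands in for the program database. None of these names come from AlphaEvolve itself.

```python
import random

random.seed(0)  # reproducibility for this sketch

def h(candidate):
    # Evaluation function: negative squared error against a hidden target.
    # In AlphaEvolve this role is played by the automated evaluation system.
    target = [3.0, -1.0, 0.5]
    return -sum((c - t) ** 2 for c, t in zip(candidate, target))

def mutate(parent):
    # Stand-in for LLM-guided code generation: perturb the parent candidate.
    return [c + random.gauss(0, 0.1) for c in parent]

def evolve(generations=2000, children_per_gen=8, db_size=20):
    database = [[0.0, 0.0, 0.0]]               # program database, seeded trivially
    for _ in range(generations):
        best = max(database, key=h)            # select the top performer
        children = [mutate(best) for _ in range(children_per_gen)]
        # Keep only the highest-scoring candidates for the next round.
        database = sorted(database + children, key=h, reverse=True)[:db_size]
    return max(database, key=h)

best = evolve()
```

After a few thousand iterations, `best` converges toward the hidden target, illustrating how selection pressure on an objective score alone can steer generation.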

A key to this process is AlphaEvolve’s ability to automatically assess candidate programs using the h function, which measures accuracy, performance efficiency, code quality, and more. These metrics are quantitative and objective, enabling AlphaEvolve to optimize autonomously—without human-in-the-loop intervention.

That said, the h function itself still requires human definition. It might include metrics like correctness, runtime, or readability. AlphaEvolve’s task is then to optimize f, the program, under these constraints. This means its current applications are limited to problems whose performance can be quantified and measured programmatically; tasks that require human judgment or physical experimentation to assess results remain out of reach.
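As a sketch of what such a human-defined h might look like, the function below scores a candidate sorting routine by correctness with a small runtime penalty. The metric names and weights are assumptions for illustration, not AlphaEvolve’s actual evaluator.

```python
import time

def h(candidate, cases):
    """Score a candidate program: correctness dominates, speed breaks ties."""
    correct = 0
    start = time.perf_counter()
    for inp, expected in cases:
        try:
            if candidate(list(inp)) == expected:
                correct += 1
        except Exception:
            pass                      # a crashing program simply scores lower
    elapsed = time.perf_counter() - start
    accuracy = correct / len(cases)
    return accuracy - 0.01 * elapsed  # small runtime penalty

cases = [((3, 1, 2), [1, 2, 3]), ((5, 4), [4, 5])]
score = h(sorted, cases)              # a correct, fast candidate scores near 1.0
```

Because every term in the score is computed mechanically, thousands of candidates can be ranked without a human in the loop, which is precisely the property AlphaEvolve exploits.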

In the historic 4×4 complex matrix multiplication example, researchers specified optimization goals such as the minimum number of multiplications (i.e., the rank of a tensor decomposition) and the proportion of random seeds that achieved these results.

These goals shaped the fitness function h, guiding AlphaEvolve through a complex search space. Starting from basic definitions and equipped with gradient-based optimization components (initialization strategies, loss functions, the Adam optimizer), it evolved a set of high-quality tensor decomposition algorithms.

The outcome: AlphaEvolve surpassed the best-known results for 14 different matrix multiplication structures. Most notably, it devised the first known algorithm to multiply two 4×4 complex matrices using just 48 multiplications—breaking a record that had stood untouched for 56 years.
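To make “number of multiplications” concrete: Strassen’s classic 1969 scheme multiplies two 2×2 matrices with 7 scalar multiplications instead of the naive 8, and applying it recursively gives 49 multiplications for the 4×4 case. AlphaEvolve’s 48-multiplication result plays the same game for 4×4 complex matrices. The code below is the standard textbook 2×2 construction, not AlphaEvolve’s output.

```python
def strassen_2x2(a, b):
    """Multiply two 2x2 matrices using Strassen's 7 multiplications."""
    (a11, a12), (a21, a22) = a
    (b11, b12), (b21, b22) = b
    # Exactly seven scalar multiplications:
    m1 = (a11 + a22) * (b11 + b22)
    m2 = (a21 + a22) * b11
    m3 = a11 * (b12 - b22)
    m4 = a22 * (b21 - b11)
    m5 = (a11 + a12) * b22
    m6 = (a21 - a11) * (b11 + b12)
    m7 = (a12 - a22) * (b21 + b22)
    return [[m1 + m4 - m5 + m7, m3 + m5],
            [m2 + m4, m1 - m2 + m3 + m6]]

def naive_2x2(a, b):
    # Standard definition: eight multiplications.
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

a = [[1, 2], [3, 4]]
b = [[5, 6], [7, 8]]
assert strassen_2x2(a, b) == naive_2x2(a, b)  # same product, one fewer multiply
```

Shaving even a single multiplication matters because these schemes are applied recursively: each multiplication saved at the base level compounds across every level of recursion on large matrices.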


2. Evolving Beyond Function Tweaks: AlphaEvolve as an Algorithmic Partner

AlphaEvolve builds on the foundation laid by DeepMind’s earlier system, FunSearch, which also used language models to evolve code and was applied to discover mathematical structures and online strategies.

However, AlphaEvolve is significantly more powerful and flexible. It can edit entire programs, manage multiple functions and components, and even optimize across different programming languages. It’s no longer just a tool for tweaking individual functions—it acts more like a co-designer of large, evolving algorithmic systems.

According to DeepMind, AlphaEvolve has already tackled more than 50 mathematical problems across analysis, geometry, combinatorics, and number theory. Most experiments complete within a few hours.

In about 75% of cases, it successfully re-discovered the best-known solutions. More impressively, in around 20% of the problems, it surpassed existing methods. One standout case: the kissing number problem, a 300-year-old question in mathematics. In 11-dimensional space, AlphaEvolve constructed a novel configuration of 593 spheres, improving the known lower bound.

Beyond mathematics, AlphaEvolve is already being used within Google to solve real-world engineering challenges, such as:

  • Designing new scheduling heuristics for the Borg cluster management system

  • Optimizing matrix multiplication kernels used in training large language models

  • Redesigning arithmetic circuits within TPU chips

  • Speeding up the execution of Transformer attention mechanisms

Despite the diversity of these tasks, they all share two traits: they can be expressed as code, and their outcomes can be evaluated quantitatively. Whenever those conditions are met, AlphaEvolve can go to work.

 



3. Toward Infinite Algorithmic Intelligence

DeepMind, known for landmark AI systems like AlphaGo and AlphaFold, has once again pushed the boundaries of artificial intelligence—this time, into the domain of algorithmic discovery.

Unlike AlphaGo, which relied on reinforcement learning and self-play to master board games, AlphaEvolve uses language models to generate a wide range of algorithm candidates. These are automatically evaluated and evolved, yielding increasingly optimal solutions.

In this sense, AlphaEvolve is not just another AI agent—it’s a scalable, orchestratable framework for constructing and discovering algorithms. Whereas AlphaGo demonstrated how AI could surpass humans in well-defined games, AlphaEvolve reveals something more fundamental: the potential for AI to continuously generate and refine algorithms at scale.

Algorithm design and tuning have long been elite, highly compensated skills. AlphaEvolve challenges this paradigm by automating what was once manual, non-replicable work. With enough computing power and a well-defined evaluation metric, it can explore vast algorithmic landscapes at a speed and scale no human team could match.

Historian Yuval Harari once likened AlphaGo’s impact to discovering that humanity had only been exploring a small island on a vast Go planet. With AlphaEvolve, that metaphor extends to science itself—revealing how much of the intellectual universe remains unmapped.

In 2024, AI-driven discoveries earned Nobel Prizes in both Physics and Chemistry. Such milestones may soon become commonplace.

AlphaEvolve represents a bold new step: accelerating the expansion of humanity’s “map of the unknown” with virtually limitless intelligent exploration.

When algorithms are no longer scarce or handcrafted, the frontier of human knowledge may be just beginning.
