Claude Code Unearths a Linux Vulnerability Hidden for 23 Years
Nicholas Carlini is not a name most people recognize. But in security research circles, his credentials speak clearly: a research scientist at Anthropic, a PhD from UC Berkeley under David Wagner, best-paper awards at IEEE S&P, USENIX Security (twice), and ICML (three times)—and, as of this writing, more than 70,000 citations on Google Scholar. He spent years at Google Brain and DeepMind before joining Anthropic. At the [un]prompted 2026 AI security conference, he told the audience something that stopped the room:
We now have a number of remotely exploitable heap buffer overflows in the Linux kernel. I have never found one of these in my life before. This is very, very, very hard to do. With these language models, I have a bunch.
— Nicholas Carlini, [un]prompted AI Security Conference, 2026
A security professional who had spent decades in the field admitted that AI had done what he had never managed to accomplish alone, and not once but repeatedly. The most striking example: a vulnerability buried inside the Linux kernel since March 2003, invisible to every code review, static analysis tool, and fuzzing campaign that had touched the kernel in the 23 years since.
A Landmine Planted in 2003
The flaw lives in NFS—the Network File System, the foundational protocol Linux uses to share files across servers. It is present in virtually every enterprise Linux deployment. The specific vulnerability sits in NFS’s locking mechanism, and while the attack path requires coordination, the underlying principle can be stated simply: the server tries to pour 1,056 bytes of water into a 112-byte cup.
In technical terms, the attack works like this: two cooperating NFS clients target a Linux NFS server. Client A acquires a file lock and declares a 1,024-byte owner ID, which is unusually long but fully permitted by the NFSv4 protocol. When Client B then requests the same lock, the server rejects it. In generating that rejection response, the server attempts to include the full 1,024-byte owner ID, which brings the response to 1,056 bytes. But the allocated memory buffer, sized by the constant NFSD4_REPLAY_ISIZE, is only 112 bytes. The result: the 944 bytes that do not fit are written past the end of the buffer, over adjacent kernel memory. Because the attacker controls the owner ID, they also control what gets written there. No login credentials. No special privileges. Just network access to an exposed NFS service.
This is a remotely exploitable heap buffer overflow—one of the most severe vulnerability classes in systems security. It was assigned CVE-2026-31402 and has since been patched in the Linux kernel.
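The size mismatch described above can be checked with a few lines of arithmetic. The following is a minimal Python sketch, not kernel code: the field layout in `encode_lock_denied` (offset, length, lock type, client ID, owner length, then the owner bytes) is an assumption chosen so that the fixed fields total 32 bytes and the response matches the 1,056-byte figure quoted here, and the function names are hypothetical.

```python
import struct

# Buffer size the article attributes to NFSD4_REPLAY_ISIZE.
REPLAY_BUF_SIZE = 112
# Longest owner ID the article says NFSv4 permits.
MAX_OWNER_LEN = 1024


def encode_lock_denied(offset, length, lock_type, client_id, owner):
    """Encode a simplified LOCK denial body (assumed layout, for illustration)."""
    # Fixed fields, big-endian with no padding:
    # offset (8) + length (8) + type (4) + client ID (8) + owner length (4) = 32 bytes.
    header = struct.pack(">QQIQI", offset, length, lock_type, client_id, len(owner))
    return header + owner


def overflow_bytes(response, buf_size=REPLAY_BUF_SIZE):
    """Return how many bytes of the response do not fit in the buffer."""
    return max(0, len(response) - buf_size)


def fits_replay_buffer(response, buf_size=REPLAY_BUF_SIZE):
    """A bounds check of the kind whose absence makes the overflow possible."""
    return len(response) <= buf_size
```

Calling `encode_lock_denied(0, 100, 1, 42, b"A" * MAX_OWNER_LEN)` yields a 1,056-byte response, and `overflow_bytes` on that response returns 944, the same figures as in the description. The guard in `fits_replay_buffer` only illustrates the idea of a length check; it is not a description of the actual kernel patch.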
The original commit, from March 2003, explains the developer’s thinking at the time: the 112-byte static buffer was explicitly sized for the OPEN operation, the largest of the NFSv4 sequence mutation operations at the time of writing. The reasoning was sound. What the developer did not, and could not, anticipate was that the LOCK operation added later would permit owner IDs of up to 1,024 bytes. A perfectly logical design choice at the protocol layer became a fatal flaw at the implementation layer. For 23 years, it waited.
Why Existing Tools Missed It for Two Decades
The more important question is not how the bug was introduced, but why it survived so long. The Linux kernel has more than 30 million lines of code and receives constant security scrutiny. Traditional vulnerability discovery relies on three primary approaches—and all three fail against this class of bug.
Static analysis can flag a buffer that appears undersized, but it cannot understand the semantics of the NFS protocol. It has no way to know that under specific interaction conditions (a LOCK denial that must echo back an unusually long owner ID held by another client) the 112-byte buffer would be called upon to hold 1,056 bytes. The code, viewed in isolation, looks correct.
Fuzzing throws random data at a system and watches for crashes. But triggering this vulnerability requires a precise, ordered sequence: two clients, operating in a specific choreography, exchanging protocol messages in the right order. The probability of random fuzzing stumbling onto that exact combination is vanishingly small.
Manual auditing is limited by human bandwidth. Even experienced kernel security researchers have finite time and attention; the NFS implementation details are dense and easy to skim past. No one connected the dots between the OPEN buffer size decision made in 2003 and the LOCK operation’s owner ID capacity added afterward.
Claude Code found it because it could reason about the entire interaction flow: the protocol handshake, the state relationship between OPEN and LOCK, and the length variation of the owner ID across different operation contexts. It assembled those pieces into a coherent picture of the gap between 112 and 1,056.
What’s most surprising about the vulnerability Carlini shared is how little oversight Claude Code needed to find the bug. He essentially just pointed Claude Code at the Linux kernel source code and asked: “Where are the security vulnerabilities?”
— Michael Lynch, mtlynch.io detailed breakdown, April 2026
Carlini’s approach was deliberately minimal. He wrote a bash script that iterated over every source file in the Linux kernel and, for each file, told Claude Code it was in a CTF (capture-the-flag) competition and should look for vulnerabilities. No protocol documentation. No hand-crafted hints. A find command looping into a claude call. Claude returned complete vulnerability reports, including ASCII diagrams of the attack chain.
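The shape of that loop is simple enough to sketch. The version below is written in Python rather than bash and is an illustration only: the prompt wording is hypothetical (Carlini’s exact prompt is not reproduced here), the choice of `.c` and `.h` files is an assumption, and the default `claude -p` invocation assumes Claude Code’s non-interactive print mode. The function only builds the command lines, so nothing is executed until they are passed to something like `subprocess.run`.

```python
import os

# File types to scan; restricting to C sources and headers is an assumption.
SOURCE_SUFFIXES = (".c", ".h")


def iter_source_files(root):
    """Yield every matching source file under root, in a stable order."""
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # make the traversal order deterministic
        for name in sorted(filenames):
            if name.endswith(SOURCE_SUFFIXES):
                yield os.path.join(dirpath, name)


def build_prompt(path):
    """Build a per-file CTF-style prompt (hypothetical wording)."""
    return (
        "You are playing in a CTF. Look for a security vulnerability in "
        f"{path} and write up what you find."
    )


def build_commands(root, cli=("claude", "-p")):
    """Return one command line per source file, without running anything."""
    return [[*cli, build_prompt(path)] for path in iter_source_files(root)]
```

Running `build_commands` against a kernel checkout produces one `claude` invocation per source file. Separating command construction from execution keeps the sketch testable and makes it easy to inspect the prompts before spending any model time.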
A Leap, Not a Step
Carlini tested the identical workflow against earlier models. Claude Opus 4.1, released roughly eight months before the conference, and Claude Sonnet 4.5, released six months prior, found only a small fraction of the vulnerabilities that Claude Opus 4.6 identified. This was not incremental improvement—it was a qualitative jump.
Greg Kroah-Hartman, maintainer of the Linux stable kernel releases, observed the same inflection point from the other side. For months, AI-generated security reports arriving in kernel maintainers’ inboxes were largely noise; developers dismissed them as “AI slop.” Then, roughly a month before Carlini’s talk, something changed. Reports became legitimate. The kernel security lists began receiving five to ten valid, actionable reports per day.
The trend extends beyond the Linux kernel and beyond Claude. Security researcher Sean Heelan used OpenAI’s o3 model to analyze approximately 12,000 lines of SMB command handler code in the Linux kernel and discovered a use-after-free vulnerability in the ksmbd module: a race condition in the SMB2 LOGOFF handler (CVE-2025-37899) that traditional static analysis had consistently missed, because it required reasoning about concurrent thread access patterns rather than simple code inspection.
Mozilla Firefox—perhaps the most rigorously audited open-source browser on Earth—became the subject of a two-week collaboration between Anthropic and Mozilla in February 2026. Claude Opus 4.6 scanned nearly 6,000 C++ files, submitted 112 unique reports to Mozilla’s Bugzilla tracker, and identified 22 confirmed vulnerabilities, including 14 classified as high-severity. The first vulnerability—a use-after-free in the JavaScript engine—was flagged within 20 minutes of the scan beginning. Mozilla shipped fixes for the majority of these flaws in Firefox 148, protecting hundreds of millions of users. Those 14 high-severity findings represent nearly a fifth of all high-severity Firefox bugs that were remediated across the entire year of 2025.
The Bottleneck Has Moved
Carlini articulated the new problem during his talk with characteristic candor: he has hundreds of unverified kernel crash reports sitting on his desk. He cannot send them to Linux maintainers without first validating each one himself—but verification takes human time he does not have. The speed at which AI finds vulnerabilities has now outpaced the speed at which humans can review them.
This is historically new. Until now, the bottleneck in security research has been discovery capacity, not review capacity.
The implications cut in both directions. For defenders—open-source maintainers, security teams, platform operators—the window between a vulnerability’s introduction and its discovery is collapsing. Bugs that might have lurked for a decade can now be found in days or weeks. For attackers wielding the same tools, the same acceleration applies. Carlini himself acknowledged this without equivocation.
What it means for ordinary users in the near term is better-secured operating systems and software, as vulnerabilities are found and patched before exploitation. What it means in a longer timeframe depends on whether the defensive use of these tools outpaces the offensive use—a race that is, at this moment, genuinely unresolved.
I expect to see an enormous wave of security bugs uncovered in the coming months.
— Nicholas Carlini, closing remarks, [un]prompted 2026
The wave is already visible. A 23-year-old vulnerability in the protocol that underlies enterprise Linux file sharing was found not by a decade of manual auditing, but by a script, a model, and the instruction to look. The security field is experiencing its own AlphaGo moment, except that, unlike Go, the board has 30 million lines of code, and both sides are playing.
