Source: This is a summary of a talk by Nicholas Carlini (Anthropic researcher) at the [un]prompted 2026 security conference.
Nicholas Carlini presents alarming evidence that modern LLMs can autonomously find and exploit zero-day vulnerabilities in critical software — including the Linux kernel and production web applications — with minimal scaffolding. He argues this represents a phase shift in the attacker/defender balance comparable in magnitude to the invention of the internet, and urges the security community to treat this as an urgent, present-day threat rather than a future concern.
LLMs can now autonomously find and exploit zero-day vulnerabilities in critical software without fancy scaffolding. This capability emerged only in the last few months and is improving rapidly. Carlini frames this as potentially the most significant event in security since the internet.
The approach is shockingly minimal: run Claude Code in a VM with permission prompts disabled, tell it "you're playing in a CTF, find vulnerabilities, output the most serious one," and walk away. Adding a one-line hint directing it to a specific file ("hint: look at foo.c") closes coverage gaps and enables systematic scanning across entire codebases.
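The workflow above can be sketched as a simple loop. This is an illustrative assumption, not the talk's actual harness: `run_agent` is a stand-in for driving an agent such as Claude Code (permission prompts disabled, inside a throwaway VM), and the prompt wording and file pattern are invented:

```python
# Hypothetical scanning-harness sketch. run_agent() is a placeholder for
# invoking an agent CLI inside a disposable VM; it is NOT a real API.
import pathlib

PROMPT = ("You're playing in a CTF. Find vulnerabilities and output the "
          "most serious one. Hint: look at {hint}")

def run_agent(prompt: str) -> str:
    # Placeholder: in practice, shell out to the agent here and collect
    # whatever report it writes before the VM is discarded.
    return f"[report for: {prompt}]"

def scan(repo: pathlib.Path, pattern: str = "*.c") -> dict:
    """One-line per-file hints turn a single prompt into a systematic
    scan over an entire codebase."""
    return {str(f): run_agent(PROMPT.format(hint=f.name))
            for f in sorted(repo.rglob(pattern))}
```

The per-file hint is the only scaffolding: everything else is the model deciding what to do inside the sandbox.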
- Ghost is a popular CMS (~50K GitHub stars) with no prior critical CVEs in its history
- The LLM found a blind SQL injection (no direct output — exploitable only via timing)
- The model then autonomously wrote an exploit that extracted admin API keys, secrets, and password hashes from the production database — unauthenticated
- Carlini wrote none of the exploit code himself
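A blind SQL injection like the one described above leaks data purely through response latency. The sketch below simulates that side channel; the secret, delay, alphabet, and query shape are invented for illustration (and `eval` merely stands in for the database evaluating an injected boolean):

```python
# Toy simulation of time-based blind SQL injection: the endpoint returns
# nothing, so each character of the secret is inferred from timing alone.
import time

SECRET = "s3cr3t"  # stands in for an admin API key or password hash

def vulnerable_endpoint(injected_condition: str) -> None:
    """Models a query like: ... AND IF(<cond>, SLEEP(0.05), 0).
    The attacker sees no output, only how long the request takes."""
    if eval(injected_condition, {"secret": SECRET}):  # stand-in for SQL
        time.sleep(0.05)

def probe(condition: str) -> bool:
    start = time.monotonic()
    vulnerable_endpoint(condition)
    return time.monotonic() - start > 0.025  # true branch is slower

def extract(length: int) -> str:
    alphabet = "abcdefghijklmnopqrstuvwxyz0123456789"
    out = ""
    for i in range(length):
        for ch in alphabet:
            # SQL analogue: SUBSTRING(secret, i+1, 1) = '<ch>'
            if probe(f"secret[{i}] == {ch!r}"):
                out += ch
                break
    return out

print(extract(len(SECRET)))  # prints: s3cr3t
```

One boolean question per probe, answered by the clock: slow enough to be tedious for a human, trivial for an agent in a loop.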
- LLMs found multiple remotely exploitable heap buffer overflows in the Linux kernel — something Carlini had never found himself despite prior professional security work
- Example: a bug in the NFSv4 daemon (nfsd) requiring two cooperating adversarial clients: Client A acquires a lock with a 1024-byte owner field; when Client B is denied the same lock, the server copies the holder's owner bytes into a 112-byte buffer in the denial reply, overflowing it
- This bug has existed since 2003, predating Git
- The model produced the full attack flow schematic — Carlini literally copy-pasted it into his slides
- This class of bug would never be found by fuzzing — it requires multi-party semantic reasoning
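The two-client interaction can be modeled in a few lines. Only the field sizes (1024 vs. 112 bytes) come from the talk; the class names and reply layout are invented, and the out-of-bounds write is replaced by a check that reports it rather than corrupting memory:

```python
# Toy model of the NFSv4 lock-denial overflow pattern (not real nfsd code).
OWNER_MAX = 1024   # size client A's lock-owner field may reach
DENIED_BUF = 112   # fixed-size owner field in the "lock denied" reply

class Server:
    def __init__(self):
        self.lock_owner = None

    def lock(self, owner: bytes) -> str:
        if self.lock_owner is None:
            self.lock_owner = owner      # client A: lock granted
            return "granted"
        # Client B is denied; the server echoes the HOLDER's owner back.
        reply = bytearray(DENIED_BUF)
        src = self.lock_owner
        if len(src) > len(reply):        # the check the buggy code lacks
            return f"overflow: copying {len(src)} bytes into {len(reply)}"
        reply[:len(src)] = src
        return "denied"

srv = Server()
srv.lock(b"A" * OWNER_MAX)  # client A plants an oversized owner
print(srv.lock(b"B"))       # prints: overflow: copying 1024 bytes into 112
```

Note why fuzzers miss this: the overflow fires only when one client's state (the oversized owner) flows into a reply triggered by a *different* client's request.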
- Models released 6–12 months ago (e.g., Claude 4.5, Opus 4.1) almost never found these bugs
- Models from the last 3–4 months can
- METR's benchmark shows a capability doubling time of ~4 months; models now complete tasks that take humans ~15 hours
- Smart-contract research shows LLMs can now identify and exploit vulnerabilities to recover millions of dollars, with recovered value growing exponentially (a straight line on a log-scale plot)
- Carlini's prediction: the laptop model available in ~1 year will do what today's best models do
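The doubling-time claim implies concrete numbers. A back-of-envelope extrapolation (illustrative arithmetic only, not METR's methodology) using the figures above:

```python
# If task horizon doubles every ~4 months from a ~15-hour baseline:
DOUBLING_MONTHS = 4
horizon_now_h = 15  # tasks today's models complete take humans ~15 hours

for months in (4, 8, 12):
    h = horizon_now_h * 2 ** (months / DOUBLING_MONTHS)
    print(f"+{months:2d} months: ~{h:.0f} human-hours")
# last line printed: +12 months: ~120 human-hours
```

On this trend, a year out the horizon is 8x today's: roughly three human work-weeks per task, which is the scale at which the "laptop model catches up to today's frontier" prediction starts to bite.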
Historically, security tools have favored defenders because good actors use them responsibly. Overly weak safeguards only stop good actors (bad actors jailbreak); overly strong safeguards harm defenders. The long-term outlook may favor defenders (memory-safe rewrites, formal verification), but the transitional period is the danger zone — and we are in it now.
- Current models already surpass Carlini's own vulnerability research capabilities
- Within a year they may surpass most professional security researchers
- Carlini has hundreds of unvalidated kernel crashes he has not yet had time to triage, verify, and responsibly disclose
- He calls on the security community to help — at Anthropic, DeepMind, OpenAI (Aardvark), or anywhere — urgently, within months not years
- LLMs just crossed a critical threshold: They can now find non-trivial, multi-party, semantically complex vulnerabilities that traditional tools like fuzzers cannot
- Speed of change is the real threat: The capability gap between "can't do this" and "routinely does this" was only a few months wide
- Low barrier to malicious use: A bad actor needs no security expertise — just run the model in a VM and ask
- The dual-use dilemma is acute: The same capability that finds bugs defensively can exploit them offensively; weak safeguards only impede good actors
- Volume problem is emerging: Automated bug discovery will outpace human capacity to validate, triage, and patch
- The IEA solar analogy: the IEA underestimated exponential solar growth year after year; security professionals are making the same mistake with LLM capabilities
- Transitional period is highest risk: Long-term, formal verification and memory-safe languages may win; but right now, legacy software is massively exposed
- Security researchers: Engage now — don't wait for the trend to "clearly" matter; it already does
- Developers: Prioritize memory-safe rewrites and formal verification of critical components; the window to do this proactively is closing
- AI labs & defenders: Invest in AI-assisted patch validation pipelines to keep up with AI-assisted bug discovery
- Organizations using LLMs: Build safeguards that are strong enough to deter malicious use without blocking legitimate security research — this balance requires active, ongoing work
- Everyone: Treat this with the same forward-looking seriousness that cryptographers apply to post-quantum cryptography — it's already here, not a future scenario