Source: This is a summary of a talk by Nicholas Carlini (Anthropic researcher) at the [un]prompted 2026 security conference.
Nicholas Carlini presents alarming evidence that modern LLMs can autonomously find and exploit zero-day vulnerabilities in critical software — including the Linux kernel and production web applications — with minimal scaffolding. He argues this represents a phase shift in the attacker/defender balance comparable in magnitude to the invention of the internet, and urges the security community to treat this as an urgent, present-day threat rather than a future concern.
LLMs can now autonomously find and exploit zero-day vulnerabilities in critical software without fancy scaffolding. This capability emerged only in the last few months and is improving rapidly. Carlini frames this as potentially the most significant event in security since the internet.
The approach is shockingly minimal: run Claude Code in a VM with permission prompts disabled, tell it "you're playing in a CTF, find vulnerabilities, output the most serious one," and walk away. Adding a one-line hint directing it to a specific file ("hint: look at foo.c") closes coverage gaps and enables systematic scanning across entire codebases.
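The workflow above can be sketched as a simple loop. This is an illustrative assumption, not the talk's actual harness: `run_agent` is a stand-in for driving an agent such as Claude Code (permission prompts disabled, inside a throwaway VM), and the prompt wording and file pattern are invented:

```python
# Hypothetical scanning-harness sketch. run_agent() is a placeholder for
# invoking an agent CLI inside a disposable VM; it is NOT a real API.
import pathlib

PROMPT = ("You're playing in a CTF. Find vulnerabilities and output the "
          "most serious one. Hint: look at {hint}")

def run_agent(prompt: str) -> str:
    # Placeholder: in practice, shell out to the agent here and collect
    # whatever report it writes before the VM is discarded.
    return f"[report for: {prompt}]"

def scan(repo: pathlib.Path, pattern: str = "*.c") -> dict:
    """One-line per-file hints turn a single prompt into a systematic
    scan over an entire codebase."""
    return {str(f): run_agent(PROMPT.format(hint=f.name))
            for f in sorted(repo.rglob(pattern))}
```

The per-file hint is the only scaffolding: everything else is the model deciding what to do inside the sandbox.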
- Ghost is a popular CMS (~50K GitHub stars) with no prior critical CVEs in its history
- The LLM found a blind SQL injection (no direct output — exploitable only via timing)
- The model then autonomously wrote an exploit that extracted admin API keys, secrets, and password hashes from the production database — unauthenticated
- Carlini wrote none of the exploit code himself
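A blind SQL injection like the one described above leaks data purely through response latency. The sketch below simulates that side channel; the secret, delay, alphabet, and query shape are invented for illustration (and `eval` merely stands in for the database evaluating an injected boolean):

```python
# Toy simulation of time-based blind SQL injection: the endpoint returns
# nothing, so each character of the secret is inferred from timing alone.
import time

SECRET = "s3cr3t"  # stands in for an admin API key or password hash

def vulnerable_endpoint(injected_condition: str) -> None:
    """Models a query like: ... AND IF(<cond>, SLEEP(0.05), 0).
    The attacker sees no output, only how long the request takes."""
    if eval(injected_condition, {"secret": SECRET}):  # stand-in for SQL
        time.sleep(0.05)

def probe(condition: str) -> bool:
    start = time.monotonic()
    vulnerable_endpoint(condition)
    return time.monotonic() - start > 0.025  # true branch is slower

def extract(length: int) -> str:
    alphabet = "abcdefghijklmnopqrstuvwxyz0123456789"
    out = ""
    for i in range(length):
        for ch in alphabet:
            # SQL analogue: SUBSTRING(secret, i+1, 1) = '<ch>'
            if probe(f"secret[{i}] == {ch!r}"):
                out += ch
                break
    return out

print(extract(len(SECRET)))  # prints: s3cr3t
```

One boolean question per probe, answered by the clock: slow enough to be tedious for a human, trivial for an agent in a loop.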
- LLMs found multiple remotely exploitable heap buffer overflows in the Linux kernel — something Carlini had never found himself despite prior professional security work
- Example: a bug in the NFSv4 daemon (nfsd) requiring two cooperating adversarial clients: Client A acquires a lock with a 1024-byte owner field; when Client B is denied the same lock, the server copies the holder's owner bytes into a 112-byte buffer in the denial reply, overflowing it
- This bug has existed since 2003, predating Git
- The model produced the full attack flow schematic — Carlini literally copy-pasted it into his slides
- This class of bug would never be found by fuzzing — it requires multi-party semantic reasoning
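The two-client interaction can be modeled in a few lines. Only the field sizes (1024 vs. 112 bytes) come from the talk; the class names and reply layout are invented, and the out-of-bounds write is replaced by a check that reports it rather than corrupting memory:

```python
# Toy model of the NFSv4 lock-denial overflow pattern (not real nfsd code).
OWNER_MAX = 1024   # size client A's lock-owner field may reach
DENIED_BUF = 112   # fixed-size owner field in the "lock denied" reply

class Server:
    def __init__(self):
        self.lock_owner = None

    def lock(self, owner: bytes) -> str:
        if self.lock_owner is None:
            self.lock_owner = owner      # client A: lock granted
            return "granted"
        # Client B is denied; the server echoes the HOLDER's owner back.
        reply = bytearray(DENIED_BUF)
        src = self.lock_owner
        if len(src) > len(reply):        # the check the buggy code lacks
            return f"overflow: copying {len(src)} bytes into {len(reply)}"
        reply[:len(src)] = src
        return "denied"

srv = Server()
srv.lock(b"A" * OWNER_MAX)  # client A plants an oversized owner
print(srv.lock(b"B"))       # prints: overflow: copying 1024 bytes into 112
```

Note why fuzzers miss this: the overflow fires only when one client's state (the oversized owner) flows into a reply triggered by a *different* client's request.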
- Models released 6–12 months ago (e.g., Claude 4.5, Opus 4.1) almost never found these bugs
- Models from the last 3–4 months can
- METR's benchmark shows a capability doubling time of ~4 months; models now complete tasks that take humans ~15 hours
- Smart-contract research shows LLMs can now identify and exploit vulnerabilities to recover millions of dollars, with recovered value growing exponentially (a straight line on a log-scale plot)
- Carlini's prediction: the laptop model available in ~1 year will do what today's best models do
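The doubling-time claim implies concrete numbers. A back-of-envelope extrapolation (illustrative arithmetic only, not METR's methodology) using the figures above:

```python
# If task horizon doubles every ~4 months from a ~15-hour baseline:
DOUBLING_MONTHS = 4
horizon_now_h = 15  # tasks today's models complete take humans ~15 hours

for months in (4, 8, 12):
    h = horizon_now_h * 2 ** (months / DOUBLING_MONTHS)
    print(f"+{months:2d} months: ~{h:.0f} human-hours")
# last line printed: +12 months: ~120 human-hours
```

On this trend, a year out the horizon is 8x today's: roughly three human work-weeks per task, which is the scale at which the "laptop model catches up to today's frontier" prediction starts to bite.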
Historically, security tools have favored defenders because good actors use them responsibly. Overly weak safeguards only stop good actors (bad actors jailbreak); overly strong safeguards harm defenders. The long-term outlook may favor defenders (memory-safe rewrites, formal verification), but the transitional period is the danger zone — and we are in it now.
- Current models already surpass Carlini's own vulnerability research capabilities
- Within a year they may surpass most professional security researchers
- Carlini has hundreds of unvalidated kernel crashes he has not yet had time to triage, verify, and responsibly disclose
- He calls on the security community to help — at Anthropic, DeepMind, OpenAI (Aardvark), or anywhere — urgently, within months not years
- LLMs just crossed a critical threshold: They can now find non-trivial, multi-party, semantically complex vulnerabilities that traditional tools like fuzzers cannot
- Speed of change is the real threat: The capability gap between "can't do this" and "routinely does this" was only a few months wide
- Low barrier to malicious use: A bad actor needs no security expertise — just run the model in a VM and ask
- The dual-use dilemma is acute: The same capability that finds bugs defensively can exploit them offensively; weak safeguards only impede good actors
- Volume problem is emerging: Automated bug discovery will outpace human capacity to validate, triage, and patch
- The IEA solar analogy: the IEA underestimated exponential solar growth year after year; security professionals are making the same mistake with LLM capabilities
- Transitional period is highest risk: Long-term, formal verification and memory-safe languages may win; but right now, legacy software is massively exposed
- Security researchers: Engage now — don't wait for the trend to "clearly" matter; it already does
- Developers: Prioritize memory-safe rewrites and formal verification of critical components; the window to do this proactively is closing
- AI labs & defenders: Invest in AI-assisted patch validation pipelines to keep up with AI-assisted bug discovery
- Organizations using LLMs: Build safeguards that are strong enough to deter malicious use without blocking legitimate security research — this balance requires active, ongoing work
- Everyone: Treat this with the same forward-looking seriousness that cryptographers apply to post-quantum cryptography — it's already here, not a future scenario