Recoverable failures for AI coding agents

AI coding agents are useful precisely because they can run tools, edit many files, execute tests, install dependencies, and iterate quickly. That same ability makes them risky in YOLO mode: a mistaken command, broad glob, broken script, or overconfident refactor can damage a working tree faster than a human can react.

The goal is not to make agents harmless. The goal is to make common failures recoverable.

Layers of protection

The proposed agentic setup has three layers:

Git commits       protect intentional source history
trash-backed rm   protects ordinary accidental deletes
Btrfs snapshots   protect deletes, overwrites, generated damage, and bad runs

These layers cover different failure modes. Git is excellent for source history, but it does not protect ignored files, untracked generated state, local config, or the repository metadata itself. Trash-backed rm helps with deletion, but not with overwrites. Btrfs snapshots cover the whole subvolume state at a point in time.

This post focuses on the Btrfs snapshot layer: making bad AI-agent runs recoverable as filesystem transactions. The trash-backed rm layer is a separate defense for accidental deletion; see Safe rm defaults for agent-heavy Linux machines.

The model

Treat agent work as a controlled filesystem transaction:

create a cheap snapshot
let the agent work
inspect the result
keep it, diff it, or roll it back

This is the same basic idea behind several AI-agent sandbox approaches: give the agent real tools, but run those tools in a filesystem layer that can be inspected or discarded.

Examples of related work and discussion:

Setup

The machine uses:

LVM logical volume
  btrfs filesystem (subvolid=5, flat layout)
    ext2_saved        ← btrfs-convert artifact, can be deleted once stable
    @agent_workflow

@agent_workflow is the important part. It is a separate Btrfs subvolume mounted at:

/home/martin/bin/lib/agent_workflow

Keeping agent_workflow as its own subvolume means it can be snapshotted and rolled back independently from the rest of $HOME.

Verify the mount exactly, not just the nearest parent mount:

findmnt -rn -M /home/martin/bin/lib/agent_workflow
sudo btrfs subvolume show /home/martin/bin/lib/agent_workflow

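This matters because findmnt --target resolves to the nearest enclosing mount, so it can return / when the directory is not actually a mount point, whereas -M matches only an exact mount point. For the protected directory, findmnt should report btrfs as the filesystem type, and btrfs subvolume show should succeed.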
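The exact-mount check above can be wrapped in a small guard. The following is a minimal sketch under assumptions: is_btrfs_mountpoint is a hypothetical helper name, and the -o FSTYPE column selection is my addition to the findmnt call shown earlier.

```shell
# Sketch: succeed only if "$1" is itself a mount point whose filesystem is btrfs.
# is_btrfs_mountpoint is a hypothetical helper, not part of findmnt or Snapper.
is_btrfs_mountpoint() (
    # -M matches the mount point exactly, so a plain directory produces a
    # non-zero exit status instead of falling back to a parent mount.
    fstype=$(findmnt -rn -M "$1" -o FSTYPE) && [ "$fstype" = "btrfs" ]
)

# Possible use:
#   is_btrfs_mountpoint /home/martin/bin/lib/agent_workflow || echo "not protected" >&2
```

Running the lookup in a subshell function keeps the temporary variable out of the caller's environment.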

Snapshot tooling

We use Snapper on top of Btrfs:

sudo apt install btrfs-progs snapper
sudo snapper -c agent_workflow create-config /home/martin/bin/lib/agent_workflow
sudo chown martin:martin /home/martin/bin/lib/agent_workflow

Do not recursively chown the whole subvolume after creating the Snapper configuration. Snapper keeps its metadata in .snapshots, and that directory must remain owned by root. Changing the owner of .snapshots makes snapshot creation fail with:

IO Error (.snapshots must have owner root).
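A cheap guard before each run can catch this condition before Snapper does. This is a minimal sketch, assuming GNU stat (for -c %U); snapshots_owned_by_root is a hypothetical helper name, not a Snapper feature.

```shell
# Sketch: succeed only if <subvolume>/.snapshots exists and is owned by root.
# snapshots_owned_by_root is a hypothetical helper; assumes GNU coreutils stat.
snapshots_owned_by_root() (
    owner=$(stat -c %U "$1/.snapshots") && [ "$owner" = "root" ]
)

# Possible use:
#   snapshots_owned_by_root /home/martin/bin/lib/agent_workflow \
#       || echo ".snapshots is missing or not owned by root" >&2
```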

Before an agent run:

PRE=$(sudo snapper -c agent_workflow create --print-number --description "before yolo agent run")

After a useful result:

POST=$(sudo snapper -c agent_workflow create --print-number --description "after successful agent run")

Inspect:

sudo snapper -c agent_workflow list
sudo snapper -c agent_workflow status "$PRE..$POST"
sudo snapper -c agent_workflow diff "$PRE..$POST"

If the current run is bad and no post-run snapshot was created, use snapshot 0 as the end of the range. Snapper treats snapshot 0 as the live filesystem, so you can compare or undo against the current state:

sudo snapper -c agent_workflow status "$PRE..0"
sudo snapper -c agent_workflow diff "$PRE..0"
sudo snapper -c agent_workflow undochange "$PRE..0"

If a post-run snapshot was created and the live filesystem still matches it, PRE..POST is also usable:

sudo snapper -c agent_workflow undochange "$PRE..$POST"

In testing, undochange restored deleted files, reverted overwritten files, and removed newly created files.
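Both rollback forms depend on $PRE actually holding a snapshot number. If the variable is empty, for example because the shell session was restarted, the range collapses to ..0, which is not what was intended. The sketch below builds the range and rejects anything that is not a number; snapper_range is a hypothetical helper name.

```shell
# Sketch: print "PRE..POST" for Snapper, defaulting POST to 0 (the live filesystem).
# snapper_range is a hypothetical helper; it only validates and formats its arguments.
snapper_range() (
    pre=${1:-}
    post=${2:-0}
    for n in "$pre" "$post"; do
        case $n in
            ''|*[!0-9]*)
                echo "not a snapshot number: '$n'" >&2
                exit 1
                ;;
        esac
    done
    echo "$pre..$post"
)

# Possible use:
#   range=$(snapper_range "$PRE") && sudo snapper -c agent_workflow undochange "$range"
```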

CLI

tools/agent-run does the following:

verify it is running inside the protected agent_workflow subvolume
create a Snapper snapshot
print the snapshot id
run the agent command
print the compare and rollback commands

The CLI refuses to run if the snapshot cannot be created. That matters: the safety mechanism has to be automatic, because YOLO mode is exactly when humans are least likely to remember manual precautions.

The mount check runs findmnt -rn -T "$PWD" to find the nearest enclosing mount, then asserts that its target is /home/martin/bin/lib/agent_workflow and that its filesystem type is btrfs.
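Put together, the steps listed for tools/agent-run could look roughly like the sketch below. This is not the author's script. The function name agent_run, the PROTECTED and CONFIG variables, the -o TARGET,FSTYPE column selection, and the message wording are all assumptions; only the basic findmnt and snapper invocations come from the text above.

```shell
#!/bin/sh
# Hypothetical sketch of an agent-run style wrapper, not the original tool.
PROTECTED=${PROTECTED:-/home/martin/bin/lib/agent_workflow}
CONFIG=${CONFIG:-agent_workflow}

agent_run() (
    # 1. Refuse to run unless the nearest mount is the protected btrfs subvolume.
    #    In raw mode (-r) findmnt separates the two columns with a single space.
    mount=$(findmnt -rn -T "$PWD" -o TARGET,FSTYPE) || exit 1
    if [ "$mount" != "$PROTECTED btrfs" ]; then
        echo "refusing to run: $PWD is not inside $PROTECTED on btrfs" >&2
        exit 1
    fi

    # 2. Refuse to run if the snapshot cannot be created.
    pre=$(sudo snapper -c "$CONFIG" create --print-number \
        --description "before yolo agent run") || {
        echo "refusing to run: snapshot creation failed" >&2
        exit 1
    }
    echo "pre-run snapshot: $pre"

    # 3. Run the agent command and remember its exit status.
    status=0
    "$@" || status=$?

    # 4. Print compare and rollback commands against the live filesystem (0).
    echo "compare:  sudo snapper -c $CONFIG status $pre..0"
    echo "rollback: sudo snapper -c $CONFIG undochange $pre..0"
    exit "$status"
)

# Run only when a command was passed on the command line.
if [ "$#" -gt 0 ]; then
    agent_run "$@"
fi
```

The snapshot is taken before the agent command is even looked at, so a failure in either guard leaves the working tree untouched.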

Example:

cd /home/martin/bin/lib/agent_workflow
agent-run claude --dangerously-skip-permissions

On a bad run, roll back with the command printed at exit. For example, if the pre-run snapshot was number 3:

sudo snapper -c agent_workflow undochange 3..0

What this does not solve

Remaining risks:

network exfiltration
writes outside the protected subvolume
credential access
destructive commands run with elevated privileges
snapshot deletion by a process with enough permission

URL: https://gist.github.com/monperrus/a7aa344dc84c76e5ec569a646b31eab9
