How to recover btrfs

Confidence: ~95 % it’s just the filesystem, not the SSD

Evidence and what it means:

  • SMART: 0 media/data-integrity errors, 0 error-log entries, 5 % wear → drive hardware is behaving normally.
  • NVMe logs: no aborts, resets, or PCIe AER errors shown → no transport or controller faults.
  • Btrfs device stats: all *_io_errs counters at 0 → the kernel never saw a write fail at the block layer.
  • Metadata 93 % full at the RO flip → Btrfs is documented to abort the transaction and remount read-only when it can't reserve metadata space.

A bad SSD almost always leaves fingerprints—CRC errors, aborted commands, or incrementing error logs—which you don’t have. The only smoking gun you do have is “metadata almost full”.
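
For reference, these are the commands that surface each piece of evidence (the device path assumes the same /dev/nvme0 as above; smartmontools and nvme-cli must be installed):

sudo smartctl -a /dev/nvme0        # SMART health: media/data-integrity errors, error-log count, wear
sudo nvme error-log /dev/nvme0     # controller error-log entries
sudo dmesg | grep -iE 'nvme|aer'   # resets, aborts, PCIe AER messages from the current boot
sudo btrfs device stats /          # the *_io_errs counters the kernel tracks per device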

How to push certainty to 99 %

  1. Run the NVMe built-in long self-test (3–5 h, non-destructive):

    sudo smartctl -t long /dev/nvme0
    sudo smartctl -a /dev/nvme0   # after it finishes

    A failing controller will report errors here.

  2. Scrub after you’ve freed space and remounted RW:

    sudo btrfs scrub start -Bd /

    Scrub rereads every data/metadata block; it will surface silent corruption if any exists.

  3. Watch dmesg while running heavy I/O (e.g., fio or copying large files); a minimal setup is sketched below. Any hidden PCIe or NAND issue tends to show up within minutes under load.
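
    A minimal sketch of that setup, assuming fio is installed (the job parameters are illustrative, not tuned):

    # terminal 1: follow kernel messages for storage-related errors
    sudo dmesg -wT | grep -iE 'nvme|btrfs|aer|i/o error'

    # terminal 2: ~5 minutes of sustained mixed random I/O (writes land on the SSD via /var/tmp)
    sudo fio --name=stress --filename=/var/tmp/fio-stress --size=4G --rw=randrw --bs=64k \
             --iodepth=16 --ioengine=libaio --runtime=300 --time_based
    rm /var/tmp/fio-stress          # remove the test file afterwards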

If those three come back clean, treat the incident as a pure Btrfs ENOSPC event.

Residual risk

  • Firmware bugs can exist even in a “clean” drive, but they usually manifest as resets/aborts first, not silent RO flips.
  • Bad RAM can corrupt metadata in flight; a full memtest is cheap insurance (a quick in-place check is sketched below).
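
    The thorough option is an offline memtest86+ pass from a boot stick; for a quicker, less exhaustive check from the running system, the userspace memtester tool (a separate package, shown purely as an illustration) can pattern-test a few gigabytes:

    # lock and test 4 GiB of RAM for one pass; any failure means schedule a full memtest86+ run
    sudo memtester 4G 1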

Bottom line

All signs point to recoverable metadata ENOSPC, not dying hardware. Free a few GB, run the metadata-only balance, scrub, and keep at least 2 GiB metadata headroom. If any of the checks above lights up afterwards, reassess—but odds are heavily in your favor.

Short answer

Reboot and fix it from a live-USB; don’t try heavy surgery while the root FS is mounted read-only.


Why

  • All subvolumes are on the same Btrfs that’s currently ro.
  • You’ll need RW access to delete data and run a balance.
  • Doing that while the system is half-running (Docker, systemd, overlay2) risks another RO flip or a deadlock.

Minimal, safe procedure

Devices / names below match your layout. Substitute another mountpoint if you prefer.

  1. Boot a recent Manjaro/Arch ISO (any kernel ≥ 6.8 is fine).

  2. Unlock the LUKS container

    cryptsetup open /dev/nvme0n1p2 cryptroot
  3. Mount the filesystem RW

    mount -o rw,noatime,ssd,space_cache=v2 /dev/mapper/cryptroot /mnt
  4. Free a couple of GB fast

    # Pacman cache is the quickest win (usually well over 1 GiB)
    rm -rf /mnt/var/cache/pacman/pkg/*
    
    # If you use Timeshift/Snapper, nuke old snapshots
    btrfs subvolume list /mnt | grep snapshot
    btrfs subvolume delete /mnt/@.snapshots/<ID>
  5. Rebalance metadata only

    btrfs balance start -musage=50 /mnt

    (-musage=50 → rewrite only metadata chunks that are at most 50 % used, packing them into fewer chunks and returning the freed space to the unallocated pool; data chunks are left alone.)

  6. Check usage (a fuller view is sketched after this list)

    btrfs fi df /mnt         # Metadata should drop to < 70 %
  7. Clean exit

    umount /mnt
    cryptsetup close cryptroot
    reboot
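
For step 6, btrfs filesystem usage (run it before the umount in step 7) gives a fuller breakdown than fi df; the number that matters for avoiding the next RO flip is "Device unallocated", the pool that new metadata chunks are carved from:

btrfs filesystem usage /mnt
# "Device unallocated" is where future metadata (and data) chunks come from;
# a few GiB free here means the balance did its job.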

If you must stay online (not recommended)

sudo mount -o remount,rw,device=/dev/mapper/luks-98e65cd2-8687-44f2-bee3-64ccc6124b70,clear_cache /
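# note: the kernel normally refuses a read-write remount after Btrfs has aborted a transaction;
# if the remount is refused, go straight to the live-USB procedure above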
# If remount succeeds:
sudo rm -rf /var/cache/pacman/pkg/*
sudo btrfs balance start -musage=50 /

Abort immediately if the filesystem flips back to RO—then fall back to the live-USB method.
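
One way to notice a flip quickly while the balance runs (the 5-second interval is arbitrary):

# poll the root mount's flags; the leading "rw" switching to "ro" means abort
watch -n 5 'findmnt -no OPTIONS /'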


After you’re back up

# keep metadata healthy (btrfs-balance.timer comes with the btrfsmaintenance package;
# the btrfs-scrub@ template ships with btrfs-progs, and instance "-" is the escaped path of /)
systemctl enable --now btrfs-balance.timer      # periodic light balance
systemctl enable --now btrfs-scrub@-.timer      # periodic scrub of /

# docker often fills metadata; prune it
docker system prune -af

Always leave at least 2 GiB of free metadata space (≈ 25 % of an 8 GiB DUP metadata allocation).
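
To get a warning before metadata gets tight again, a small awk over btrfs fi df can flag headroom below 2 GiB (a rough sketch; it assumes the totals are printed in GiB, as they are on a filesystem of this size):

# warn when metadata headroom on / falls below 2 GiB
sudo btrfs fi df / | awk '/^Metadata/ {
    gsub(/[^0-9.]/, "", $3); gsub(/[^0-9.]/, "", $4)    # total=X.XXGiB, used=Y.YYGiB -> plain numbers
    if ($3 - $4 < 2) printf "WARNING: only %.2f GiB of metadata headroom left\n", $3 - $4
}'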
