The object of the present quest is to create a long-term archive of the tens of terabytes of media I've hoarded^Wcollected over the past 20 years. The files are stored on a NAS that's currently running FreeBSD 13, and they practically never change once they've been created.
In the unfortunate event of all my media going up in a puff of smoke, in lieu of restoring from an archive, I could always re-rip everything, but that's an endeavour I'd rather avoid altogether.
Please do note that an archive is different from a backup. The former can be used for disaster recovery in an "oh no, my house has just gone up in flames" type of situation, while the latter is intended for an "oh no, I just fat-fingered that rm command" kind of scenario.
Optical media and magnetic tape are both great for archival use, while backups can be conveniently performed using filesystem snapshots.
Given its relatively high cost of entry, tape only becomes a cost-effective option when the amount of data you have to archive is in the tens of terabytes – the exact number will depend on the generation of hardware you decide to go with; older LTO generations will generally be cheaper to buy into. Before you hit those numbers you should be exploring other options, such as BD-R or cold cloud storage.
My work stands on the shoulders of fearless adventurers that have gone before me, such as Frederick King and Dark. However, most of the posts I've come across have discussed tools that are available for Linux, while I'm more of a BSD person. This post is, in part, an attempt to rectify that shortcoming.
Because this article will be concentrating on the software side of things, some familiarity with LTO technology will be assumed. In case you're not up to speed, you can see this blog post for a basic primer.
The drive I'm using is an external HPE LTO-5 SAS model, which struck a sweet spot in terms of both price and capacity. The LTO-5 standard (from 2010) may be getting a bit long in the tooth by 2024, but it's still more than adequate for a hobbyist/homelabber such as myself.
Moreover, a tape library/autoloader, while undoubtedly a very nice bit of kit, is out of scope for the present exercise since I have neither the budget nor the space for one.
The hardware and media I'm using were bought new (the drive was on clearance and the cartridges are all NOS off eBay), but the TCO could be considerably lowered by using used and/or refurbished hardware. This, however, is a gamble I'm unwilling to take. You might, of course, feel otherwise.
To kick things off, set the TAPE environment variable to point to your tape drive (my drive is /dev/sa0; use camcontrol devlist to find yours). The example assumes you're using the Bash shell.
This will spare you the trouble of having to give the device name as an argument to most of the tape-related utilities you'll be using later.
# echo 'export TAPE=/dev/sa0' >> /usr/local/etc/profile
Then use the mt(1) command from base to print out status information about the drive.
# mt status -v
Drive: sa0: <HP Ultrium 5-SCSI Z6ED> Serial Number: xxxxxxxxxx
---------------------------------
Mode Density Blocksize bpi Compression
Current: 0x58:LTO-5 variable 384607 enabled (0x1)
---------------------------------
Current Driver State: at rest.
---------------------------------
Partition: 0 Calc File Number: 0 Calc Record Number: 0
Residual: 0 Reported File Number: 0 Reported Record Number: 0
Flags: BOP
---------------------------------
Tape I/O parameters:
Maximum I/O size allowed by driver and controller (maxio): 1048576 bytes
Maximum I/O size reported by controller (cpi_maxio): 4722688 bytes
Maximum block size supported by tape drive and media (max_blk): 16777215 bytes
Minimum block size supported by tape drive and media (min_blk): 1 bytes
Block granularity supported by tape drive and media (blk_gran): 0 bytes
Maximum possible I/O size (max_effective_iosize): 1048576 bytes
From the output, we can see that the tape is at the beginning of the partition (BOP) and that compression is enabled. Also take note of the maximum possible I/O size (1048576 bytes, or 1 MB), which we'll need later on.
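If you'd rather not read the value off the screen, it can be pulled out of the mt output with a little text processing. The following is only a sketch: max_io_bytes is a made-up helper name, and it assumes the line is formatted exactly as in the output shown above.

```sh
# Print the max_effective_iosize value (in bytes) found in `mt status -v`
# output supplied on stdin. max_io_bytes is a hypothetical helper name.
max_io_bytes() {
    sed -n 's/.*(max_effective_iosize): \([0-9][0-9]*\) bytes.*/\1/p'
}

# Usage on a live system (requires a tape drive):
#   mt status -v | max_io_bytes
```

Reading from stdin keeps the helper independent of the drive, so it can be tried against saved output as well.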
The tapeinfo(1) command can also be used to show information about the drive. Note that we're using the passthrough device name (pass5) instead of the actual device name (sa0).
# tapeinfo -f /dev/pass5
Product Type: Tape Drive
Vendor ID: 'HP '
Product ID: 'Ultrium 5-SCSI '
Revision: 'Z6ED'
Attached Changer API: No
SerialNumber: 'xxxxxxxxxx'
MinBlock: 1
MaxBlock: 16777215
Ready: yes
BufferedMode: yes
Medium Type: Not Loaded
Density Code: 0x58
BlockSize: 0
DataCompEnabled: yes
DataCompCapable: yes
DataDeCompEnabled: yes
CompType: 0x1
DeCompType: 0x1
BOP: yes
Block Position: 0
Partition 0 Remaining Kbytes: 1470031
Partition 0 Size in Kbytes: 1470031
ActivePartition: 0
EarlyWarningSize: 0
NumPartitions: 0
MaxPartitions: 1
The camcontrol attrib subcommand will show detailed information about both the drive and the media.
# camcontrol attrib sa0 -r attr_values
Remaining Capacity in Partition (0x0000)[8](RO): 6466 MB
Maximum Capacity in Partition (0x0001)[8](RO): 1470031 MB
TapeAlert Flags (0x0002)[8](RO): 0x0
Load Count (0x0003)[8](RO): 10
MAM Space Remaining (0x0004)[8](RO): 1014 bytes
Assigning Organization (0x0005)[8](RO): LTO-CVE
Format Density Code (0x0006)[1](RO): 0x58
Initialization Count (0x0007)[2](RO): 1
Volume Identifier (0x0008)[0](RO):
Volume Change Reference (0x0009)[4](RO): 0x185
Device Vendor/Serial at Last Load (0x020a)[40](RO): HP xxxxxxxxxx
Device Vendor/Serial at Last Load - 1 (0x020b)[40](RO): HP xxxxxxxxxx
Device Vendor/Serial at Last Load - 2 (0x020c)[40](RO): HP xxxxxxxxxx
Device Vendor/Serial at Last Load - 3 (0x020d)[40](RO): HP xxxxxxxxxx
Total MB Written in Medium Life (0x0220)[8](RO): 5925261 MB
Total MB Read in Medium Life (0x0221)[8](RO): 5913018 MB
Total MB Written in Current/Last Load (0x0222)[8](RO): 0 MB
Total MB Read in Current/Last Load (0x0223)[8](RO): 4 MB
Logical Position of First Encrypted Block (0x0224)[8](RO): 0
Logical Position of First Unencrypted Block after First Encrypted Block (0x0225)[8](RO): 18446744073709551615
Medium Manufacturer (0x0400)[8](RO): HPE
Medium Serial Number (0x0401)[32](RO): U220601285
Medium Length (0x0402)[4](RO): 846 m
Medium Width (0x0403)[4](RO): 12.7 mm
Assigning Organization (0x0404)[8](RO): LTO-CVE
Medium Density Code (0x0405)[1](RO): 0x58
Medium Manufacture Date (0x0406)[8](RO): 20220601
MAM Capacity (0x0407)[8](RO): 8192 bytes
Medium Type (0x0408)[1](RO): 0x0
Medium Type Information (0x0409)[2](RO): 0x0
(0x1000)[28](RO):
0000 be 66 7a 33 54 38 57 4a 32 31 31 34 53 4f 4e 59 |.fz3T8WJ2114SONY|
0010 20 20 20 20 00 02 25 fb 00 10 00 00 | ..%..... |
(0x1001)[24](RO):
0000 be 66 7a 33 54 38 57 4a 32 31 31 34 55 32 32 30 |.fz3T8WJ2114U220|
0010 36 30 31 32 38 35 00 10 |601285.. |
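The two capacity attributes at the top of that listing are enough to work out how full a tape is. Here's a rough sketch that does so with awk; tape_used_percent is a hypothetical helper name, and it assumes the attribute lines look exactly like the ones above (value and unit as the last two fields).

```sh
# Read `camcontrol attrib ... -r attr_values` output on stdin and print the
# percentage of partition capacity already used (rounded down).
# tape_used_percent is a hypothetical helper, not part of camcontrol.
tape_used_percent() {
    awk '
        /Remaining Capacity in Partition/ { remaining = $(NF - 1) }
        /Maximum Capacity in Partition/   { maximum = $(NF - 1) }
        END {
            if (maximum > 0)
                printf "%d\n", (maximum - remaining) * 100 / maximum
        }
    '
}

# Usage on a live system (requires a tape drive):
#   camcontrol attrib sa0 -r attr_values | tape_used_percent
```

With the values shown above (6466 MB remaining out of 1470031 MB), this works out to 99 percent used.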
Using hardware encryption is a particularly good idea if you're storing your media off-site. It can be enabled using the stenc utility. (I've created a port of it here in case you don't want to compile it yourself.)
Encryption will be off by default, as shown below.
# stenc
Status for /dev/sa0 (HP Ultrium 5-SCSI Z6ED)
--------------------------------------------------
Reading: Not decrypting
Writing: Not encrypting
Key instance counter: 0
Current block status: Unable to determine
Supported algorithms:
1 AES-256-GCM-128
Key descriptors allowed, maximum 32 bytes
Raw decryption mode allowed, raw read disabled by default
To enable encryption, create a 256-bit encryption key and load it into the drive's memory (it will be unloaded when the drive is powered off). Be sure to store the key somewhere safe!
# head /dev/random | sha256sum > /usr/local/etc/stenc-20240909.key
# chmod 600 /usr/local/etc/stenc-20240909.key
# stenc -e on -k /usr/local/etc/stenc-20240909.key
Decrypt mode not specified, using decrypt = on
Algorithm index not specified, using 1 (AES-256-GCM-128)
Changing encryption settings for device /dev/sa0...
Success! See system logs for a key change audit log.
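A 256-bit key is 32 bytes, which comes to 64 hexadecimal digits. As a sanity check before locking data behind a key, something like the following sketch could confirm that a string has that shape. The helper name is_256bit_hex is my own invention; note also that sha256sum prints a file name marker after the digest, so only the first field should be passed in.

```sh
# Succeed if the argument is exactly 64 hexadecimal digits (256 bits).
# is_256bit_hex is a hypothetical helper name.
is_256bit_hex() {
    case "$1" in
        *[!0-9A-Fa-f]*) return 1 ;;   # contains a non-hex character
    esac
    [ "${#1}" -eq 64 ]
}

# Usage (checks only the first whitespace-separated field of the key file):
#   is_256bit_hex "$(awk '{ print $1; exit }' /usr/local/etc/stenc-20240909.key)" \
#       && echo "key looks like 256 bits of hex"
```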
Check the status again:
# stenc
Status for /dev/sa0 (HP Ultrium 5-SCSI Z6ED)
--------------------------------------------------
Reading: Decrypting (AES-256-GCM-128)
Unencrypted blocks not readable
Writing: Encrypting (AES-256-GCM-128)
Protecting from raw read
Key instance counter: 1
Current block status: Unable to determine
Supported algorithms:
1 AES-256-GCM-128
Key descriptors allowed, maximum 32 bytes
Raw decryption mode allowed, raw read disabled by default
You can also disable encryption with the following command. The stenc output will reflect the result.
# stenc -e off
Decrypt mode not specified, using decrypt = off
Algorithm index not specified, using 1 (AES-256-GCM-128)
Changing encryption settings for device /dev/sa0...
Success! See system logs for a key change audit log.
# stenc
Status for /dev/sa0 (HP Ultrium 5-SCSI Z6ED)
--------------------------------------------------
Reading: Not decrypting
Writing: Not encrypting
Key instance counter: 2
Current block status: Encrypted, key missing or invalid (AES-256-GCM-128)
Protected from raw read
Supported algorithms:
1 AES-256-GCM-128
Key descriptors allowed, maximum 32 bytes
Raw decryption mode allowed, raw read disabled by default
Note: Remember to turn on encryption again before starting to write the archive onto tape.
The media I want to archive is stored in a ZFS dataset named backup/media. We'll create a snapshot of it (named backup/media@2024-09-09) to use as the source for the archive, and protect the snapshot from deletion (hold it in ZFS parlance) while the archival job is in progress.
# zfs snapshot backup/media@2024-09-09
# zfs hold keep backup/media@2024-09-09
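If you repeat this exercise regularly, the snapshot name and the directory it appears under can be derived from the dataset name and a date instead of being typed out by hand. This is a small sketch with hypothetical helper names (snapshot_name, snapshot_dir); it assumes the dataset is mounted at its default mountpoint, i.e. /backup/media for backup/media.

```sh
# Build the snapshot name for a dataset and a date, e.g. backup/media@2024-09-09.
snapshot_name() { printf '%s@%s\n' "$1" "$2"; }

# Build the path where the snapshot contents are visible. Assumes the
# dataset is mounted at /<dataset> (the default mountpoint).
snapshot_dir() { printf '/%s/.zfs/snapshot/%s\n' "$1" "$2"; }

# Usage:
#   today=$(date +%Y-%m-%d)
#   zfs snapshot "$(snapshot_name backup/media "$today")"
#   zfs hold keep "$(snapshot_name backup/media "$today")"
```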
For creating the actual archive, I'm using GNU tar (archivers/gtar) and the GNU format which, according to my research, seems to be the best one for my particular use case.
[I]f your main concern are long paths and/or large files, the GNU format is the best choice.
In addition, the GNU format also supports volume labels and multi-volume archives that span multiple tapes.
I'm aware of various open-source backup applications such as Amanda and Bacula (see this article from the FreeBSD Journal for a great overview of the various alternatives), but they seem overkill for my (relatively simple) single-drive setup/archival use case. I also prefer simple, widely-supported and well-documented tools and formats that will still be around in several decades' time should my data live that long.
I also briefly considered using LTFS instead of tar, but came to the conclusion that it's better suited for transferring data than archiving it.
With that out of the way, on to the command itself!
# gtar \
--format=gnu \
--label="backup/media@2024-09-09" \
--multi-volume \
--one-file-system \
--blocking-factor=2048 \
--totals \
--totals=USR1 \
--index-file=/root/backup-media-2024-09-09-index.txt \
--exclude='.DS_Store' \
--exclude='._*' \
--sort=name \
--create \
--verbose \
--verbose \
--file $TAPE \
/backup/media/.zfs/snapshot/2024-09-09
The GNU format is the default for the version of GNU tar I'm using[1], but I want to explicitly give it as an argument (--format) in order to document which format the archive was created with.
The archive label (--label) is set to match the name of the ZFS snapshot (backup/media@2024-09-09).
Since the archive will span multiple tapes, we're specifying the --multi-volume option.
I'm also telling tar to stay in the local filesystem while creating the archive (--one-file-system).
The tar blocking factor (--blocking-factor) is set to kern.maxphys / 512[2] (1048576 / 512 = 2048) for best performance.
The total number of bytes processed will be reported at the end of the run (--totals). Furthermore, sending a USR1 signal to the tar process (pkill -SIGUSR1 tar) while it's running will cause it to print the number of bytes it has processed up to that point (--totals=USR1).
I also want to create an index file for later reference (--index-file). The file will show which volume contains which file, and it can be used for restoring single files (see below).[3]
Various macOS metadata files (.DS_Store and AppleDouble ._* files) will be excluded (--exclude) and the archive contents will be sorted by name (--sort=name).
We're creating a new archive (--create).
The two --verbose switches will cause tar to write more details about the files being archived to the index file.
The destination (--file) is our tape drive.
Finally, the snapshot we created (backup/media@2024-09-09) can be accessed using the path /backup/media/.zfs/snapshot/2024-09-09 which we'll give as the final argument to gtar.[4]
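The blocking factor is counted in 512-byte records, so the value to pass is simply the maximum I/O size divided by 512. A minimal sketch of that arithmetic follows; blocking_factor is a hypothetical helper name.

```sh
# Convert an I/O size in bytes to a tar blocking factor (512-byte records).
blocking_factor() { echo $(( $1 / 512 )); }

# Usage on FreeBSD:
#   gtar --blocking-factor="$(blocking_factor "$(sysctl -n kern.maxphys)")" ...
```

For comparison, GNU tar's default of -b20 corresponds to 10240-byte blocks.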
Writing (or reading) a single LTO-5 tape will take a little over 3 hours at full speed. When a tape fills up, tar will prompt you for the next one. You can eject the cartridge using the physical eject button on the drive (if it has one), or by issuing the command mt offline in another terminal.
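For planning purposes, the per-tape time can be estimated from capacity and throughput. The sketch below uses the nominal LTO-5 figures (1.5 TB native capacity, 140 MB/s) as assumptions; the helper names are made up. The result is a best case, so real runs that start and stop or carry slightly more data will land a bit higher.

```sh
# tape_seconds BYTES BYTES_PER_SECOND: whole seconds needed to stream BYTES.
tape_seconds() { echo $(( $1 / $2 )); }

# format_hms SECONDS: render a duration as e.g. 2h58m.
format_hms() { printf '%dh%02dm\n' $(( $1 / 3600 )) $(( $1 % 3600 / 60 )); }

# Nominal LTO-5: 1.5 TB at 140 MB/s (decimal units, assumed).
#   format_hms "$(tape_seconds 1500000000000 140000000)"
```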
Note: Remember to toggle the physical write protection tab on the cartridge after you've ejected it to prevent yourself from accidentally overwriting your precious data.
After you've finished creating the archive, you can release and destroy the snapshot and congratulate yourself for a job well done.
# zfs release keep backup/media@2024-09-09
# zfs destroy backup/media@2024-09-09
Finally, stow away the tapes somewhere safe, preferably off-site. Be sure to follow the cartridge manufacturer's recommendations for long-term storage.[5]
While LTO drives offer strong guarantees that data is written to tape correctly (drives have built-in error correction, a separate read head after the write head to verify the written data, and so on), you might have messed up the tar arguments (like I first did!). You don't want to find out that your archive is borked when it comes time to restore it!
To verify your data, you can list the contents of the entire archive (tar --list) starting from the first volume. Recall that tar archives have no embedded indices, so tar must read through the whole archive when listing its contents.
However, since testing the entire archive will take a long time (over 3 hours per tape), you might want to do spot checks instead by restoring individual file(s) from a given volume (they're independent archives that can be manipulated like any other archive when the --multi-volume argument is specified).
To restore a single file (to /tmp), issue the following command. Check the index file to see which volume the file is stored on, and insert the corresponding tape before executing the command. To restore the entire archive, leave out the filename and start with the first tape.
# gtar \
--multi-volume \
--blocking-factor=2048 \
--extract \
--verbose \
--verbose \
--file=$TAPE \
--directory=/tmp \
backup/media/.zfs/snapshot/2024-09-09/path/of/file/to/restore
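Note that the member name has no leading slash: GNU tar strips it by default when creating an archive. To find a file in the index, a plain fixed-string search is usually enough. The following sketch uses two hypothetical helpers, member_name and find_in_index, and makes no assumptions about the index layout beyond each entry containing the member path.

```sh
# Convert an absolute path to the member name GNU tar stores by default
# (the leading "/" is stripped unless --absolute-names is given).
member_name() { printf '%s\n' "${1#/}"; }

# find_in_index PATH INDEX_FILE: print matching index lines with line numbers.
find_in_index() { grep -nF -- "$(member_name "$1")" "$2"; }

# Usage:
#   find_in_index /backup/media/.zfs/snapshot/2024-09-09/path/of/file/to/restore \
#       /root/backup-media-2024-09-09-index.txt
```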
After creating the initial archive, my plan is to check its integrity every 6 months or so, and to create a new archive once a year (or if the integrity of the archive has been compromised). I'll also look into upgrading to a later LTO generation once prices have dropped down to an acceptable level.
Since I'm reading contiguous large files from mirrored spinning rust vdevs, the tape drive will quite happily chug along at its maximum speed (140 MB/s) without any shoe-shining. Depending on your setup (your disks may be slower than mine, or your data may be more fragmented), you might need to stage your data on a fast SSD or buffer it in memory first (using misc/mbuffer).
If your LTO cartridges have barcode labels[6], you can use sysutils/sg3_utils to write the barcode (e.g. 018990L5) onto the cartridge's Medium Auxiliary Memory (MAM) chip. See this article for more details.
# sg_write_attr $TAPE 0x806=018990L5
# sg_read_attr -f 0x806 $TAPE
Barcode: 018990L5
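Since a typo in the barcode would end up stored on the cartridge, it may be worth validating the string first. LTO labels consist of a six-character volume serial (uppercase letters and digits) followed by a two-character media identifier, which is L5 for LTO-5. A minimal sketch, with is_lto5_barcode being a hypothetical helper name:

```sh
# Succeed if the argument looks like an LTO-5 barcode: six characters from
# 0-9/A-Z followed by the media identifier "L5".
is_lto5_barcode() {
    c='[0-9ABCDEFGHIJKLMNOPQRSTUVWXYZ]'
    case "$1" in
        $c$c$c$c$c$c"L5") return 0 ;;
        *) return 1 ;;
    esac
}

# Usage:
#   is_lto5_barcode 018990L5 && sg_write_attr $TAPE 0x806=018990L5
```

The character class is spelled out in full to avoid locale-dependent range matching.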
[1] You can check the default options with the --help switch.
❯ gtar --help | tail -2
*This* tar defaults to:
--format=gnu -f- -b20 --quoting-style=escape --rmt-command=/etc/rmt
[2] The maximum block I/O access size was increased to 1 MB in FreeBSD 13. This matches max_effective_iosize reported by mt status -v.
❯ sysctl -d kern.maxphys
kern.maxphys: Maximum block I/O access size
❯ sysctl kern.maxphys
kern.maxphys: 1048576
[3] The index can be stored on the last tape as the next file after the tar archive (with the on-tape format |tar|EOF|index|EOF|, you can quickly skip to the index using EOF markers), on a CD-R and/or as a paper printout that's stored with the tapes.
You should also consider storing a copy of all the tools used to create the archive along with the index file, in case they're not readily available when you want to restore the archive in the future.
[4] You could, naturally, use a live filesystem as the source if you're convinced it won't change mid-run, but using a read-only snapshot is much safer.
[5] See, for example, the HP LTO Ultrium tape drives technical reference manual Volume 1: hardware integration pp. 44–45.
[6] Use the Proxmox LTO label generator or tapelabel.de to generate labels if you need them.