Last active
March 24, 2024 13:26
Use rsync to sync two directories while keeping a history of what has already been synced, which makes the process faster for a huge number of files.
#!/bin/bash
#
# Sync two directories with rsync, but keep history to optimize the process.
# On subsequent runs it will only sync files added since the last sync.
# This makes it easy to resume if the script is interrupted.
# This also helps with network connectivity to remotes, since each file is
# transferred by starting a new rsync command. This solves one issue I have
# faced when syncing a large number of files: the connection breaking
# if the process takes too long (in my case it was about 10hrs to transfer all
# the files).
#
# Accepts two arguments: source and destination directory.
# Make sure source and destination do not end with a slash.
# Script assumes that both source and destination directory already exist. It
# is meant only to sync source content to destination content.
#
# When started it will output where it saves history. Do not delete that file!
# Delete the history file only if you want to re-sync from a clean state.
#
# Example usage
# In parent directory of Audiobooks run:
# ./sync_with_history.sh Audiobooks user@xhostname:/media/ServerMedia/Audiobooks
# to sync files to a remote server. You'll need ssh access set up.
#
# To sync local directories simply run:
# ./sync_with_history.sh Audiobooks /path/to/destination/Audiobooks
# in the parent directory of Audiobooks.
# Create history file name
escaped1=$(printf '%s' "$1" | tr / -)
escaped2=$(printf '%s' "$2" | tr / -)
sync_with_history_done_list="sync_with_history_done_list-$escaped1-to-$escaped2"
echo "Saving history to $sync_with_history_done_list"
# Ensure sync_with_history_done_list exists
touch "$sync_with_history_done_list"
# List all files that have not been rsync-ed yet.
# -x matches whole lines only, so a synced "book.mp3" does not hide "audiobook.mp3".
find "$1" -mindepth 1 -type f -printf '%P\n' | grep -vxFf "$sync_with_history_done_list" > sync_with_history_todo_list
while IFS= read -r line
do
    echo "Sending: $line"
    printf '%s\n' "$line" > files-to-include
    # NOTE: use rsync -a if you want to keep permissions, owner, group etc.
    # I use -r because I don't need those.
    # Record the file as done only if rsync succeeded, so that failed
    # transfers are retried on the next run.
    if rsync -r --files-from=files-to-include "$1/" "$2/"
    then
        printf '%s\n' "$line" >> "$sync_with_history_done_list"
    else
        echo "Failed: $line" >&2
    fi
done < sync_with_history_todo_list
# Clean up. Leave only sync_with_history_done_list
rm -f files-to-include sync_with_history_todo_list
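The to-do list in the script above is built by filtering the file listing against the history file with grep. Below is a minimal sketch of that filter in isolation, run in a temporary directory. The function name pending_files and the file names are made up for illustration. It also shows one thing worth knowing: with -F alone, grep matches history entries as substrings, so a synced "book.mp3" would hide a not-yet-synced "audiobook.mp3", while adding -x restricts the match to whole lines.

```shell
#!/bin/bash
# Sketch: select files that are not yet recorded in a history file.
# pending_files and the demo file names are hypothetical.

# Print every line of the candidate list ($1) that is absent from the history ($2).
# -F: treat history lines as fixed strings, -x: match whole lines only,
# -v: invert the match, -f: read the patterns from a file.
pending_files() {
    grep -vxFf "$2" "$1"
}

workdir=$(mktemp -d)
printf '%s\n' 'book.mp3' 'audiobook.mp3' 'series/part1.mp3' > "$workdir/candidates"
printf '%s\n' 'book.mp3' > "$workdir/history"

# Whole-line matching keeps audiobook.mp3 in the to-do list.
pending=$(pending_files "$workdir/candidates" "$workdir/history")
echo "$pending"

# Substring matching (no -x) also drops audiobook.mp3,
# because the history entry book.mp3 occurs inside that name.
pending_substring=$(grep -vFf "$workdir/history" "$workdir/candidates")
echo "$pending_substring"

rm -r "$workdir"
```

Whether the substring behavior matters depends on the library: it only bites when one relative path is contained in another.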
Thanks for this!
I'm doing a similar thing and used this for inspiration to write this fish shell function:
function stickysync_backup --argument historyfile from to
    set new_files (mktemp)
    combine (ssh backup "cd $from && fd -S+1b --changed-before '24 hours'" | psub) not $historyfile >$new_files
    cat $new_files | while read line
        rsync -r --files-from=(echo $line | psub) backup:$from $to
        and echo $line >>$historyfile
        or echo failed $line
    end
end
stickysync_backup ~/d/00_Metadata/stickysync_audiobooks d/_audiobooks/ /mnt/d/82_Audiobooks/
🐁
I was setting up a self-hosted audiobook server on an old laptop and wanted to sync my audiobook library from my main computer (about 100 GB and 4.5k files).
My main goal was to sync the Audiobooks directory on my main computer, where I keep all my audiobooks from Audible and other sources. I fetch new books from Audible via Libation, and I don't want to run that on my modest audiobook server. The server also has an Audiobooks directory, which is used by Audiobookshelf.
With this in mind, the workflow is to download new books I bought on Audible to my main computer via Libation, then run a file sync to copy them to the server.
I started with scp, which proved to be unreliable for my use case. Then I switched to rsync, which wasn't much better: it did not re-transfer everything, but it still queried every file on the server to check whether it already existed. That also took a long time.
I researched other sync utilities such as unison and self-hosted solutions such as Syncthing. Either they didn't do what I wanted or they were too heavy for my simple use case. I just wanted a command-line utility that keeps track of what is already synced and doesn't sync it again. My server is very low on resources and I don't want to waste any of them on useless syncing.
So, since I couldn't find anything, this script was born. Use at your own risk!
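Keeping track of what is already synced only helps if the history can be trusted, so one option is to append a file to the history only after its transfer has succeeded. The following is a minimal sketch of that idea, not the script itself: transfer_and_record and the demo paths are hypothetical, and cp stands in for rsync so that the example runs without a remote host.

```shell
#!/bin/bash
# Sketch: record a file in the history only after its transfer succeeded.
# transfer_and_record and the demo paths are hypothetical; cp stands in for
# rsync so that the example runs without a remote host.

# $1 = relative file name, $2 = source dir, $3 = destination dir, $4 = history file
transfer_and_record() {
    if cp "$2/$1" "$3/$1" 2>/dev/null; then
        echo "$1" >> "$4"
    else
        echo "failed: $1" >&2
        return 1
    fi
}

workdir=$(mktemp -d)
mkdir "$workdir/src" "$workdir/dst"
echo "data" > "$workdir/src/one.mp3"
touch "$workdir/history"

transfer_and_record "one.mp3" "$workdir/src" "$workdir/dst" "$workdir/history"
# missing.mp3 does not exist, so the copy fails and nothing is recorded.
transfer_and_record "missing.mp3" "$workdir/src" "$workdir/dst" "$workdir/history" || true

history_contents=$(cat "$workdir/history")
echo "$history_contents"

rm -r "$workdir"
```

A file whose transfer failed stays out of the history, so the next run picks it up again.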