@plembo
Last active December 20, 2024 03:16
S3 Backup

Backup with AWS S3

Amazon Web Services' Simple Storage Service, a/k/a "S3", is a flexible object storage facility that is widely used by government, enterprises, and even small businesses for serving and backing up files. Creating a storage "bucket" is easy enough, but most consumers find the permissioning system indecipherable. That wasn't as much of a problem for me, as I've been through a few rounds of AWS training, including for S3. But it also wasn't straightforward. I'm creating this gist mostly to avoid having to puzzle things out again for my next S3 project.

Overview

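The goal is to back up files on my home server to an S3 bucket. Linux and Windows clients are first mirrored to the home server ("server1" in the scripts below), which then syncs its backup directory to the bucket. In the example that follows, "example.com" is my home domain.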

Third-party software

aws-cli is the official command-line tool for AWS.

rclone is "rsync for cloud storage", a reliable and efficient tool for synchronizing storage nodes. Instructions for using it with S3 are on the Amazon S3 page of the rclone documentation.

Set up an S3 bucket

Name: backup.example.com

Create an AWS IAM user and group

User: backup

Group: backup

Establish an access policy

Name: AWSS3BackupExample

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "VisualEditor0",
            "Effect": "Allow",
            "Action": [
                "s3:ListAllMyBuckets",
                "s3:GetBucketLocation"
            ],
            "Resource": "*"
        },
        {
            "Sid": "VisualEditor1",
            "Effect": "Allow",
            "Action": "s3:ListBucket",
            "Resource": [
                "arn:aws:s3:::backup.example.com"
            ]
        },
        {
            "Sid": "VisualEditor2",
            "Effect": "Allow",
            "Action": [
                "s3:PutObject",
                "s3:GetObject",
                "s3:DeleteObject"
            ],
            "Resource": "arn:aws:s3:::backup.example.com/*"
        }
    ]
}
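The console steps above can also be scripted with aws-cli. This is a minimal sketch under a few assumptions: aws-cli is configured with an administrative profile, the policy JSON above has been saved locally as AWSS3BackupExample.json (a hypothetical file name), and the policy is attached to the backup group rather than directly to the user. By default it only prints the commands; run it with DRY_RUN=0 to execute them.

```bash
#!/usr/bin/env bash
# Sketch: create the S3 bucket, IAM group and user, and access policy with aws-cli.
# Assumptions: aws-cli is configured with an admin profile, and the policy JSON
# above was saved as ./AWSS3BackupExample.json (hypothetical file name).
# Prints the commands by default; run with DRY_RUN=0 to execute them.
set -euo pipefail

BUCKET="backup.example.com"
IAM_USER="backup"
IAM_GROUP="backup"
POLICY_NAME="AWSS3BackupExample"
POLICY_FILE="./AWSS3BackupExample.json"
DRY_RUN="${DRY_RUN:-1}"

# Echo the command in dry-run mode, otherwise execute it.
run() {
    if [[ "${DRY_RUN}" == "1" ]]; then
        echo "$*"
    else
        "$@"
    fi
}

main() {
    # The account ID is only looked up for real runs; dry runs show a placeholder.
    local account_id="ACCOUNT_ID"
    if [[ "${DRY_RUN}" != "1" ]]; then
        account_id=$(aws sts get-caller-identity --query Account --output text)
    fi

    run aws s3 mb "s3://${BUCKET}"
    run aws iam create-group --group-name "${IAM_GROUP}"
    run aws iam create-user --user-name "${IAM_USER}"
    run aws iam add-user-to-group --group-name "${IAM_GROUP}" --user-name "${IAM_USER}"
    run aws iam create-policy --policy-name "${POLICY_NAME}" \
        --policy-document "file://${POLICY_FILE}"
    # Attach the policy to the group, so every member (here just "backup") gets it.
    run aws iam attach-group-policy --group-name "${IAM_GROUP}" \
        --policy-arn "arn:aws:iam::${account_id}:policy/${POLICY_NAME}"
    # Returns an access key ID and secret for the backup user; keep them for rclone.
    run aws iam create-access-key --user-name "${IAM_USER}"
}

main
```

The last command prints the backup user's access key ID and secret access key, which rclone needs later. AWS shows the secret only at creation time, so store it somewhere safe right away.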

Excludes file

/usr/local/etc/excludes.conf

A common "excludes" file tells rsync and rclone to ignore files and directories that do not need to be backed up. It should be created on each client to be backed up as well as on the backup server.

; Exclude file for rsync job (mostly targeting home directories)
tmp/
Downloads/
.cache/
.dbus/
.gvfs/
.mozilla/
.config/google-chrome/
*/Cache/
.local/share/Trash/
tor-browser_en-US/
lost+found/
.npm/
venv/
.atom/
__pycache__/
.pylint/
.minecraft/
.config/Code/
.local/share/Steam
.vscode/
Projects/
.ssh/
.gnupg/
.config/rclone
.aws
.profile
.config/gcloud
.azure
.Azure
Drive/
.gradle/
.android/
.arduino15/
.config/cumulonimbus/

Linux clients

/usr/local/bin/rsync-host.sh

This script is run as root on each client to back up files to the backup server. Create a cron job for root to run it every day before the backup server is synced to the cloud.

#!/usr/bin/env bash
# Mirror host to backup server
SHOST=$(hostname | cut -f1 -d.)     # short name of this client
THOST="server1"                      # backup server, reached over SSH
TROOT="/data1/backup"                # backup root on the server
EXCL="/usr/local/etc/excludes.conf"
SDIRS=('/etc' '/root' '/home' '/usr/local/bin' '/usr/local/etc' '/var/spool/cron/crontabs' '/data1/docker')
LOGFILE="/data1/logs/backup/mirror_host.log"
TIMESTAMP=$(date +%Y%m%d%H%M%S)

echo "${TIMESTAMP} Mirror ${SHOST} to ${THOST}" >"${LOGFILE}"

for SDIR in "${SDIRS[@]}"; do
    # -R (--relative) recreates the full source path under ${TROOT}/${SHOST}/ on the server
    rsync -avzR --exclude-from="${EXCL}" --delete --log-file="${LOGFILE}" "${SDIR}" "${THOST}:${TROOT}/${SHOST}/" >>"${LOGFILE}"
done

TIMESTAMP=$(date +%Y%m%d%H%M%S)
echo "${TIMESTAMP} Mirroring completed" >>"${LOGFILE}"
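One way to schedule the daily runs is a root crontab entry on each machine. The helper below is a sketch (add_cron_line is a hypothetical name) that reads an existing crontab on stdin and appends a line only if it isn't already there, so it is safe to re-run. The times in the usage comments are placeholders; the client jobs just need to finish before the server's sync to S3 starts.

```bash
#!/usr/bin/env bash
# Sketch: idempotently add a line to a crontab.
# Reads the current crontab on stdin and writes the updated one to stdout.
set -euo pipefail

add_cron_line() {
    local line="$1"
    local current
    current=$(cat)
    # Pass the existing entries through unchanged (if there are any).
    if [[ -n "${current}" ]]; then
        printf '%s\n' "${current}"
    fi
    # Append the new entry only if an identical line is not already present.
    if ! grep -qxF -- "${line}" <<<"${current}"; then
        printf '%s\n' "${line}"
    fi
}

# Example usage as root (times are hypothetical placeholders):
#   on each Linux client:
#     { crontab -l 2>/dev/null || true; } | add_cron_line "30 1 * * * /usr/local/bin/rsync-host.sh" | crontab -
#   on the backup server, later in the night:
#     { crontab -l 2>/dev/null || true; } | add_cron_line "30 3 * * * /usr/local/bin/rclonetos3.sh" | crontab -
```

The "|| true" covers the case where root has no crontab yet, since "crontab -l" then exits with an error.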

Windows clients

Windows clients are backed up to the home server with rclone instead of rsync. Schedule the script to run as the user every day, before the backup server syncs to S3.

Due to peculiarities in Microsoft's current implementation of SSL and SSH, you'll need to configure rclone with its least secure settings: password authentication, with older ciphers allowed. The connection is still encrypted, but this is a step down from what is achievable on Linux clients. The key changes from what I do on Linux clients are: (1) set the user's SSH password; (2) leave key_pem, key_file, key_file_pass, pubkey_file, and key_use_agent empty; (3) set use_insecure_cipher to "true"; (4) set disable_hashcheck to "true". [1]

exclude-myself.conf

Windows and Linux desktops are fundamentally different in many ways, including what you'd want to exclude from a sync. Be sure this list is in Windows (CRLF line endings), not UNIX, format (I recommend opening and then saving it with Notepad to be sure).

; Exclude file for rclone job (mostly targeting home directories)
tmp/
Downloads/
.cache/
*/Cache/
OneDrive
GoogleDrive/
.npm/
venv/
__pycache__/
.pylint/
Projects/
node_modules/

include-myself.txt

This is the list of folders to be synced, one per line. Again, be sure it is in Windows, not UNIX, format.

.ssh
.config
Desktop
Documents
Pictures
Music
scripts
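If you edit these two lists on the Linux server and copy them to the Windows client, a small awk filter can convert them to Windows (CRLF) line endings. This is a minimal sketch using only POSIX awk; the function name to_crlf is hypothetical, and the unix2dos utility does the same job where it is installed. It strips any existing carriage return first, so running it on an already-converted file is harmless.

```bash
#!/usr/bin/env bash
# Sketch: convert text on stdin to Windows (CRLF) line endings using POSIX awk.
set -euo pipefail

to_crlf() {
    # Drop an existing trailing CR (so converted lines aren't doubled),
    # then print every line terminated by CR LF.
    awk '{ sub(/\r$/, ""); printf "%s\r\n", $0 }'
}

# Example usage, with the file names from the headings above:
#   to_crlf < exclude-myself.conf > exclude-myself.tmp && mv exclude-myself.tmp exclude-myself.conf
#   to_crlf < include-myself.txt  > include-myself.tmp && mv include-myself.tmp include-myself.txt
```

Running "file" on the result should report "with CRLF line terminators".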

sync-myself.bat

This is the script itself, written in the old Windows batch language. Again, be sure it is in Windows, not UNIX, format.

@echo off
REM Script to mirror selected folders to local backup
setlocal enabledelayedexpansion
set USER=myself
set LOCALPATH=c:/Users/%USER%
set REMOTE=server1
set REMOTEPATH=/data1/backup/desktop2/C/Users/%USER%
set INCLUDES=%LOCALPATH%/scripts/include-myself.txt
set EXCLUDES=%LOCALPATH%/scripts/exclude-myself.conf
set LOGFILE=%LOCALPATH%/scripts/mirror-user.log

cd %LOCALPATH%

echo %date% %time% Start Backup

REM Sync each folder named in the includes list to the same relative path on the remote
for /F %%i in (%INCLUDES%) do (
  set incl=%%i
  echo !incl!
  rclone sync --skip-links -v --exclude-from !EXCLUDES! --log-file=!LOGFILE! !LOCALPATH!/!incl! !REMOTE!:!REMOTEPATH!/!incl!
)

echo %date% %time% End backup

Backup Server

/usr/local/bin/rclonetos3.sh

This script is run on the home server by root to synchronize its backup directory, along with selected media directories, with an S3 bucket.

#!/usr/bin/env bash
# Mirror backup to cloud storage, using rclone
BUCKET="backup.example.com"
RMTNAME="s3:"                         # rclone remote name, including the trailing colon
SHOST=$(hostname | cut -f1 -d.)
SVOL="/data1/backup"                  # where the client mirrors land
SDIRS=('server1' 'desktop1' 'desktop2' 'desktop3' 'router')
MVOL="/data1/media"
MDIRS=('docs' 'video' 'audio')
LOGFILE="/data1/logs/backup/mirror_to_s3.log"
TIMESTAMP=$(date +%Y%m%d%H%M%S)
EXCLUDES="/usr/local/etc/excludes.conf"   # the common excludes file described above
# Options shared by every sync below
RCLONE_OPTS=(-P --skip-links --exclude-from="${EXCLUDES}" --log-file="${LOGFILE}" -v --checksum --fast-list --s3-no-head)

echo "${TIMESTAMP} Mirror backups from ${SHOST} to ${BUCKET}" >"${LOGFILE}"

for SDIR in "${SDIRS[@]}"; do
    echo "System ${SDIR}" >>"${LOGFILE}"
    rclone "${RCLONE_OPTS[@]}" sync "${SVOL}/${SDIR}" "${RMTNAME}${BUCKET}/${SDIR}"
done

TIMESTAMP=$(date +%Y%m%d%H%M%S)
echo "${TIMESTAMP} Mirror media from ${SHOST} to ${BUCKET}" >>"${LOGFILE}"

for MDIR in "${MDIRS[@]}"; do
    echo "Media directory ${MDIR}" >>"${LOGFILE}"
    rclone "${RCLONE_OPTS[@]}" sync "${MVOL}/${MDIR}" "${RMTNAME}${BUCKET}/${MDIR}"
done

TIMESTAMP=$(date +%Y%m%d%H%M%S)
echo "${TIMESTAMP} Mirroring completed" >>"${LOGFILE}"
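The script expects an rclone remote named "s3" (the RMTNAME value) holding the backup IAM user's credentials. The sketch below prints a matching rclone.conf section; s3_remote_section is a hypothetical helper, and the default region (us-east-1) is only a placeholder to replace with the bucket's real region. Running "rclone config" interactively gets you the same result, and "rclone config file" shows which config file root is actually using.

```bash
#!/usr/bin/env bash
# Sketch: print an rclone.conf section for the "s3" remote used by rclonetos3.sh.
# Assumptions: the keys belong to the IAM "backup" user; the region is a placeholder.
set -euo pipefail

s3_remote_section() {
    local key_id="$1" secret="$2" region="${3:-us-east-1}"
    # no_check_bucket stops rclone from checking for or creating the bucket,
    # which matters because the policy above does not grant s3:CreateBucket.
    cat <<EOF
[s3]
type = s3
provider = AWS
env_auth = false
access_key_id = ${key_id}
secret_access_key = ${secret}
region = ${region}
no_check_bucket = true
EOF
}

# Example usage as root (the config path is typically /root/.config/rclone/rclone.conf):
#   s3_remote_section "$KEY_ID" "$SECRET" >> /root/.config/rclone/rclone.conf
#   chmod 600 /root/.config/rclone/rclone.conf
```

The secret is stored in plain text in rclone.conf, hence the chmod.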

NOTES

[1] Here is a transcript showing how I set up my Windows clients:

C:\Users\myuser> rclone config
Name                Type
====                ====

e) Edit existing remote
n) New remote
d) Delete remote
r) Rename remote
c) Copy remote
s) Set configuration password
q) Quit config
e/n/d/r/c/s/q> n

name> example

Option Storage.
Type of storage to configure.
Choose a number from below, or type in your own value.
 1 / 1Fichier
   \ (fichier)
 ...
   \ (smb)
39 / SSH/SFTP
   \ (sftp)
Storage> sftp   
 
Option host.
SSH host to connect to.
E.g. "example.com".
Enter a value.
host> backup.example.com

Option user.
SSH username.
Enter a string value. Press Enter for the default (MYCOMPUTER\myuser).
user> myuser

Option port.
SSH port number.
Enter a signed integer. Press Enter for the default (22).
port>

Option pass.
SSH password, leave blank to use ssh-agent.
Choose an alternative below. Press Enter for the default (n).
y) Yes, type in my own password
g) Generate random password
n) No, leave this optional password blank (default)
y/g/n> y
Enter the password:
password: xxxxxxxxxx
Confirm the password:
password: xxxxxxxxxx

Option key_pem.
Raw PEM-encoded private key.
If specified, will override key_file parameter.
Enter a value. Press Enter to leave empty.
key_pem>

Option key_file.
Path to PEM-encoded private key file.
Leave blank or set key-use-agent to use ssh-agent.
Leading `~` will be expanded in the file name as will environment variables such as `${RCLONE_CONFIG_DIR}`.
Enter a value. Press Enter to leave empty.
key_file>

Option key_file_pass.
The passphrase to decrypt the PEM-encoded private key file.
Only PEM encrypted key files (old OpenSSH format) are supported. Encrypted keys
in the new OpenSSH format can't be used.
Choose an alternative below. Press Enter for the default (n).
y) Yes, type in my own password
g) Generate random password
n) No, leave this optional password blank (default)
y/g/n>

Option pubkey_file.
Optional path to public key file.
Set this if you have a signed certificate you want to use for authentication.
Leading `~` will be expanded in the file name as will environment variables such as `${RCLONE_CONFIG_DIR}`.
Enter a value. Press Enter to leave empty.
pubkey_file>

Option key_use_agent.
When set forces the usage of the ssh-agent.
When key-file is also set, the ".pub" file of the specified key-file is read and only the associated key is
requested from the ssh-agent. This allows to avoid `Too many authentication failures for *username*` errors
when the ssh-agent contains many keys.
Enter a boolean value (true or false). Press Enter for the default (false).
key_use_agent>

Option use_insecure_cipher.
Enable the use of insecure ciphers and key exchange methods.
This enables the use of the following insecure ciphers and key exchange methods:
- aes128-cbc
- aes192-cbc
- aes256-cbc
- 3des-cbc
- diffie-hellman-group-exchange-sha256
- diffie-hellman-group-exchange-sha1
Those algorithms are insecure and may allow plaintext data to be recovered by an attacker.
This must be false if you use either ciphers or key_exchange advanced options.
Choose a number from below, or type in your own boolean value (true or false).
Press Enter for the default (false).
 1 / Use default Cipher list.
   \ (false)
 2 / Enables the use of the aes128-cbc cipher and diffie-hellman-group-exchange-sha256, diffie-hellman-group-exchange-sha1 key exchange.
   \ (true)
use_insecure_cipher> 2

Option disable_hashcheck.
Disable the execution of SSH commands to determine if remote file hashing is available.
Leave blank or set to false to enable hashing (recommended), set to true to disable hashing.
Enter a boolean value (true or false). Press Enter for the default (false).
disable_hashcheck> true

Edit advanced config?
y) Yes
n) No (default)
y/n> n

Configuration complete.
Options:
- type: sftp
- host: backup.example.com
- user: myuser
- pass: *** ENCRYPTED ***
- use_insecure_cipher: true
- disable_hashcheck: true
Keep this "example" remote?
y) Yes this is OK (default)
e) Edit this remote
d) Delete this remote
y/e/d> y

Current remotes:

Name                 Type
====                 ====
example              sftp

e) Edit existing remote
n) New remote
d) Delete remote
r) Rename remote
c) Copy remote
s) Set configuration password
q) Quit config
e/n/d/r/c/s/q> q

C:\Users\myuser>