never lose your data again: back up remotely using rsync, ssh and rdiff-backup

if you've ever lost precious data after a hard drive failure, you've probably learned your lesson and are now automatically backing up your system.

your treasured pictures, videos and documents may still be at risk. your computer could be stolen, destroyed by flood or fire or chopped into small pieces by a jealous ex-lover.

using a remote backup service is a good way to mitigate this type of risk. for around $10 a month, you can find companies willing to store 10GB of data for you. your data is usually accessible using a variety of methods, including rsync, vpn and ftp. to see some of these services, type remote backup rsync service into google.

in this article, i discuss using open source software to take advantage of these services in an efficient and secure manner, allowing the backup of large directories over a dsl-speed line while you sleep.

a word on security

moving your files onto a remote location presents an inherent security risk, since you can't be 100% sure of the security of that location. we'll make sure our files are encrypted in transit and at the remote storage location.

the storage company that i'm using emailed me the account password that i'd set up, in clear text. this didn't exactly give me a warm, fuzzy feeling about their security procedures.

tools

there are many tools that you can use. i've chosen the following:
  • rdiff-backup is a great tool that combines the best features of a mirror and an incremental backup. it lets you create a simple mirror of your files in a bandwidth- and disk-space-efficient manner, while retaining the option of recovering a file that you've deleted or changed.
  • rsync is a bandwidth-efficient tool for fast, incremental file transfers, optionally to remote locations and over secure transports.
  • gpgdir is a utility that uses the CPAN GnuPG::Interface module to recursively encrypt a directory structure. it has lots of useful options to increase your security.

how to set things up

please note that my examples are for debian etch, but should work well on other *nix platforms with minor modification. enough preamble, let's get down to business.

i'll assume that you've already backed up your files (e.g. using rdiff-backup) to a "backup area".

if you haven't, don't worry: rdiff-backup is easy to use and works much like rsync. for example, to back up the /home files on localmachine1.example.com to the local machine with verbose output, excluding the mozilla cache, you might do:

$ /usr/bin/rdiff-backup -v5 --exclude '**/.mozilla/**/Cache' \
user@localmachine1.example.com::/home/ /backup/localmachine1/home
you can see more details on the rdiff-backup man page or the usage examples.
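
to make the "recover a file that you've deleted or changed" part concrete, here's a rough sketch of listing and restoring increments from a backup area like the one above. the function names, the rdiffExe variable, the example file and the 3D (three days ago) interval are placeholders of mine, and the flags are from the classic rdiff-backup command line, so check the man page for your version before relying on them.

```shell
# path to rdiff-backup; override it if yours lives elsewhere (assumed default)
rdiffExe=${rdiffExe:-/usr/bin/rdiff-backup}

# show which increments (points in time) a backup area holds
list_increments() {
    "${rdiffExe}" --list-increments "$1"
}

# restore a file or directory as it was at a given time, e.g. 3D = three days ago
# usage: restore_as_of <time> <path inside backup area> <where to put it>
restore_as_of() {
    "${rdiffExe}" --restore-as-of "$1" "$2" "$3"
}

# example calls (hypothetical paths):
#   list_increments /backup/localmachine1/home
#   restore_as_of 3D /backup/localmachine1/home/john/notes.txt /tmp/notes.txt
```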

get rsync and rdiff-backup

typically, rsync and rdiff-backup are available as installable packages, which you can get e.g. by:
# apt-get install rsync rdiff-backup

download and build gpgdir

download gpgdir. grab the latest version from http://www.cipherdyne.com/gpgdir e.g. by:
$ cd ~
$ wget http://www.cipherdyne.com/gpgdir/download/gpgdir-1.5.tar.gz
$ tar xvfz gpgdir-1.5.tar.gz
$ cd gpgdir-1.5/
then build and install it, as root:
# ./install.pl
note, your system should be set up to build C code, e.g. by:
# apt-get install gcc cpp binutils libc6-dev
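
before moving on, it can be worth checking that everything actually landed on your PATH. this is only a small sketch; the function name is mine, and the list of tools in the example call is simply the set used in this article.

```shell
# print the name of every tool in the argument list that is not on the PATH
missing_tools() {
    for tool in "$@"; do
        command -v "${tool}" >/dev/null 2>&1 || echo "${tool}"
    done
}

# example call; prints nothing when everything is installed:
#   missing_tools rsync rdiff-backup gpg gpgdir ssh
```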

create your encryption keys

let's create our encryption keys for gpg. in this example i'm generating a password-less key to keep things simple for scripting; this is typically a bad idea security-wise. use gpg-agent if you set up a password.
$ gpg --gen-key

note, this can take a loooooong while (like minutes), especially if your system is idle. be patient or, to speed things up, give your computer a workout (weirdly enough ... gotta love the entropy).

take a look at your new keys:

$ gpg --fingerprint
/home/john/.gnupg/pubring.gpg
------------------------
pub   1024D/363220CB 2007-12-05
      Key fingerprint = 545D 238A 4A8C 7381 BE24  F165 31D5 4583 3632 20CB
uid                  John Quinn
sub   2048g/3AE52258 2007-12-05

the output also shows the key ID of the 2048-bit ElGamal (2048g) subkey used for encrypting. its key ID is 3AE52258.
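
if you'd rather not pick the subkey ID out of the listing by eye, gpg's machine-readable --with-colons output can be filtered instead. the sketch below assumes the field layout documented in GnuPG's doc/DETAILS (record type in field 1, key ID in field 5, capabilities in field 12); the function name is my own.

```shell
# read `gpg --list-keys --with-colons` output on stdin and print the ID of
# every subkey whose capability field (field 12) includes "e" for encrypt
encryption_subkey_ids() {
    awk -F: '$1 == "sub" && $12 ~ /e/ { print $5 }'
}

# example call (the uid is the one from the listing above):
#   gpg --list-keys --with-colons "John Quinn" | encryption_subkey_ids
```

this prints the long, 16-character form of the ID; the short ID shown by gpg --fingerprint is its last eight characters.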

you are now all set to encrypt a target directory with e.g.:

$ gpgdir --no-password --Key-id 3AE52258 --verbose --encrypt directory

or decrypt it with:

$ gpgdir --no-password --Key-id 3AE52258 --verbose --decrypt directory
see gpgdir details on the gpgdir man page.
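
since the whole point is that nothing leaves your machine unencrypted, a quick check after an encrypt run doesn't hurt. this sketch assumes gpgdir marks the files it has encrypted with a .gpg suffix and lists anything that doesn't carry one; the function name and the example path are mine.

```shell
# print every regular file under a directory whose name does not end in .gpg
# (assumes gpgdir marks encrypted files with a .gpg suffix)
list_unencrypted() {
    find "$1" -type f ! -name '*.gpg'
}

# example call (hypothetical path); no output means nothing was left in the clear:
#   list_unencrypted /backup/globalBackupRemoteStage
```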

prepare your remote server for password-less login

to securely transfer your files over ssh from a script, you'll want to set up password-less ssh access from the computer holding your backup area to the remote backup system. you can do this (on your local system) as follows:

create your key:

$ ssh-keygen -t rsa

copy it to the remote machine:

$ ssh-copy-id -i ~/.ssh/id_rsa.pub user@remotemachine.example.com
note, again, for easy scripting, i'm creating a password-less key (bad john). to be more secure, look into keychain or ssh-agent.
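
a script run from cron can't answer a password prompt, so it's worth confirming that the key login really is prompt-free. here's a minimal sketch using ssh's BatchMode option, which makes ssh fail rather than ask; the function name, the sshExe variable and the 10 second timeout are my own choices.

```shell
# path to ssh; override it if yours lives elsewhere (assumed default)
sshExe=${sshExe:-/usr/bin/ssh}

# succeed only if key-based login works without any prompt;
# BatchMode=yes makes ssh fail instead of asking for a password
check_passwordless_login() {
    "${sshExe}" -o BatchMode=yes -o ConnectTimeout=10 "$1" true
}

# example call:
#   check_passwordless_login user@remotemachine.example.com && echo "ok"
```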

scripting your remote backup

now, let's create a simple script that will:
  1. copy your backup area to a "staging" area
  2. recursively encrypt it
  3. transfer it to the remote destination, using no more than 30 KB/s of your upload bandwidth.
this script might look something like:
#!/bin/sh
# bail if anything fails
set -e

# space-separated list of directories to back up (paths must not contain spaces)
backupDirs="/var/backupArea /var/someOtherArea"
rsyncExe=/usr/bin/rsync
gpgdirExe=/usr/bin/gpgdir
gpgdirFlags="--no-password --Key-id 3AE52258 --encrypt"
remoteBackupArea=user@remotemachine.example.com:/home/user

# set up remote backup to go over ssh, compressed and not using more than 30 KB/s of bandwidth
remoteRsyncFlags="--bwlimit=30 --archive --compress -e ssh"
localRsyncFlags="--archive"
localBackupStage=/backup/globalBackupRemoteStage

# remove any old staging area
echo "cleaning up old backup area"
rm -rf "${localBackupStage}"

echo "creating staging area: ${localBackupStage}"
mkdir --parents "${localBackupStage}"

# the flag and directory lists are left unquoted on purpose so they split into words
echo "staging local directories: ${backupDirs}"
${rsyncExe} ${localRsyncFlags} ${backupDirs} "${localBackupStage}"

echo "encrypting the local staging area: ${localBackupStage}"
${gpgdirExe} ${gpgdirFlags} "${localBackupStage}"

echo "transferring encrypted staging area to the remote location: ${remoteBackupArea}"
${rsyncExe} ${remoteRsyncFlags} "${localBackupStage}" "${remoteBackupArea}"
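
to get a feel for what a 30 KB/s cap means in practice, here's a back-of-the-envelope helper. it's only a sketch: the function name is mine, it uses integer arithmetic, and it ignores compression and protocol overhead.

```shell
# rough hours needed to push <size in MB> through rsync's --bwlimit=<KB/s>,
# using integer arithmetic and ignoring compression and protocol overhead
transfer_hours() {
    echo $(( $1 * 1024 / $2 / 3600 ))
}

# example call: a full 10GB (10240MB) upload at 30 KB/s
#   transfer_hours 10240 30
```

for the 10GB case that works out to about 97 hours, so roughly four days for a full upload at that rate.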

finally

set up that li'l baby in cron, sit back, relax, and wait for a natural disaster.
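
one thing to watch with cron: at a capped upload rate a run can easily take longer than a day, and you don't want two runs fighting over the staging area. below is a rough sketch of a lock wrapper built on mkdir. the function name, the lock location, the script paths and the 1:30am schedule are all hypothetical.

```shell
# run a command only if no earlier run still holds the lock directory;
# mkdir either creates the directory or fails, so only one run gets the lock
run_locked() {
    lockDir=$1
    shift
    if ! mkdir "${lockDir}" 2>/dev/null; then
        echo "previous run still active, skipping" >&2
        return 1
    fi
    "$@"
    status=$?
    rmdir "${lockDir}"
    return ${status}
}

# example call, placed at the end of a small wrapper script (hypothetical paths):
#   run_locked /var/lock/remote-backup.lock /usr/local/bin/remote-backup.sh
#
# and a crontab line (crontab -e) that starts the wrapper at 1:30 every night:
#   30 1 * * * /usr/local/bin/remote-backup-wrapper.sh
```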

further work

  • rsyncrypto looks like an interesting tool that might help with this problem area.
  • it would be nice to improve the backup script so that it doesn't blow away the staging area on each run, but instead does an incremental copy. this isn't completely trivial.
  • the script doesn't use the rsync --delete option. you may want to consider this. the --delete option removes files from the destination that have been removed from the source.
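
if you do decide to try --delete, it's safer to look before you leap. this sketch pairs it with rsync's --dry-run and --itemize-changes options, so the run only reports what would be sent or removed. the function name and the rsyncExe default are mine; the paths in the example call are the ones from the script above.

```shell
# path to rsync; override it if yours lives elsewhere (assumed default)
rsyncExe=${rsyncExe:-/usr/bin/rsync}

# show what a --delete run would change on the remote side without doing it;
# --dry-run and --itemize-changes only report, nothing is transferred or removed
preview_delete() {
    "${rsyncExe}" --dry-run --itemize-changes --delete --archive -e ssh "$1" "$2"
}

# example call, using the staging area and destination from the script:
#   preview_delete /backup/globalBackupRemoteStage user@remotemachine.example.com:/home/user
```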
