never lose your data again: back up remotely using rsync, ssh and rdiff-backup
if you've ever lost precious data after a hard drive failure, you've probably learned your lesson and are now automatically backing up your system.
your treasured pictures, videos and documents may still be at risk. your computer could be stolen, destroyed by flood or fire or chopped into small pieces by a jealous ex-lover.
using a remote backup service is a good way to guard against this type of problem. for around $10 a month, you can find companies willing to store 10GB of data for you. your data is usually accessible using a variety of methods, including rsync, vpn and ftp. to see some of these services, type "remote backup rsync service" into google.
in this article, i discuss using open source software to take advantage of these services in an efficient and secure manner, allowing the backup of large directories over a dsl-speed line while you sleep.
a word on security
moving your files onto a remote location presents an inherent security risk, since you can't be 100% sure of the security of that location. we'll make sure our files are encrypted in transit and at the remote storage location.
the storage company that i'm using emailed me the account password that i'd set up, in clear text. this didn't exactly give me a warm, fuzzy feeling about their security procedures.
tools
there are many tools that you can use. i've chosen the following:
- rdiff-backup is a great tool that combines the best features of a mirror and an incremental backup, allowing you to create a simple mirror of your files, in a bandwidth and disk space efficient manner, while retaining the option of recovering a file that you've deleted or changed.
- rsync is a bandwidth efficient tool allowing fast, incremental file transfers, optionally to remote locations and over secure transports.
- gpgdir is a utility that uses CPAN GnuPG::Interface to recursively encrypt a directory structure. it has lots of useful options to increase your security.
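to make the "recovering a file that you've deleted or changed" part concrete, here's a rough sketch of listing the restore points in a backup area and pulling back an old copy of a file. the function names, the paths and the three-day age are my own placeholders, and it assumes the classic rdiff-backup command-line options (--list-increments and --restore-as-of).

```shell
#!/bin/sh
# sketch only: function names and defaults here are illustrative assumptions
rdiffExe=${rdiffExe:-/usr/bin/rdiff-backup}

# show which points in time the backup area in $1 can restore to
list_increments() {
    "${rdiffExe}" --list-increments "$1"
}

# restore $2 (a path inside the backup area) as it was $1 ago, writing it to $3
restore_as_of() {
    "${rdiffExe}" --restore-as-of "$1" "$2" "$3"
}
```

for instance, `list_increments /backup/localmachine1/home` followed by `restore_as_of 3D /backup/localmachine1/home/john/notes.txt /tmp/notes.txt` would fetch a (hypothetical) notes.txt as it looked three days ago, without touching the current mirror.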
how to set things up
please note that my examples are for debian etch, but should work well on other *nix platforms with minor modification. enough preamble, let's get down to business.
i'll assume that you've already backed up your files (e.g. using rdiff-backup) to a "backup area".
if you haven't, don't worry, rdiff-backup is easy to use and similar to rsync. for example, to verbosely back up the /home files on localmachine1.example.com to the local machine, excluding the mozilla cache, you might do:
$ /usr/bin/rdiff-backup -v5 --exclude '**/.mozilla/**/Cache' \
user@localmachine1.example.com::/home/ /backup/localmachine1/home
get rsync and rdiff-backup
typically, rsync and rdiff-backup are available as installable packages, which you can get e.g. by:
# apt-get install rsync rdiff-backup
download and build gpgdir
download gpgdir: grab the latest version from http://www.cipherdyne.com/gpgdir e.g. by:
$ cd ~
$ wget http://www.cipherdyne.com/gpgdir/download/gpgdir-1.5.tar.gz
$ tar xvfz gpgdir-1.5.tar.gz
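$ cd gpgdir-1.5/
install it as root:
# ./install.pl
if the installer can't build its perl modules, you may first need a compiler toolchain:
# apt-get install gcc cpp binutils libc6-dev
create your encryption keys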
let's create our encryption keys for gpg. in this example i'm generating a password-less key to keep things simple for scripting; this is typically a bad idea security-wise. use gpg-agent if you set up a password.
$ gpg --gen-key
note, this can take a loooooong while (like minutes), especially if your system is idle. be patient or, to speed things up, give your computer a workout (weirdly enough ... gotta love the entropy).
take a look at your new keys:
$ gpg --fingerprint
/home/john/.gnupg/pubring.gpg
------------------------
pub 1024D/363220CB 2007-12-05
Key fingerprint = 545D 238A 4A8C 7381 BE24 F165 31D5 4583 3632 20CB
uid John Quinn
sub 2048g/3AE52258 2007-12-05
the output also shows the secondary public key id for the 2048-bit ElGamal (2048g) key used for encrypting. its key id is 3AE52258.
you are now all set to encrypt a target directory with e.g.:
$ gpgdir --no-password --Key-id 3AE52258 --verbose --encrypt [directory]
or decrypt it with:
$ gpgdir --no-password --Key-id 3AE52258 --verbose --decrypt [directory]
more gpgdir details are on the gpgdir man page.
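since the whole point is that nothing readable leaves your machine, it may be worth checking a directory after encrypting it. the sketch below lists any file that doesn't look encrypted. it assumes that encrypted files end in .gpg (which, as far as i know, is what gpgdir produces by default), and the function names are my own.

```shell
#!/bin/sh
# sketch only: assumes encrypted files carry a .gpg suffix

# print every regular file under $1 whose name lacks the .gpg suffix
list_unencrypted() {
    find "$1" -type f ! -name '*.gpg'
}

# succeed only if $1 is a directory that holds no unencrypted files
staging_is_encrypted() {
    [ -d "$1" ] && [ -z "$(list_unencrypted "$1")" ]
}
```

a script could run `staging_is_encrypted [directory] || exit 1` right after the encrypt step, so that a skipped or failed file stops the upload instead of being sent in the clear.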
prepare your remote server for password-less login
to allow you to securely transfer your files over ssh using a script, you'll want to set up password-less ssh access from the computer with your backup area to the remote backup system. you can do this (on your local system) by:
create your key:
$ ssh-keygen -t rsa
copy it to the remote machine:
$ ssh-copy-id -i ~/.ssh/id_rsa.pub user@remotemachine.example.com
if you protect the key with a passphrase, use keychain or ssh-agent.
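before relying on this in a script, it's worth confirming that the login really works without a prompt. here's a small sketch using ssh's BatchMode option, which makes ssh fail rather than ask for a password. the function names and the 10 second timeout are my own choices.

```shell
#!/bin/sh
# sketch only: function names and the timeout are illustrative assumptions
sshExe=${sshExe:-/usr/bin/ssh}

# try to run a no-op command on $1 (user@host) without any interactive prompt
check_login() {
    "${sshExe}" -o BatchMode=yes -o ConnectTimeout=10 "$1" true
}

# report the result for one remote account
report_login() {
    if check_login "$1"; then
        echo "ok: $1 accepts key-based login"
    else
        echo "FAILED: $1 still wants a password (or is unreachable)"
        return 1
    fi
}
```

running `report_login user@remotemachine.example.com` should print the "ok" line once the key has been copied over correctly.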
scripting your remote backup
now, let's create a simple script that will:
- copy your backup area to a "staging" area
- recursively encrypt it
- transfer it to the remote destination, using no more than 30 KB/s of your upload bandwidth.
#!/bin/bash
# bail if anything fails
set -e
# space-separated list of directories to back up (the paths must not contain spaces)
backupDirs="/var/backupArea /var/someOtherArea"
rsyncExe=/usr/bin/rsync
gpgdirExe=/usr/bin/gpgdir
gpgdirFlags="--no-password --Key-id 3AE52258 --encrypt"
remoteBackupArea=user@remotemachine.example.com:/home/user
# set up the remote backup to go over ssh, compressed and using no more than 30 KB/s of bandwidth
remoteRsyncFlags="--bwlimit=30 --archive --compress -e ssh"
localRsyncFlags="--archive"
localBackupStage=/backup/globalBackupRemoteStage
# remove any old staging area
echo "cleaning up old backup area"
rm -rf "${localBackupStage}"
echo "creating staging area: ${localBackupStage}"
mkdir --parents "${localBackupStage}"
echo "staging local directories: ${backupDirs}"
# the flag and directory lists are left unquoted on purpose, so they split into separate arguments
${rsyncExe} ${localRsyncFlags} ${backupDirs} "${localBackupStage}"
echo "encrypting the local staging area: ${localBackupStage}"
${gpgdirExe} ${gpgdirFlags} "${localBackupStage}"
echo "transferring encrypted staging area to the remote location: ${remoteBackupArea}"
${rsyncExe} ${remoteRsyncFlags} "${localBackupStage}" "${remoteBackupArea}"
finally
set up that li'l baby in cron, sit back, relax, and wait for a natural disaster.
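for example, a crontab entry along these lines (added with crontab -e) would run the backup every night at 2:30 and keep a log. the script path, the log path and the schedule are all placeholders of mine, so adjust them to wherever you saved the script.

```
# m  h  dom mon dow  command
30   2  *   *   *    /usr/local/bin/remote-backup.sh >> /var/log/remote-backup.log 2>&1
```

the job runs as the user who owns the crontab, so that user needs the gpg key, the ssh key and write access to the log file.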
further work
- rsyncrypto looks like an interesting tool that might help with this problem area.
- it would be nice to improve the backup script so that it doesn't blow away the staging area on each run, but instead does an incremental copy. this isn't completely trivial.
- the script doesn't use the rsync --delete option. you may want to consider this. the --delete option removes files from the destination that have been removed from the source.
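if you do try --delete, a cautious first step is to preview what it would remove. the sketch below adds rsync's --dry-run and --itemize-changes options, so rsync only reports what it would do (deletions show up as "*deleting" lines) and changes nothing. the function name is my own.

```shell
#!/bin/sh
# sketch only: preview a --delete transfer without changing anything
rsyncExe=${rsyncExe:-/usr/bin/rsync}

# $1 = local staging directory, $2 = remote destination (user@host:/path)
preview_delete() {
    "${rsyncExe}" --archive --delete --dry-run --itemize-changes -e ssh "$1" "$2"
}
```

with the paths from the script, that would be `preview_delete /backup/globalBackupRemoteStage user@remotemachine.example.com:/home/user`. because the source has no trailing slash, rsync only considers deletions inside the copied globalBackupRemoteStage directory on the remote side, not the rest of /home/user.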