A bridge between your data, and Infinite Storage |
Thankfully for us, the GNU Tar utility provides an easy way to pipe data in and out. I'm sure most of you remember the old way of using gunzip piped into tar to extract a .tgz file. That being said, these examples here are using the now common gnu tar utility that comes with most (if not all) Linux distributions. In fact, I was using GNU Tar 1.16 when I tested these commands.
Using tar, create a zipped backup of "/etc" and pipe the tared data through JS3tream storing the data into bucket "mybucket" prefix "etc.tgz".
tar -C / -czpO /etc | java -jar js3tream.jar -K mykey.txt -b mybucket:etc.tgz -i -t "An archive of my /etc directory"
> java -jar js3tream.jar -K mykey.txt -b mybucket:etc.tgz -o | tar -C / -xz
Using tar and it's incremental backup feature, we can keep an incremental backup of our data on S3. Lets say for example, we want to do an incremental backup every week of our /etc directory. First, we must create an initial archive. Then create an incremental archive for the next 3 weeks.
Here is the command to create the initial archive. note the -g option for tar. This will create a tar differential file named etc.tgz.dif. Tar will use this file from now on to create differential archives.
tar -g etc.tgz.dif -C / -czpO /etc | java -jar -js3tream.jar -K mykey.txt -b mybucket:etc.0.tgz -i -t "The initial archive of /etc"
I suggest you make a copy of the etc.tgz.dif file. This way, we can start a new set of incremental archives without having to re-upload our initial archive.
cp etc.tgz.dif etc.tgz.dif.initial
This command will now use our etc.Tgz.dif to create an incremental tar archive. You'll note that the command is almost the exact same. The only difference is the name of the bucket.
tar -g etc.tgz.dif -C / -czpO /etc | java -jar -js3tream.jar -K mykey.txt -b mybucket:etc.1.tgz -i -t "An incremental archive of /etc"
You can now repeat this command each week, changing the bucket name.
Perhaps you would like to simply keep a single incremental archive of your /etc directory. This is where we can re-use the copy of the original tar differential file. Create the initial archive as above, and keep a copy of the original etc.tgz.dif file. Then, each time you want to create an incremental archive, delete the old one from S3, and create a new one.
tar -g etc.tgz.dif -C / -czpO /etc | java -jar -js3tream.jar -K mykey.txt -b mybucket:etc.0.tgz -i -t "The initial archive of /etc" cp etc.tgz.dif etc.tgz.dif.initial
Then each week
java -jar js3stream.jar -K mykey.txt -b mybucket:etc.1.tgz -d rm etc.tgz.dif cp etc.tgz.dif.initial etc.tgz.dif tar -g etc.tgz.dif -C / -czpO /etc | java -jar -js3tream.jar -K mykey.txt -b mybucket:etc.1.tgz -i -t "An incremental archive of /etc"
The procedure to restore an incremental archive from S3 is similar to a standard archive. The only real difference is that we must restore the S3 archives in order. And we must restore all of them.
First, lets get a list of all of our buckets from S3
java -jar js3tream.jar -K mykey.txt -l 2006-12-22 11:32:11 - mybucket 2006-12-24 01:21:14 - mybucket2
Now, lets list the archives stored in the bucket we want to use
java -jar js3tream -K mykey.txt -l -b mybucket 2006-12-21 02:15:12 - mybucket:etc.0.tgz TAG: The initial archive of /etc 2006-12-21 04:16:57 - mybucket:etc.1.tgz -> An incremental archive of /etc 2006-12-21 07:12:12 - mybucket:etc.2.tgz -> An incremental archive of /etc 2006-12-21 07:25:21 - mybucket:etc.3.tgz -> An incremental archive of /etc
Now we know the names of our archives. We will need to perform 4 restores to be up to date. Note the difference in the tar -g option. This tells tar to perform a differential restore, but no dif file is provided on restore operations.
java -jar js3tream.jar -K mykey.txt -o -b mybucket:etc.0.tgz | tar -g /dev/null -C / -xz java -jar js3tream.jar -K mykey.txt -o -b mybucket:etc.1.tgz | tar -g /dev/null -C / -xz java -jar js3tream.jar -K mykey.txt -o -b mybucket:etc.2.tgz | tar -g /dev/null -C / -xz java -jar js3tream.jar -K mykey.txt -o -b mybucket:etc.3.tgz | tar -g /dev/null -C / -xz
The following command will tar up the /etc directory on a remote machine, pull the TAR data back over using SSH, and then send the stream out to S3
ssh remote_machine tar -czpO /etc | java -jar js3tream.jar -K key.txt -i -b mys3bucket:etc.0.tgz -t "Initial archive of /etc" -v
With the help of cron, and a very simple shell script, I've automated the backup of my data to s3. Cron is used to start the backup process, and the shell script does all the work.
Here is a link to the s3backup.sh script I'm using.
combined with the following lines in my crontab
# Backup the spool directory every 1st and 15th 00 01 01 * * cd /root/bin; ./s3backup.sh NORMAL /var/spool js3tream.MyArchiveBucket:/var/spool-A-FULL.tgz 00 01 15 * * cd /root/bin; ./s3backup.sh NORMAL /var/spool js3tream.MyArchiveBucket:/var/spool-B-FULL.tgz # Backup the /etc directory every 1st and 15th 03 01 01 * * cd /root/bin; ./s3backup.sh NORMAL /etc js3tream.MyArchiveBucket:/etc-A-FULL.tgz 03 01 15 * * cd /root/bin; ./s3backup.sh NORMAL /etc js3tream.MyArchiveBucket:/etc-B-FULL.tgz # Incrementally backup the www directory, starting with a full in January, and incremental every 1st and 15th 00 03 01 01 * cd /root/bin; ./sqldump.sh; ./s3backup.sh FULL /var/www js3tream.MyArchiveBucket:/var/www-A-01-FULL.tgz /root/bin/diff/var-www.tgz.diff 00 03 15 01 * cd /root/bin; ./sqldump.sh; ./s3backup.sh DIFF /var/www js3tream.MyArchiveBucket:/var/www-A-02-DIFF.tgz /root/bin/diff/var-www.tgz.diff 00 03 01 02 * cd /root/bin; ./sqldump.sh; ./s3backup.sh DIFF /var/www js3tream.MyArchiveBucket:/var/www-A-03-DIFF.tgz /root/bin/diff/var-www.tgz.diff 00 03 15 02 * cd /root/bin; ./sqldump.sh; ./s3backup.sh DIFF /var/www js3tream.MyArchiveBucket:/var/www-A-04-DIFF.tgz /root/bin/diff/var-www.tgz.diff 00 03 01 03 * cd /root/bin; ./sqldump.sh; ./s3backup.sh DIFF /var/www js3tream.MyArchiveBucket:/var/www-A-05-DIFF.tgz /root/bin/diff/var-www.tgz.diff 00 03 15 03 * cd /root/bin; ./sqldump.sh; ./s3backup.sh DIFF /var/www js3tream.MyArchiveBucket:/var/www-A-06-DIFF.tgz /root/bin/diff/var-www.tgz.diff 00 03 01 04 * cd /root/bin; ./sqldump.sh; ./s3backup.sh DIFF /var/www js3tream.MyArchiveBucket:/var/www-A-07-DIFF.tgz /root/bin/diff/var-www.tgz.diff 00 03 15 04 * cd /root/bin; ./sqldump.sh; ./s3backup.sh DIFF /var/www js3tream.MyArchiveBucket:/var/www-A-08-DIFF.tgz /root/bin/diff/var-www.tgz.diff 00 03 01 05 * cd /root/bin; ./sqldump.sh; ./s3backup.sh DIFF /var/www js3tream.MyArchiveBucket:/var/www-A-09-DIFF.tgz /root/bin/diff/var-www.tgz.diff 00 03 15 05 * cd /root/bin; ./sqldump.sh; ./s3backup.sh DIFF /var/www js3tream.MyArchiveBucket:/var/www-A-10-DIFF.tgz /root/bin/diff/var-www.tgz.diff 00 03 01 06 * cd /root/bin; ./sqldump.sh; ./s3backup.sh DIFF /var/www js3tream.MyArchiveBucket:/var/www-A-11-DIFF.tgz /root/bin/diff/var-www.tgz.diff 00 03 15 06 * cd /root/bin; ./sqldump.sh; ./s3backup.sh DIFF /var/www js3tream.MyArchiveBucket:/var/www-A-12-DIFF.tgz /root/bin/diff/var-www.tgz.diff # Incrementally backup the www diretory under a new title. Almost the same as above. 00 03 01 07 * cd /root/bin; ./sqldump.sh; ./s3backup.sh FULL /var/www js3tream.MyArchiveBucket:/var/www-B-01-FULL.tgz /root/bin/diff/var-www.tgz.diff 00 03 15 07 * cd /root/bin; ./sqldump.sh; ./s3backup.sh DIFF /var/www js3tream.MyArchiveBucket:/var/www-B-02-DIFF.tgz /root/bin/diff/var-www.tgz.diff 00 03 01 08 * cd /root/bin; ./sqldump.sh; ./s3backup.sh DIFF /var/www js3tream.MyArchiveBucket:/var/www-B-03-DIFF.tgz /root/bin/diff/var-www.tgz.diff 00 03 15 08 * cd /root/bin; ./sqldump.sh; ./s3backup.sh DIFF /var/www js3tream.MyArchiveBucket:/var/www-B-04-DIFF.tgz /root/bin/diff/var-www.tgz.diff 00 03 01 09 * cd /root/bin; ./sqldump.sh; ./s3backup.sh DIFF /var/www js3tream.MyArchiveBucket:/var/www-B-05-DIFF.tgz /root/bin/diff/var-www.tgz.diff 00 03 15 09 * cd /root/bin; ./sqldump.sh; ./s3backup.sh DIFF /var/www js3tream.MyArchiveBucket:/var/www-B-06-DIFF.tgz /root/bin/diff/var-www.tgz.diff 00 03 01 10 * cd /root/bin; ./sqldump.sh; ./s3backup.sh DIFF /var/www js3tream.MyArchiveBucket:/var/www-B-07-DIFF.tgz /root/bin/diff/var-www.tgz.diff 00 03 15 10 * cd /root/bin; ./sqldump.sh; ./s3backup.sh DIFF /var/www js3tream.MyArchiveBucket:/var/www-B-08-DIFF.tgz /root/bin/diff/var-www.tgz.diff 00 03 01 11 * cd /root/bin; ./sqldump.sh; ./s3backup.sh DIFF /var/www js3tream.MyArchiveBucket:/var/www-B-09-DIFF.tgz /root/bin/diff/var-www.tgz.diff 00 03 15 11 * cd /root/bin; ./sqldump.sh; ./s3backup.sh DIFF /var/www js3tream.MyArchiveBucket:/var/www-B-10-DIFF.tgz /root/bin/diff/var-www.tgz.diff 00 03 01 12 * cd /root/bin; ./sqldump.sh; ./s3backup.sh DIFF /var/www js3tream.MyArchiveBucket:/var/www-B-11-DIFF.tgz /root/bin/diff/var-www.tgz.diff 00 03 15 12 * cd /root/bin; ./sqldump.sh; ./s3backup.sh DIFF /var/www js3tream.MyArchiveBucket:/var/www-B-12-DIFF.tgz /root/bin/diff/var-www.tgz.diff
The s3backup.sh script takes 3 or 4 parameters.
The command you see named "sqldump.sh" is a very very basic script that dumps my postgres and mysql databases each to a backup.sql file. These
get placed in my /var/www directory, and thus backed up with it.
s3backup.sh "type" "directory" "bucket:prefix" "diff-file" type = The type of backup. NORMAL, FULL or DIFF FULL - Does a full backup of the directory, and creates a TAR incremental DIFF file for the following differential backups. DIFF - Does an incremental backup of the directory, using the provided diff file. NORMAL does a complete non-incremental backup of the directory to S3. directory = The directory to archive. bucket:prefix = the bucket name and the prefix passed to js3tream. See the -b option in js3tream for details. diff-file = the tar differential file used. This is the file that TAR needs to perform incremental backups.