A bridge between your data, and Infinite Storage

Backup using tar and JS3tream in Linux

Thankfully for us, the GNU tar utility provides an easy way to pipe data in and out. I'm sure most of you remember the old way of piping gunzip into tar to extract a .tgz file. The examples here use GNU tar, which comes with most (if not all) Linux distributions. I was using GNU tar 1.16 when I tested these commands.

Common TAR switches

-c will create a new tar archive.
-z will compress the archive using gzip.
-p will preserve the file permissions.
-O will stream the tar output through STDOUT.
-C tells tar to change to this directory before creating or extracting its archive
-g tells tar to generate an incremental archive, using the named file to track what has changed
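
To get a feel for these switches before involving S3 at all, here is a small sketch that pipes one tar straight into another. The scratch directory and the sample.conf file are made up for the demonstration, and I spell the archive as "-f -" (standard output on the left, standard input on the right) so the pipe does not depend on how your tar was built.

```shell
# Build a small sample tree in a scratch directory (names are made up for the demo).
work=$(mktemp -d)
mkdir -p "$work/src/etc" "$work/restore"
echo "hello" > "$work/src/etc/sample.conf"
chmod 600 "$work/src/etc/sample.conf"

# Left side:  -c create, -z gzip, -p preserve permissions, -C change directory first.
# Right side: -x extract from the same stream into a different directory.
# "-f -" names standard output / standard input as the archive explicitly.
tar -C "$work/src" -czpf - etc | tar -C "$work/restore" -xzpf -

# The restored copy should be identical to the original.
diff -r "$work/src/etc" "$work/restore/etc" && echo "round trip OK"
```

In the real commands, JS3tream simply takes the place of the second tar (for a backup) or the first tar (for a restore).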

Common JS3tream switches

-K is the name of your S3 key file.
-b is the name of the S3 bucket to stream the data into
-i tells js3tream to read from STDIN, and send the data to S3
-o tells js3tream to read the data from S3, and send it to STDOUT
-t tells js3tream to tag the stream with a message
-d deletes an S3 archive
-l lists all of my S3 archives

Backup a Directory to S3

Using tar, create a gzipped backup of "/etc" and pipe the tarred data through JS3tream, storing it in bucket "mybucket" under the prefix "etc.tgz".

tar -C / -czpO /etc | java -jar js3tream.jar -K mykey.txt -b mybucket:etc.tgz -i -t "An archive of my /etc directory"

Restore an Archive from S3

Using JS3tream, pipe the archive through tar, extracting it back to the filesystem.

java -jar js3tream.jar -K mykey.txt -b mybucket:etc.tgz -o | tar -C / -xz

Perform an Incremental Backup to S3

Using tar and its incremental backup feature, we can keep an incremental backup of our data on S3. Let's say, for example, we want to do an incremental backup of our /etc directory every week. First, we must create an initial archive. Then we create an incremental archive in each of the next 3 weeks.

Here is the command to create the initial archive. Note the -g option for tar: it makes tar create a differential file named etc.tgz.dif. Tar will use this file from now on to create differential archives.

tar -g etc.tgz.dif -C / -czpO /etc | java -jar js3tream.jar -K mykey.txt -b mybucket:etc.0.tgz -i -t "The initial archive of /etc"

I suggest you make a copy of the etc.tgz.dif file. This way, we can start a new set of incremental archives without having to re-upload our initial archive.

cp etc.tgz.dif etc.tgz.dif.initial

This command will now use our etc.tgz.dif to create an incremental tar archive. You'll note that the command is almost exactly the same. The only differences are the archive name after the bucket (etc.1.tgz instead of etc.0.tgz) and the tag message.

tar -g etc.tgz.dif -C / -czpO /etc | java -jar js3tream.jar -K mykey.txt -b mybucket:etc.1.tgz -i -t "An incremental archive of /etc"
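
If you want to see what the -g option does before sending anything to S3, the following sketch runs the same two steps against local files. The scratch directory and the a.conf and b.conf files are made up for the demonstration, it assumes GNU tar, and it writes the archives with "-f" to local files instead of piping them to JS3tream.

```shell
# Scratch area with a tiny stand-in for /etc (all names here are made up).
work=$(mktemp -d)
mkdir -p "$work/src/etc"
echo "one" > "$work/src/etc/a.conf"

# First run: etc.tgz.dif does not exist yet, so tar archives everything
# and records what it saw in the dif file.
tar -g "$work/etc.tgz.dif" -C "$work/src" -czpf "$work/etc.0.tgz" etc

# Pause so the new file is clearly newer than the first run,
# even on filesystems with coarse timestamps.
sleep 1
echo "two" > "$work/src/etc/b.conf"

# Second run with the same dif file: only the changes are archived.
tar -g "$work/etc.tgz.dif" -C "$work/src" -czpf "$work/etc.1.tgz" etc

# List the incremental archive; b.conf should be there, a.conf should not.
tar -tzf "$work/etc.1.tgz"
```

The second archive still carries an entry for the etc directory itself, which is how tar knows on restore what the directory looked like at that point.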

You can now repeat this command each week, changing the archive name each time (etc.2.tgz, etc.3.tgz, and so on).
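
Changing the name by hand every week gets old quickly. One possible way to automate it is a small counter file, sketched below. The next_archive_name function and the etc.counter file are my own inventions for illustration; they are not part of JS3tream.

```shell
# Print the next archive name and advance the counter stored in a file.
# Both the function and the counter file are hypothetical helpers.
next_archive_name() {
    counter_file=$1
    n=0
    if [ -f "$counter_file" ]; then
        n=$(cat "$counter_file")
    fi
    echo $((n + 1)) > "$counter_file"
    echo "mybucket:etc.$n.tgz"
}

# Usage: the first call gives mybucket:etc.0.tgz, the next mybucket:etc.1.tgz, ...
#   name=$(next_archive_name etc.counter)
#   tar -g etc.tgz.dif -C / -czpO /etc | java -jar js3tream.jar -K mykey.txt -b "$name" -i -t "Archive $name"
```

Keep the counter file next to etc.tgz.dif, and remove both together when you start a fresh set of archives.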

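Perhaps you would like to simply keep a single incremental archive of your /etc directory. This is where we can re-use the copy of the original tar differential file. Create the initial archive as above, and keep a copy of the original etc.tgz.dif file. Then, each time you want to create an incremental archive, delete the old one from S3, put the original differential file back from the copy, and create a new one. Because every incremental archive is then made against the initial state, a restore only ever needs two archives: the initial one and the latest incremental.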

tar -g etc.tgz.dif -C / -czpO /etc | java -jar js3tream.jar -K mykey.txt -b mybucket:etc.0.tgz -i -t "The initial archive of /etc"
cp etc.tgz.dif etc.tgz.dif.initial

Then, each week:

java -jar js3tream.jar -K mykey.txt -b mybucket:etc.1.tgz -d
rm etc.tgz.dif
cp etc.tgz.dif.initial etc.tgz.dif
tar -g etc.tgz.dif -C / -czpO /etc | java -jar js3tream.jar -K mykey.txt -b mybucket:etc.1.tgz -i -t "An incremental archive of /etc"

Restore an Incremental Archive from S3

The procedure to restore an incremental archive from S3 is similar to a standard archive. The only real difference is that we must restore all of the S3 archives, and we must restore them in order.

First, let's get a list of all of our buckets from S3

java -jar js3tream.jar -K mykey.txt -l
2006-12-22 11:32:11 - mybucket
2006-12-24 01:21:14 - mybucket2

Now, let's list the archives stored in the bucket we want to use

java -jar js3tream.jar -K mykey.txt -l -b mybucket
2006-12-21 02:15:12 - mybucket:etc.0.tgz
    TAG: The initial archive of /etc
2006-12-21 04:16:57 - mybucket:etc.1.tgz
   -> An incremental archive of /etc
2006-12-21 07:12:12 - mybucket:etc.2.tgz
   -> An incremental archive of /etc
2006-12-21 07:25:21 - mybucket:etc.3.tgz
   -> An incremental archive of /etc
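
If you want to feed the archive names to another script, the listing can be filtered down to just the names. This is only a sketch: the archive_names function is my own helper, and it assumes the real output looks like the sample above, with a date, a time, a dash and then the bucket:prefix name on each archive line.

```shell
# Keep only the lines that start with a date and print the archive name.
# Tag lines are indented, so they never match the pattern.
archive_names() {
    awk '/^[0-9]+-[0-9]+-[0-9]+ / && $3 == "-" { print $4 }'
}

# Try it on two entries copied from the sample listing above.
archive_names <<'EOF'
2006-12-21 02:15:12 - mybucket:etc.0.tgz
    TAG: The initial archive of /etc
2006-12-21 04:16:57 - mybucket:etc.1.tgz
   -> An incremental archive of /etc
EOF
# prints:
#   mybucket:etc.0.tgz
#   mybucket:etc.1.tgz
```

With a real bucket you would pipe the listing command into it: java -jar js3tream.jar -K mykey.txt -l -b mybucket | archive_names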

Now we know the names of our archives. We will need to perform 4 restores, in order, to be up to date. Note the difference in the tar -g option. It still tells tar to perform a differential restore, but tar does not need a dif file on restore operations, so we pass /dev/null instead.

  java -jar js3tream.jar -K mykey.txt -o -b mybucket:etc.0.tgz | tar -g /dev/null -C / -xz
  java -jar js3tream.jar -K mykey.txt -o -b mybucket:etc.1.tgz | tar -g /dev/null -C / -xz
  java -jar js3tream.jar -K mykey.txt -o -b mybucket:etc.2.tgz | tar -g /dev/null -C / -xz
  java -jar js3tream.jar -K mykey.txt -o -b mybucket:etc.3.tgz | tar -g /dev/null -C / -xz
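
Since the four commands only differ in the archive number, they can be wrapped in a loop that stops at the first failure. The sketch below is one way to do it. The restore_chain function and the JS3TREAM variable are my own names, the key file is assumed to be mykey.txt as in the examples above, and tar reads the stream with "-f -" so it does not rely on tar's built-in default archive.

```shell
# Command used to reach S3; override it to test against something else.
JS3TREAM=${JS3TREAM:-"java -jar js3tream.jar"}

# restore_chain BUCKET LAST DEST
# Restores BUCKET:etc.0.tgz up to BUCKET:etc.LAST.tgz, in order, into DEST.
restore_chain() {
    bucket=$1
    last=$2
    dest=$3
    n=0
    while [ "$n" -le "$last" ]; do
        echo "Restoring $bucket:etc.$n.tgz" >&2
        # Stop as soon as tar reports a problem: a later incremental archive
        # is of little use if an earlier one did not restore cleanly.
        $JS3TREAM -K mykey.txt -o -b "$bucket:etc.$n.tgz" \
            | tar -g /dev/null -C "$dest" -xzf - || return 1
        n=$((n + 1))
    done
}

# Usage, equivalent to the four commands above:
#   restore_chain mybucket 3 /
```

Restoring into a scratch directory first (for example /tmp/restore-test instead of /) is a cheap way to check the chain before touching the live system.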

Backup a remote file system using SSH, TAR and JS3tream

The following command will tar up the /etc directory on a remote machine, pull the TAR data back over SSH, and then send the stream out to S3.

  ssh remote_machine tar -czpO /etc | java -jar js3tream.jar -K key.txt -i -b mys3bucket:etc.0.tgz -t "Initial archive of /etc" -v

Automate the backup of your files

With the help of cron, and a very simple shell script, I've automated the backup of my data to S3. Cron is used to start the backup process, and the shell script does all the work.

Here is a link to the s3backup.sh script I'm using.

I combine it with the following lines in my crontab:

# Backup the spool directory every 1st and 15th
00 01 01 * * cd /root/bin; ./s3backup.sh  NORMAL /var/spool js3tream.MyArchiveBucket:/var/spool-A-FULL.tgz
00 01 15 * * cd /root/bin; ./s3backup.sh  NORMAL /var/spool js3tream.MyArchiveBucket:/var/spool-B-FULL.tgz

# Backup the /etc directory every 1st and 15th
03 01 01 * * cd /root/bin; ./s3backup.sh  NORMAL /etc js3tream.MyArchiveBucket:/etc-A-FULL.tgz
03 01 15 * * cd /root/bin; ./s3backup.sh  NORMAL /etc js3tream.MyArchiveBucket:/etc-B-FULL.tgz

# Incrementally backup the www directory, starting with a full in January, and incremental every 1st and 15th
00 03 01 01 * cd /root/bin; ./sqldump.sh; ./s3backup.sh  FULL /var/www js3tream.MyArchiveBucket:/var/www-A-01-FULL.tgz /root/bin/diff/var-www.tgz.diff
00 03 15 01 * cd /root/bin; ./sqldump.sh; ./s3backup.sh  DIFF /var/www js3tream.MyArchiveBucket:/var/www-A-02-DIFF.tgz /root/bin/diff/var-www.tgz.diff
00 03 01 02 * cd /root/bin; ./sqldump.sh; ./s3backup.sh  DIFF /var/www js3tream.MyArchiveBucket:/var/www-A-03-DIFF.tgz /root/bin/diff/var-www.tgz.diff
00 03 15 02 * cd /root/bin; ./sqldump.sh; ./s3backup.sh  DIFF /var/www js3tream.MyArchiveBucket:/var/www-A-04-DIFF.tgz /root/bin/diff/var-www.tgz.diff
00 03 01 03 * cd /root/bin; ./sqldump.sh; ./s3backup.sh  DIFF /var/www js3tream.MyArchiveBucket:/var/www-A-05-DIFF.tgz /root/bin/diff/var-www.tgz.diff
00 03 15 03 * cd /root/bin; ./sqldump.sh; ./s3backup.sh  DIFF /var/www js3tream.MyArchiveBucket:/var/www-A-06-DIFF.tgz /root/bin/diff/var-www.tgz.diff
00 03 01 04 * cd /root/bin; ./sqldump.sh; ./s3backup.sh  DIFF /var/www js3tream.MyArchiveBucket:/var/www-A-07-DIFF.tgz /root/bin/diff/var-www.tgz.diff
00 03 15 04 * cd /root/bin; ./sqldump.sh; ./s3backup.sh  DIFF /var/www js3tream.MyArchiveBucket:/var/www-A-08-DIFF.tgz /root/bin/diff/var-www.tgz.diff
00 03 01 05 * cd /root/bin; ./sqldump.sh; ./s3backup.sh  DIFF /var/www js3tream.MyArchiveBucket:/var/www-A-09-DIFF.tgz /root/bin/diff/var-www.tgz.diff
00 03 15 05 * cd /root/bin; ./sqldump.sh; ./s3backup.sh  DIFF /var/www js3tream.MyArchiveBucket:/var/www-A-10-DIFF.tgz /root/bin/diff/var-www.tgz.diff
00 03 01 06 * cd /root/bin; ./sqldump.sh; ./s3backup.sh  DIFF /var/www js3tream.MyArchiveBucket:/var/www-A-11-DIFF.tgz /root/bin/diff/var-www.tgz.diff
00 03 15 06 * cd /root/bin; ./sqldump.sh; ./s3backup.sh  DIFF /var/www js3tream.MyArchiveBucket:/var/www-A-12-DIFF.tgz /root/bin/diff/var-www.tgz.diff

# Incrementally backup the www directory under a new title.  Almost the same as above.
00 03 01 07 * cd /root/bin; ./sqldump.sh; ./s3backup.sh  FULL /var/www js3tream.MyArchiveBucket:/var/www-B-01-FULL.tgz /root/bin/diff/var-www.tgz.diff
00 03 15 07 * cd /root/bin; ./sqldump.sh; ./s3backup.sh  DIFF /var/www js3tream.MyArchiveBucket:/var/www-B-02-DIFF.tgz /root/bin/diff/var-www.tgz.diff
00 03 01 08 * cd /root/bin; ./sqldump.sh; ./s3backup.sh  DIFF /var/www js3tream.MyArchiveBucket:/var/www-B-03-DIFF.tgz /root/bin/diff/var-www.tgz.diff
00 03 15 08 * cd /root/bin; ./sqldump.sh; ./s3backup.sh  DIFF /var/www js3tream.MyArchiveBucket:/var/www-B-04-DIFF.tgz /root/bin/diff/var-www.tgz.diff
00 03 01 09 * cd /root/bin; ./sqldump.sh; ./s3backup.sh  DIFF /var/www js3tream.MyArchiveBucket:/var/www-B-05-DIFF.tgz /root/bin/diff/var-www.tgz.diff
00 03 15 09 * cd /root/bin; ./sqldump.sh; ./s3backup.sh  DIFF /var/www js3tream.MyArchiveBucket:/var/www-B-06-DIFF.tgz /root/bin/diff/var-www.tgz.diff
00 03 01 10 * cd /root/bin; ./sqldump.sh; ./s3backup.sh  DIFF /var/www js3tream.MyArchiveBucket:/var/www-B-07-DIFF.tgz /root/bin/diff/var-www.tgz.diff
00 03 15 10 * cd /root/bin; ./sqldump.sh; ./s3backup.sh  DIFF /var/www js3tream.MyArchiveBucket:/var/www-B-08-DIFF.tgz /root/bin/diff/var-www.tgz.diff
00 03 01 11 * cd /root/bin; ./sqldump.sh; ./s3backup.sh  DIFF /var/www js3tream.MyArchiveBucket:/var/www-B-09-DIFF.tgz /root/bin/diff/var-www.tgz.diff
00 03 15 11 * cd /root/bin; ./sqldump.sh; ./s3backup.sh  DIFF /var/www js3tream.MyArchiveBucket:/var/www-B-10-DIFF.tgz /root/bin/diff/var-www.tgz.diff
00 03 01 12 * cd /root/bin; ./sqldump.sh; ./s3backup.sh  DIFF /var/www js3tream.MyArchiveBucket:/var/www-B-11-DIFF.tgz /root/bin/diff/var-www.tgz.diff
00 03 15 12 * cd /root/bin; ./sqldump.sh; ./s3backup.sh  DIFF /var/www js3tream.MyArchiveBucket:/var/www-B-12-DIFF.tgz /root/bin/diff/var-www.tgz.diff
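
The www entries follow a strict pattern, so rather than typing them you could generate them. The sketch below prints the same 24 lines. The gen_www_cron function is my own helper, not something that ships with JS3tream; pass the months as plain numbers (1 and 7, not 01 and 07).

```shell
# gen_www_cron SET FIRST_MONTH
# Prints 12 crontab lines (1st and 15th of six months): one FULL, then 11 DIFF.
gen_www_cron() {
    set_name=$1
    first_month=$2
    seq_no=1
    month=$first_month
    while [ "$month" -lt $((first_month + 6)) ]; do
        for day in 01 15; do
            # Only the very first run of a set is a full backup.
            if [ "$seq_no" -eq 1 ]; then kind=FULL; else kind=DIFF; fi
            printf '00 03 %s %02d * cd /root/bin; ./sqldump.sh; ./s3backup.sh  %s /var/www js3tream.MyArchiveBucket:/var/www-%s-%02d-%s.tgz /root/bin/diff/var-www.tgz.diff\n' \
                "$day" "$month" "$kind" "$set_name" "$seq_no" "$kind"
            seq_no=$((seq_no + 1))
        done
        month=$((month + 1))
    done
}

gen_www_cron A 1    # January to June, set A
gen_www_cron B 7    # July to December, set B
```

Paste the output into your crontab, or adapt the schedule and paths in the printf line to your own directories.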

The command you see named "sqldump.sh" is a very basic script that dumps my PostgreSQL and MySQL databases, each to a backup.sql file. These get placed in my /var/www directory, and are thus backed up with it.

The s3backup.sh script takes 3 or 4 parameters:

  s3backup.sh "type" "directory" "bucket:prefix" "diff-file"
  
  type = The type of backup: NORMAL, FULL or DIFF.
            NORMAL - Does a complete non-incremental backup of the directory to S3.
            FULL - Does a full backup of the directory, and creates a TAR incremental DIFF file for the following differential backups.
            DIFF - Does an incremental backup of the directory, using the provided diff file.

  directory = The directory to archive.

  bucket:prefix = the bucket name and the prefix passed to js3tream.  See the -b option in js3tream for details.

  diff-file = the tar differential file used.  This is the file that TAR needs to perform incremental backups.
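
To make the parameters concrete, here is a minimal sketch of a function with the same interface. It is not the s3backup.sh script linked above, only my guess at the least it would need to do. The JS3TREAM and KEY_FILE variables, the tag text, and the choice to delete the diff file before a FULL backup are all assumptions.

```shell
# Assumed defaults; adjust to your installation.
JS3TREAM=${JS3TREAM:-"java -jar js3tream.jar"}
KEY_FILE=${KEY_FILE:-mykey.txt}

# s3backup TYPE DIRECTORY BUCKET:PREFIX [DIFF-FILE]
s3backup() {
    kind=$1
    dir=$2
    target=$3
    diff_file=$4

    case $kind in
        NORMAL) ;;
        FULL)
            [ -n "$diff_file" ] || { echo "FULL needs a diff-file" >&2; return 2; }
            # Assumption: starting without a diff file makes tar archive everything.
            rm -f "$diff_file"
            ;;
        DIFF)
            [ -f "$diff_file" ] || { echo "DIFF needs an existing diff-file" >&2; return 2; }
            ;;
        *)
            echo "usage: s3backup NORMAL|FULL|DIFF directory bucket:prefix [diff-file]" >&2
            return 2
            ;;
    esac

    # Archive relative to / so the stored paths have no leading slash.
    if [ "$kind" = NORMAL ]; then
        tar -C / -czpf - "${dir#/}"
    else
        tar -g "$diff_file" -C / -czpf - "${dir#/}"
    fi | $JS3TREAM -K "$KEY_FILE" -b "$target" -i -t "$kind backup of $dir"
}

# Usage, matching the crontab entries above:
#   s3backup NORMAL /etc js3tream.MyArchiveBucket:/etc-A-FULL.tgz
#   s3backup FULL /var/www js3tream.MyArchiveBucket:/var/www-A-01-FULL.tgz /root/bin/diff/var-www.tgz.diff
```

The argument checks run before any data is sent, so a mistyped type or a missing diff file fails early instead of uploading a useless archive.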