A bridge between your data and Infinite Storage
Written by Shane Powell

Easily back up files to the Amazon S3 web storage service from Windows, Linux or OS X

News
Introduction
The Problem
The Solution
The Technology
Features
Download
How To...
License
Change Log
Help/FAQ

News

December 17, 2007. v0.6 has been released and fixes the bug found in v0.4.

The -n "never die" bug described below has been fixed, and tested. I spent just about a month testing the backup and restore of a 29G tgz file. The upload process was 100% successful, and the download and un-tar was also 100% successful. So, we can all go back to using JS3tream with confidence.

Also, a bug when retrieving data from S3 was fixed. Some people were missing the last handful of bytes of their downloaded file. It took way too long for me to track this one down, but I finally did. The retrieval process seems to be working now.

November 22, 2007. v0.4 seems to have a serious bug.

It appears that the -n "never die" feature is not working in JS3tream. I thought I had just fixed a bug related to the "never die" feature, but it looks like I managed to introduce a new one instead. The bug I had fixed was that when JS3tream failed to upload a pack of data, it waited 30 minutes and then tried again. However, it was NOT retrying the pack of data that failed. Instead, it moved on to the next one. Oops.. Well.. that bug was fixed. However, now for some reason, when JS3tream goes into this "never die" sequence, it never seems to manage to upload the pack of data successfully. It tries forever, but seems to fail forever too. I'm working hard on this bug right now, as it has caused my backups to stop working.

Introduction

JS3tream was written to provide easy streaming of data to and from the Amazon S3 data storage service. JS3tream is NOT a backup solution by itself. But, coupled with tar or zip, JS3tream provides a very powerful backup solution.

The Problem

Backups! I have a personal server at home with about 15G of very important data that I really need to have backed up. We all know we need to do it! But Where, How, and How much? That is the hard part.

How about a simple 250G external USB hard drive? That is actually not a bad idea at all. It's large enough to hold my important data. It's not too expensive to purchase. And it's hooked directly to my machine, so it's pretty fast. But... what if a burglar breaks in and steals both my machine and the external backup drive? Or my 1-year-old daughter decides that the computer looks thirsty and pours water on everything? Or worse yet, my house catches fire and I lose everything!! Oh, and what size drive is the best choice? Sure, you start with a nice cheap 200G, because you only need 20G for backups. But what are the odds that you'll find more and more data to back up? That 200G is going to fill up in a hurry, especially if you're going to do incremental backups!

Ok then... Let's take that external USB hard drive to a friend's house or to work, and back up my files over a secure SSH connection to another machine. Hmmm.. that is not a bad idea either. Sure, the internet connection isn't as fast, but it will eventually back up everything. But now I have to contend with SSH connections to another machine that is most likely behind a firewall. That means opening up port 22 and exposing a possible security hole. This will work, but it's not trivial.

How about a DLT or a DAT Tape drive? One easy answer to that one. Expensive! You can't even come close to the price/gig of an external hard drive.

Hey.. what about one of the many Offsite Internet based backup providers? There are quite a few of them, and they support Linux!! #1 major problem with this solution... $$$$$. These guys want a lot of money for a small amount of storage. The average cost I found was about $1 per Gigabyte per Month. That is not a ton of money, but a bit more than I want to pay for a personal backup. Then there is Carbonite. Not only are they cheap at a flat rate of $5 per month, but they have unlimited capacity too! But.. alas.. they don't support Linux/Unix.

In comes Amazon and their S3 web service. It's a simple offsite data storage solution with almost unlimited capacity and a very reasonable price. But why is this better than that USB drive at a friend's house? Price per gigabyte and ease of use. There are no firewalls to contend with. 15c per G of storage is quite cheap if you ask me. And you don't have to wonder if there is enough room to store your data!

Ok.. the Amazon S3 system it is, but... how? I need to back up my data and maintain the original file metadata, like the GID, UID, timestamp and file permissions, a lot like the tried and true "TAR" utility does. There are a couple of freeware utilities to choose from that will allow you to upload files to S3. There is Jets3t, but it doesn't support UIDs, GIDs or file permissions in a Unix/Linux environment. There is the very good s3sync utility, but I found that when I used it, I could not reliably get all of my data uploaded without it hanging. I never figured out why. Perhaps 15G was too much? Perhaps it was something far simpler. So.. now what? (See the next section.)

The Solution

Why re-invent the wheel? TAR, ZIP and RAR have been archiving files for years, and they do it very well. TAR was designed from the start for backups on a Unix system. So let's leverage the power of TAR. But how do I make TAR store its archives on S3? With JS3tream, of course! Think of JS3tream as a bridge between the Amazon S3 system and your backup software. JS3tream will read STDOUT from your tar utility and store the stream on S3. Then, JS3tream can read the stream back from S3 and pipe it to tar on STDIN. Voila... Reliable TAR archives to a reliable offsite data store!

The Technology

JS3tream was written in Java. I chose to write it in Java for a few reasons. One: I've been writing Java code since about 1995 and am very familiar with it. Two: it's cross platform, so this solution should work anywhere that a JVM will run. Three: Java and web services get along very nicely, thanks to the Apache Axis Project.

What you will need

Java 1.5 or higher
An archive program like TAR, RAR or ZIP
An Amazon S3 account

How JS3tream does its Magic

JS3tream is nothing more than a bridge between the Amazon S3 system and you. It abstracts the complexity of using the S3 service for such a simple task as storing raw data. This is why JS3tream is so simple to use.

When streaming data to S3 through JS3tream, JS3tream will take your data, break it up into manageable chunks, and store each chunk as a single S3 object within the S3 bucket you named. Within the Amazon S3 bucket, there will be a series of S3 objects named 0000000000 -> 999999999999. Each of these objects represents a chunk of your streamed data. If you did not have access to JS3tream, you could actually download each object one at a time and concatenate them in order to rebuild your original stream manually. This makes JS3tream quite flexible: if for whatever reason JS3tream is failing you on your restore, you can still get your original data.
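
To make the chunking described above concrete, here is a minimal Java sketch. It is not JS3tream's actual source: the class name ChunkSplitSketch, the keyFor helper, the 10-digit key width and the chunk size are all assumptions chosen for illustration, and the chunks are kept in an in-memory map instead of being uploaded to S3.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;

/** Illustrative only: splits a stream into fixed-size chunks with sequential keys. */
final class ChunkSplitSketch {

    /** Hypothetical key format: optional prefix plus a zero-padded 10-digit index. */
    static String keyFor(String prefix, long index) {
        return prefix + String.format("%010d", index);
    }

    /**
     * Reads the stream to its end and returns the chunks in the order they were read.
     * A real tool would upload each chunk as soon as it fills instead of keeping them all.
     */
    static Map<String, byte[]> split(InputStream in, String prefix, int chunkSize)
            throws IOException {
        Map<String, byte[]> chunks = new LinkedHashMap<String, byte[]>();
        long index = 0;
        while (true) {
            byte[] buf = new byte[chunkSize];
            int filled = 0;
            // One read() call may return fewer bytes than requested (common with
            // pipes), so keep reading until the buffer is full or the stream ends.
            while (filled < chunkSize) {
                int n = in.read(buf, filled, chunkSize - filled);
                if (n < 0) {
                    break;
                }
                filled += n;
            }
            if (filled == 0) {
                break; // nothing left to store
            }
            chunks.put(keyFor(prefix, index++), Arrays.copyOf(buf, filled));
            if (filled < chunkSize) {
                break; // short chunk means the stream has ended
            }
        }
        return chunks;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = "example archive stream".getBytes(StandardCharsets.US_ASCII);
        Map<String, byte[]> chunks = split(new ByteArrayInputStream(data), "", 8);
        for (Map.Entry<String, byte[]> e : chunks.entrySet()) {
            System.out.println(e.getKey() + " -> " + e.getValue().length + " bytes");
        }
    }
}
```

In real use the input would be System.in, fed by the archive program, rather than an in-memory byte array. Zero-padding the index keeps the object names in the same order as the chunks when they are sorted by name.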

When streaming data back from S3 to your machine, JS3tream will read each of your original data chunks in order, and pipe the output of each chunk to STDOUT.
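
The restore direction can be sketched the same way. The following Java example is an assumption-laden illustration, not JS3tream's code: ChunkJoinSketch and join are hypothetical names, and the chunks are assumed to have already been downloaded into a map keyed by their zero-padded object names.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

/** Illustrative only: rebuilds the original stream from its numbered chunks. */
final class ChunkJoinSketch {

    /** Writes the chunks to out in ascending key order and returns the byte count. */
    static long join(Map<String, byte[]> chunks, OutputStream out) throws IOException {
        long total = 0;
        // TreeMap sorts keys as strings. That matches numeric order only because
        // every key is zero-padded to the same width.
        for (byte[] chunk : new TreeMap<String, byte[]>(chunks).values()) {
            out.write(chunk);
            total += chunk.length;
        }
        out.flush(); // make sure the final bytes actually leave the buffer
        return total;
    }

    public static void main(String[] args) throws IOException {
        Map<String, byte[]> chunks = new HashMap<String, byte[]>();
        // Inserted out of order on purpose; join() restores the sequence.
        chunks.put("0000000001", "world\n".getBytes(StandardCharsets.US_ASCII));
        chunks.put("0000000000", "hello ".getBytes(StandardCharsets.US_ASCII));
        join(chunks, System.out);
    }
}
```

Writing to System.out is what lets the archive program on the other side of the pipe read the rebuilt stream on its STDIN.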

Features

Stream any and all data and types
100% Java and thus cross platform
Consists of just one file "js3tream.jar"
Easy command line options for setting up in a cron job
Performs at least 2 tries per stream part to ensure clean uploads
Checks MD5 sums at both ends and both directions to ensure good data transfers
Multiple data streams can be stored in a single S3 bucket
-neverdie option tells JS3tream to never give up trying to send/receive data
Store multiple archives in a single S3 bucket by using archive prefixes
Use either in memory or temp files for stream buffers. Memory for speed, temp files for larger S3 data chunks
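
The MD5 check in the list above can be illustrated with a short Java sketch. It only shows the general idea of hashing a chunk locally and comparing the result with a checksum reported by the other end. It does not show how JS3tream itself performs the check, and Md5CheckSketch, md5Hex and matches are hypothetical names.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Illustrative only: compares a chunk's MD5 against a checksum from the other end. */
final class Md5CheckSketch {

    /** Returns the MD5 digest of data as 32 lowercase hex characters. */
    static String md5Hex(byte[] data) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            byte[] digest = md.digest(data);
            StringBuilder hex = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                // Mask to 0..255 so negative bytes format as two hex digits.
                hex.append(String.format("%02x", b & 0xff));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // Standard JDKs ship MD5; treat its absence as a fatal setup error.
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    /** True when the locally computed MD5 equals the reported hex checksum. */
    static boolean matches(byte[] chunk, String reportedMd5Hex) {
        return md5Hex(chunk).equalsIgnoreCase(reportedMd5Hex);
    }

    public static void main(String[] args) {
        byte[] chunk = "abc".getBytes(StandardCharsets.US_ASCII);
        System.out.println(md5Hex(chunk));
        System.out.println(matches(chunk, "900150983cd24fb0d6963f7d28e17f72"));
    }
}
```

A mismatch after an upload or download would be a reason to treat that chunk as failed and try it again.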

Possible Future Changes

Bandwidth throttle
Double upload temp buffers for better upload speed.
GUI Front end

Download

Special thanks to David Soergel, who wrote s3napback, a very nice Perl wrapper for JS3tream that does a good job of automating the backup of your data, including automatic rotation of backup sets, data encryption, and MySQL support.

If you would prefer JS3tream by itself, download it from the main SourceForge site.
To install, simply unzip or untar the downloaded file to a directory.
Then, from the command line within the installed directory, run the following to see how to use JS3tream:

java -jar js3tream.jar -h

Look at the How To section for more usage examples.

How To

License

JS3tream is licensed under the GNU LESSER GENERAL PUBLIC LICENSE. I'm a big fan of free software, and hope this gives some back to the open source community. In short, you're free to use JS3tream for pretty much anything you want. But you can't charge any money to other people for JS3tream, or use JS3tream in any commercial programs.

Change Log

12/19/2007 - v0.6.2 of JS3tream released

This release provides a trivial reformatting fix for the help output, which was not well formatted in the default Windows command prompt. There are no functionality differences between this release and v0.6.

12/17/2007 - v0.6 of JS3tream released

This release fixed a serious bug in v0.4 of JS3tream: the "neverdie" out-of-sequence bug. It turned out that when the neverdie "-n" option was used, some of the uploaded data containers lost their correct sequence number. That is, when an upload failed and JS3tream waited 30 minutes before continuing the upload, it would try again on the failed data, but the container's index number was incremented when it should not have been.
Another bug fixed was an odd one I had trouble tracking down. Some people were experiencing a problem when retrieving data from S3: the downloaded file was missing a few bytes at the end, somewhere between 50 and 200 bytes or so. This bug has also been fixed and tested now.

11/17/2007 - v0.5 of JS3tream was not released. This was an internal build
10/26/2007 - v0.4 of JS3tream released

This is a bug fix release. There was a problem with the sending and receiving of data containers when the "neverdie" option is used. It turned out that after the 30 minute wait to retry the failed container, JS3tream was NOT resending the failed container, but instead moving to the next. This has now been fixed.
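
The intended retry behavior (resend the same container under the same index, and only then move on) can be sketched in Java as follows. This is an illustration under stated assumptions, not the JS3tream source: RetrySketch, the Uploader interface and sendAll are hypothetical names, and the wait time is a parameter so the example runs quickly instead of pausing for 30 minutes.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

/** Illustrative only: retries a failed chunk without skipping or renumbering it. */
final class RetrySketch {

    /** Hypothetical upload hook: returns normally on success, throws on failure. */
    interface Uploader {
        void upload(long index, byte[] chunk) throws Exception;
    }

    /**
     * Sends every chunk in order and returns the total number of upload attempts.
     * maxTries <= 0 means "never give up", similar in spirit to a neverdie option.
     */
    static int sendAll(List<byte[]> chunks, Uploader uploader, int maxTries, long waitMillis)
            throws Exception {
        int attempts = 0;
        for (int index = 0; index < chunks.size(); index++) {
            int tries = 0;
            while (true) {
                attempts++;
                tries++;
                try {
                    uploader.upload(index, chunks.get(index));
                    break; // success: only now does the outer loop advance the index
                } catch (Exception e) {
                    if (maxTries > 0 && tries >= maxTries) {
                        throw e; // retry budget used up
                    }
                    Thread.sleep(waitMillis); // wait, then retry the SAME index
                }
            }
        }
        return attempts;
    }

    public static void main(String[] args) throws Exception {
        List<byte[]> chunks = new ArrayList<byte[]>();
        chunks.add(new byte[] {1});
        chunks.add(new byte[] {2});
        final boolean[] failedOnce = {false};
        int attempts = sendAll(chunks, (index, chunk) -> {
            System.out.println("uploading index " + index);
            if (index == 1 && !failedOnce[0]) {
                failedOnce[0] = true;
                throw new IOException("simulated network failure");
            }
        }, 0, 10L);
        System.out.println("attempts: " + attempts);
    }
}
```

Keeping the index increment in the outer loop, outside the retry loop, is what prevents both problems described in this change log: a failed container is never skipped, and its sequence number never changes between tries.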

01/05/2007 - Beta v0.2 of JS3tream released

Support for multiple archive streams in a single S3 bucket
JS3tream built using FatJar thus providing a single executable jar.
JS3tream now defaults to in-memory stream buffers for an increase in performance.
A new -neverdie option tells JS3tream to never give up trying to send or receive data.
The new prefix option is backward compatible with v0.1 and its lack of a prefix.

12/29/2006 - Beta v0.1 of JS3tream released
12/28/2006 - Early testing is done, time to set up the JS3tream SourceForge web site.
12/14/2006 - Started trying to back up my server to Amazon S3 using other tools, and failed. So, I started writing JS3tream.

How to get Help

JS3tream FAQ
Amazon Simple Storage Service (Amazon S3)
Amazon S3 Developer Forums
View Source Code
JavaDocs
Send me an email to (sgspowell-js3tream at yahoo dot c o m) - Please forgive the cryptic email, this is to try and avoid spam bots.