A bridge between your data and Infinite Storage | Written by Shane Powell
The -n "never die" bug described below has been fixed, and tested. I spent just about a month testing the backup and restore of a 29G tgz file. The upload process was 100% successful, and the download and un-tar was also 100% successful. So, we can all go back to using JS3tream with confidence.
Also, a bug when retrieving data from S3 was fixed. Some people were missing the last handful of bytes of their downloaded file. It took way too long for me to track this one down, but I finally did. The retrieval process seems to be working now.
It appears that the -n "never die" feature is not working in JS3tream. I thought I had just fixed a bug related to the "never die" feature, but it looks like I managed to introduce a new one instead. The bug I had fixed was that when JS3tream failed to upload a pack of data, it waited for 30 minutes and then tried again. However, it was NOT retrying the pack of data that failed; instead it moved on to the next one. Oops... Well, that bug was fixed. However, now for some reason, when JS3tream goes into this "never die" sequence, it never seems to manage to successfully upload the pack of data. It tries forever, but seems to fail forever too. I'm working hard on this bug right now, as it has caused my backups to stop working.
JS3tream was written to provide easy streaming of data to and from the Amazon S3 data storage service. JS3tream is NOT a backup solution by itself. But, coupled with tar or zip, JS3tream provides a very powerful backup solution.
Backups! I have a personal server at home with about 15G of very important data that I really need to have backed up. We all know we need to do it! But Where, How, and How much? That is the hard part.
How about a simple 250G external USB hard drive? That is actually not a bad idea at all. It's large enough to hold my important data. It's not too expensive to purchase. And it's hooked directly to my machine, so it's pretty fast. But... what if a burglar breaks in and steals both my machine and the external backup drive? Or my 1 year old daughter decides that the computer looks thirsty, and pours water on everything? Or worse yet, my house catches fire and I lose everything!! Oh, and what size drive is the best choice? Sure, you start with a nice cheap 250G, because you only need 20G for backups. But what are the odds that you'll find more and more data to backup? That 250G is going to fill up in a hurry, especially if you're going to do incremental backups!
Ok then... let's take that external USB hard drive to a friend's house or work, and back up my files over a secure SSH connection to another machine. Hmmm... that is not a bad idea either. Sure, the internet connection isn't as fast, but it will eventually back up everything. But now I have to contend with SSH connections to another machine that is most likely behind a firewall. That means opening up port 22 and exposing a possible security hole. This will work, but it's not trivial.
How about a DLT or DAT tape drive? One easy answer to that one: expensive! You can't even come close to the price/gig of an external hard drive.
Hey... what about one of the many offsite internet-based backup providers? There are quite a few of them, and they support Linux!! The #1 major problem with this solution... $$$$$. These guys want a lot of money for a small amount of storage. The average cost I found was about $1 per Gigabyte per month. That is not a ton of money, but a bit more than I want to pay for a personal backup. Then there is Carbonite. Not only are they cheap at a flat rate of $5 per month, but they have unlimited capacity too! But... alas... they don't support Linux/Unix.
In comes Amazon and their S3 web service. It's a simple offsite data storage solution with almost unlimited capacity and a very reasonable price. But why is this better than that USB drive at a friend's house? Price per Gigabyte and ease of use. There are no firewalls to contend with. 15c per G of storage is quite cheap if you ask me. And you don't have to wonder if there is enough room to store your data!
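To put some rough numbers on it: my 15G of data works out to about 15 x $0.15, or roughly $2.25 per month to store on S3, versus around $15 per month at the $1 per Gigabyte services mentioned above. (Amazon also bills data transfer separately, so the real total will be a little higher.)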
Ok... the Amazon S3 system it is, but... how? I need to back up my data and maintain the original file metadata, like the GID, UID, timestamp and umask. A lot like the tried and true "TAR" utility does. There are a couple of freeware utilities to choose from that will allow you to upload files to S3. There is Jets3t. But it doesn't support UIDs or GIDs or file masks in a Unix/Linux environment. There is the very good s3sync utility. But I found that when I used it, I could not reliably get all of my data uploaded without it hanging. I never figured out why. Perhaps 15G was too much? Perhaps it was something far simpler. So... now what? (see the next section)
Why re-invent the wheel? TAR, ZIP and RAR have been archiving files for years, and they do it very well. TAR was designed from the start for backups on a Unix system. So let's leverage the power of TAR. But how do I make TAR store its archives on S3? With JS3tream, of course. Think of JS3tream as a bridge between the Amazon S3 system and your backup software. JS3tream will read the STDOUT of your tar utility and store the stream on S3. Then, JS3tream can read the stream back from S3 and pipe it to tar on STDIN. Voila... reliable TAR archives to a reliable offsite data store!
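For example, a backup and its matching restore might look something like the two commands below. This is just a sketch: the key-file and bucket options shown here are placeholders made up for illustration; run java -jar js3tream.jar -h and see the How To section for the real option names.

tar -czf - /home | java -jar js3tream.jar -K mykey.txt -b mybucket:home-backup -i

java -jar js3tream.jar -K mykey.txt -b mybucket:home-backup -o | tar -xzf - -C /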
JS3tream was written in Java. I chose to write it in Java for a few reasons. One because I've been writing java code since about 1995 and am very familiar with it. Two: it's cross platform, so this solution should work anywhere that a JVM will run. Three: Java and WebServices get along very nicely. Thanks to the Apache Axis Project.
JS3tream is nothing more than a bridge between the Amazon S3 system and you. It abstracts the complexity of using the S3 service for such a simple task as storing raw data. This is why JS3tream is so simple to use.
When streaming data to S3 through JS3tream, JS3tream will take your data, break it up into manageable chunks, and store each chunk in a single S3 object within the S3 bucket you named. Within the Amazon S3 bucket there will be a series of S3 objects named 0000000000 -> 999999999999. Each of these objects represents a chunk of your streamed data. If you did not have access to JS3tream, you could actually download each object one at a time and concatenate them together to rebuild your original stream manually. For this reason, JS3tream is quite flexible: if for whatever reason JS3tream is failing you on your restore, you can still get your original data.
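For example, if you pulled the individual objects down with any other S3 tool, you could rebuild the stream with nothing more than cat. The object names below are just an illustration of the numbered chunks described above, assuming the stream was a gzipped tar archive:

cat 0000000000 0000000001 0000000002 > mybackup.tgz

tar -xzf mybackup.tgz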
When streaming data back from S3 to your machine, JS3tream will read each of your original data chunks in order, and pipe the output of each chunk to STDOUT.
Special thanks to David Soergel, who wrote s3napback, a very nice Perl wrapper for JS3tream that does a good job of automating the backup of your data, including automatic rotation of backup sets, data encryption, and MySQL support.
If you would prefer JS3tream by itself, download it from the main Sourceforge site.
To install, simply unzip or untar the downloaded file to a directory.
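For example, assuming the tgz download (the file name here is just a placeholder for whatever you downloaded):

mkdir ~/js3tream

tar -xzf js3tream.tgz -C ~/js3tream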
Then, from the command line and within the installed directory, run the following to see how to use JS3tream:
java -jar js3tream.jar -h
Look at the How To section for more usage examples.
JS3tream is licensed under the GNU LESSER GENERAL PUBLIC LICENSE. I'm a big fan of free software, and hope this gives some back to the open source community. In short, you're free to use JS3tream for pretty much anything you want. But you can't charge any money to other people for JS3tream, or use JS3tream in any commercial programs.