Hands on with Amazon Storage Gateway

Amazon's new Storage Gateway offers a twist on cloud-based backups, but beware of the rough edges

Last week, Amazon announced the most recent addition to its AWS product portfolio: The Amazon Storage Gateway, a simple, powerful way to back up on-premise data to Amazon's S3 storage infrastructure.

But the Storage Gateway is not just a cloud-based backup solution. It also opens the door to using Amazon EC2 instances as a means to provide disaster recovery.

To get a better feel for the possibilities, I decided to set one up for myself. What I found is an initial release product that is not without some serious limitations, but well worth the time to experiment with, nonetheless.

A quick tour of Amazon Storage Gateway
In basic terms, the Amazon Storage Gateway is a virtual appliance that allows you to snapshot your own on-premise storage into Amazon's highly redundant S3 storage infrastructure. In that sense, it seems much like other cloud backup offerings. But there are some crucial differences.

The first difference is that the Storage Gateway presents the storage that you allocate to it back to you as a block-level iSCSI volume. That means you can place literally anything on that volume, regardless of the file system or type of data -- and protect it lock, stock, and barrel in S3. A block-level approach does have its downsides (such as making file-level restores impossible), but it gives you the flexibility to do whatever you want with that storage (file system encryption, deduplication, you name it). You can even use it to provide offsite snapshot protection for a production storage volume.

The second difference is that the Storage Gateway stores its S3 snapshots in the form of Elastic Block Store (EBS) snapshots within AWS. If you've used Amazon's EC2 offering, you'll recognize this as the same mechanism Amazon uses to let you take snapshots of EBS volumes tied to your EC2 compute instances. This means you can not only snapshot your on-premise data to the cloud, but also use those cloud-based snapshots to instantiate volumes that you can attach to EC2 compute instances. This is perhaps the most interesting angle of the Storage Gateway product.

Getting started
The first thing you need to get started is an Amazon AWS account. If you already have one and maybe even have some instances running, you basically just have to mash a few buttons to enable the Storage Gateway product line. If you don't, signing up for one is easy.

The Storage Gateway product itself is free for 60 days -- the only thing you'll end up needing to pay for is the space you actually consume on S3, which is currently billed at $0.14 per gigabyte per month. So tossing in a few gigabytes to test the waters for a week or two will cost quite a bit less than your morning coffee.
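
To put numbers on that claim, here's a back-of-the-envelope estimate based only on the $0.14 per gigabyte per month figure above; the 5GB test size is just an illustrative value, not anything Amazon prescribes.

```python
# Rough S3 cost estimate for a small Storage Gateway trial. Uses only the
# $0.14/GB-month storage rate quoted above and ignores request charges.
S3_RATE_PER_GB_MONTH = 0.14

def storage_cost(gigabytes: float, fraction_of_month: float = 1.0) -> float:
    """Estimated S3 storage bill for the given data size and billing period."""
    return gigabytes * S3_RATE_PER_GB_MONTH * fraction_of_month

# Example: 5GB of test data kept for about two weeks (half a month).
print(f"${storage_cost(5, 0.5):.2f}")  # roughly $0.35
```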

Once you've logged into the AWS Management Console, you can fire up the step-by-step installation guide that walks you through creating a new storage gateway. At the moment, the Storage Gateway is distributed as a VMware OVA, meaning that support is limited to VMware vSphere platforms (specifically vSphere 4.1), but Amazon plans to add support for more hypervisors in subsequent releases.

Creating the appliance
The extremely thorough installation guide not only walks you through downloading and importing the OVA into your vSphere environment, but also anticipates that you might not have a vSphere 4.1 host on hand, providing links to the necessary documentation and download site to build one.

Once you've imported the OVA, you'll fire it up and punch the IP address that it caught from your DHCP server into the AWS Console. Through a bit of inventive client-side browser coding, the AWS Console actually contacts the Storage Gateway and links it with your AWS account. From there on out, you manage the Storage Gateway entirely from within the AWS Console. The appliance itself does not have any management console of its own -- if you hit the gateway's local IP with a Web browser, it just redirects you to the AWS Console.

Assigning storage
Once the gateway is up and linked to your account, you're ready to assign local storage to it. That's done from within the vSphere Client, and -- aside from requiring a paravirtualized storage controller instead of the default LSI SAS controller -- it's just like adding disks to any other virtual machine you've worked with.

Given that vSphere is handling the storage allocation, you can use any kind of storage that you can make available to that vSphere host -- be it SAN, NAS, or DAS. The choice of which kind of storage to use depends upon the redundancy and throughput requirements you intend to place on the iSCSI volume that the Storage Gateway will present to you, but just about anything will work for testing purposes.

You'll need to add a volume that you're going to make available to iSCSI initiators on your on-premise network, as well as a "Local Working Storage" volume, which the appliance uses to cache changes before they are asynchronously spooled up to S3. Amazon recommends sizing this working storage volume at 20 percent of the size of your main volumes, but depending upon factors such as how frequently you snapshot to S3 and how much data you write into the volume, you may need more or less than that. You can also use an existing virtual disk as a storage target -- the Storage Gateway will give you the option to leave the data intact, which provides a very easy way to add snapshot capability to existing services.
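
To make that 20 percent rule of thumb concrete, here's a minimal sizing sketch; the function and the write-heavy padding factor are my own illustrative choices, not anything from Amazon's documentation.

```python
# Rough sizing helper for the Local Working Storage disk, based on Amazon's
# 20 percent guideline. The write-heavy multiplier is a hypothetical fudge
# factor, not an official recommendation.
def working_storage_gb(volume_sizes_gb: list, write_heavy: bool = False) -> float:
    base = 0.20 * sum(volume_sizes_gb)
    return base * 1.5 if write_heavy else base

# Example: a single 100GB data volume suggests a 20GB working storage disk.
print(working_storage_gb([100]))  # 20.0
```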

In my case, I added a new 100GB volume for data and a 20GB Local Working Storage cache. Once I added those within the vSphere Client, I switched back to the AWS Management Console and those disks were immediately available as options for configuring local working storage and volumes. Despite the fact that the Storage Gateway is being managed from afar, the communication between the appliance and the management console is very snappy -- you never find yourself hitting Refresh waiting for changes you made to the gateway to show up in the console.

Accessing the volume and taking snapshots
Once you've configured the disks, you're ready to connect to them with an iSCSI initiator on your premises. As with the initial setup process, Amazon's documentation is unusually thorough in this regard -- providing step-by-step instructions for connecting to the volume using Windows 7, Windows 2008, or Red Hat Enterprise Linux 5 (the three officially supported operating systems). Once that's done, you can format the volume on whatever machine you've attached it to and toss data in.
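
For the Linux case, the connection boils down to a target discovery followed by a login with the standard open-iscsi tools. Here's a minimal sketch that scripts those two steps; the gateway IP and target IQN are placeholders you'd replace with the values shown in your own AWS Console.

```python
# Minimal sketch: discover and log in to the gateway's iSCSI target from a
# Linux host with the open-iscsi utilities installed. The portal address and
# target IQN below are placeholders -- use the values from your AWS Console.
import subprocess

GATEWAY_PORTAL = "192.168.1.50:3260"             # local IP of the Storage Gateway VM
TARGET_IQN = "iqn.1997-05.com.amazon:myvolume"   # illustrative target name

# Ask the gateway which targets it is presenting.
subprocess.run(
    ["iscsiadm", "-m", "discovery", "-t", "sendtargets", "-p", GATEWAY_PORTAL],
    check=True,
)

# Log in; the volume then appears as a local block device (e.g. /dev/sdX).
subprocess.run(
    ["iscsiadm", "-m", "node", "-T", TARGET_IQN, "-p", GATEWAY_PORTAL, "--login"],
    check=True,
)
```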

By default, a snapshot schedule is created when you create the Storage Gateway, generating an EBS snapshot every 24 hours. You can modify that schedule to fire more frequently (down to once an hour), manually request a snapshot whenever you need one, or do some scripting against Amazon's AWS API to request snapshots on whatever schedule you like. As with most AWS services, you'll likely end up learning some scripting anyway, as there have historically been many more features available through the API than through the management console; that said, you really don't need to get into that during the testing process.
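
If you do go the scripting route, here's a minimal sketch of what driving snapshots from the API looks like using the boto3 Python SDK (which postdates this article); the volume ARN is a placeholder, and the calls assume the current Storage Gateway API.

```python
# Minimal sketch: take an on-demand snapshot of a gateway volume and change
# the recurring schedule via boto3. The volume ARN is a placeholder; look
# yours up in the console or via list_volumes().
import boto3

sgw = boto3.client("storagegateway", region_name="us-east-1")

VOLUME_ARN = (
    "arn:aws:storagegateway:us-east-1:123456789012:"
    "gateway/sgw-EXAMPLE/volume/vol-EXAMPLE"
)

# Fire off a snapshot right now; it shows up as an ordinary EBS snapshot.
resp = sgw.create_snapshot(
    VolumeARN=VOLUME_ARN,
    SnapshotDescription="ad hoc snapshot before maintenance",
)
print("Snapshot started:", resp["SnapshotId"])

# Tighten the default 24-hour schedule to hourly, starting at 02:00.
sgw.update_snapshot_schedule(
    VolumeARN=VOLUME_ARN,
    StartAt=2,
    RecurrenceInHours=1,
    Description="hourly snapshots",
)
```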

Regardless of how it's triggered, once a snapshot is requested, the Storage Gateway will create an empty EBS snapshot within S3 and start pushing all of the data stored in its local working storage up to it. How long that takes depends entirely upon the upstream speed of your internet connection -- in my case, shipping a gigabyte of data took about half an hour, but if you have a huge upstream, it will obviously move along much faster.
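
As a rough sanity check on that figure, the arithmetic below relates changed data and upstream bandwidth to upload time; my half hour per gigabyte works out to a usable upstream of roughly 4.5Mbps.

```python
# Rough estimate of snapshot upload time for a given amount of changed data
# and upstream bandwidth. Ignores protocol overhead and any optimization the
# gateway may apply.
def upload_hours(changed_gb: float, upstream_mbps: float) -> float:
    megabits = changed_gb * 8_000  # 1GB is roughly 8,000 megabits
    return megabits / upstream_mbps / 3600

# Example: 1GB over a ~4.5Mbps upstream takes about half an hour.
print(f"{upload_hours(1, 4.5):.2f} hours")  # ~0.49
```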

Restoring a snapshot
Once the EBS snapshot has finished uploading, you can use it in a number of different ways. The most obvious is to use it as the seed for a new Storage Gateway volume -- useful if your local storage environment has suffered a failure and you're restoring. To test this, I disconnected from the iSCSI volume and then used the management console to delete the volume. Then, using the same target disk, I created a new volume, this time seeded with a snapshot I had just taken. This process was by no means immediate, as I had to wait for all of the data to stream back down from S3, but it worked as advertised.

However, you can also restore that snapshot as an EBS volume and then attach it to an EC2 compute instance -- essentially providing the exact same on-premise storage volume to a cloud-based server. Creative use of that functionality together with the VM Import tool could not only provide you with a cloud-based backup solution, but also fulfill a disaster recovery role.
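
Here's a minimal sketch of that cloud-side path with boto3: creating an EBS volume from a gateway snapshot and attaching it to a running EC2 instance. The snapshot ID, instance ID, and availability zone are placeholders for your own resources.

```python
# Minimal sketch: turn a Storage Gateway snapshot into an EBS volume and
# attach it to an EC2 instance. The IDs and availability zone below are
# placeholders.
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

SNAPSHOT_ID = "snap-0123456789abcdef0"   # the EBS snapshot the gateway created
INSTANCE_ID = "i-0123456789abcdef0"      # the recovery instance
AZ = "us-east-1a"                        # must match the instance's zone

# Create a volume from the snapshot in the instance's availability zone.
volume = ec2.create_volume(SnapshotId=SNAPSHOT_ID, AvailabilityZone=AZ)
volume_id = volume["VolumeId"]

# Wait until the volume is ready, then attach it; it appears on the instance
# as a new block device.
ec2.get_waiter("volume_available").wait(VolumeIds=[volume_id])
ec2.attach_volume(VolumeId=volume_id, InstanceId=INSTANCE_ID, Device="/dev/sdf")
```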

Good, but not perfect
Despite being impressed with the overall functionality of this initial release, I ran into a few things that reminded me it was, in fact, very new. Included in that list are the stringent requirements around which on-premise hypervisors and iSCSI initiators are supported, the inability to grow a volume after it has been created, a maximum volume size of 1TB, the requirement for DHCP (you have to hack the Storage Gateway if you want a static address), and a few other little bleeding-edge gotchas along those lines.

The one large piece of functionality I was personally sad to see missing (at present, at least) is the ability to restore a snapshot of an EC2 instance's EBS volume to a Storage Gateway. That functionality would allow you to easily bring an entire EC2 instance back to your on-premise environment in block-level form -- and, in a way, let you implement bidirectional disaster recovery, where Amazon backs up your on-premise environment and you back up your cloud environment locally. In practice, though, this approach might end up being expensive: While there are no inbound data charges when you're paying for a Storage Gateway ($125 per month per gateway once the 60-day free period ends), outbound data charges start at $0.12 per gigabyte.

The bottom line
The Amazon Storage Gateway, while fulfilling its role as a cloud-based backup tool, is also an encouraging step toward providing more integration between the public cloud and traditional on-premise infrastructures. But this initial release, while excellent in many ways, suffers from limitations that will lead most to conclude it's not yet ready for prime time. That said, it costs almost nothing to give it a shot, so I'd definitely recommend experimenting with it.

This article, "Hands on with Amazon Storage Gateway," originally appeared at InfoWorld.com. Read more of Matt Prigge's Information Overload blog and follow the latest developments in storage at InfoWorld.com. For the latest business technology news, follow InfoWorld.com on Twitter.

Copyright © 2012 IDG Communications, Inc.