Five Tips to Help Survive an Amazon Cloud Outage

By Zev Laderman

The Amazon Cloud outage affected a wide variety of startups, SMBs, and enterprise-level companies. Nonetheless, many companies are still adopting cloud solutions. Amazon’s cloud services in particular offer flexible access to infrastructure and Class-A hardware, as well as a pay-as-you-go model.

Unfortunately, what looks like a small glitch on Amazon’s side can turn into a nightmare for DevOps and IT managers. The situation gets worse when high-visibility websites and companies (think Quora) suddenly go down because of an outage, and those responsible for their cloud either start thinking about switching providers or waste time and money chasing their tails.

At Newvem Analytics we’ve monitored hundreds of Amazon Cloud accounts, some of which were affected by both the April 2011 outage and last week’s AWS outage. By monitoring our users’ AWS usage we’ve discovered that more than 35% of our beta partners are operating a cloud that is highly vulnerable to an outage.

But it doesn’t need to be this way.

The good news is that you don’t need to be an AWS guru to protect your cloud from a potential outage. Our analytics team has put together this list of five must-do practices that better protect an AWS cloud from service outages.

1. Take advantage of multiple availability zones when using Elastic Load Balancing. Load balancing your application’s incoming traffic between multiple instances makes it more fault-tolerant. You can further enhance fault tolerance by enabling Elastic Load Balancing (ELB) across multiple availability zones. If one availability zone goes down, the ELB will still ensure uptime by distributing traffic across the instances in the other availability zones.

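2. Be aware of unhealthy instances behind ELBs. Unhealthy instances do not receive traffic from ELBs. This means that even if you have registered enough instances behind an ELB (even across multiple availability zones), the unhealthy ones receive no traffic at all, so the remaining healthy instances have to carry the full load. Watch instance health so that your real capacity matches the capacity you planned for.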

3. Maintain timely snapshots of your EBS volumes. If an outage damages an EBS volume, you can recreate the volume from a snapshot, in the state it had when the snapshot was taken. Volumes are tied to an availability zone, but snapshots are tied only to the region. So if the availability zone where the volume was active stays down for a long time, you can restore the last saved snapshot to a new volume in another availability zone.

4. Keep critical data copies off the AWS Cloud. It’s very important to keep offsite copies of critical data. We’d also suggest that you consider a third-party offsite service to back up your data. However, use caution here, as some of these services may actually run on top of AWS.

5. Use an external tool to monitor your system. AWS CloudWatch is a handy service for monitoring your AWS resources, yet the level of interdependency among AWS services isn’t always clear; in other words, CloudWatch itself may not be reliable in the event of an outage. For this reason, while CloudWatch can be your routine monitoring system for AWS, you should consider an external monitoring system, separate from AWS, which will alert you to an outage independently.
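As a rough illustration of tip 5, the sketch below shows the kind of probe an external monitor might run from a machine outside AWS. It is only a minimal example using the Python standard library; the function names `endpoint_is_up` and `failed_endpoints` are made up for this illustration, and a real monitoring service would add scheduling, retries, and alerting.

```python
import http.client
import urllib.error
import urllib.request


def endpoint_is_up(url, timeout=5.0):
    """Return True if `url` answers with a 2xx/3xx status within `timeout` seconds."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 400
    except (urllib.error.URLError, http.client.HTTPException, OSError, ValueError):
        # DNS failures, refused connections, timeouts, HTTP error statuses,
        # and malformed URLs all count as "down" for alerting purposes.
        return False


def failed_endpoints(urls, timeout=5.0):
    """Return the subset of `urls` that did not respond successfully."""
    return [url for url in urls if not endpoint_is_up(url, timeout)]
```

Run from a scheduler on a host that does not depend on AWS, a non-empty result from `failed_endpoints` would be the trigger for an alert.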

There are many tools and best practices that can help increase your chances of uptime during an AWS outage. These are simple steps that you can take to protect your data and enhance its chances of surviving in the event of an outage.
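To make tips 1 and 2 concrete, here is a small sketch of an audit over the instances registered behind one load balancer. It assumes you have already collected each instance’s availability zone and health state (for example from the AWS console or API); the `Instance` record, the `audit_load_balancer` function, and the sample zone names are illustrative only, not part of any AWS library.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Instance:
    instance_id: str
    zone: str       # availability zone the instance runs in
    healthy: bool   # health state as reported by the load balancer


def audit_load_balancer(instances, min_healthy_per_zone=1):
    """Return human-readable warnings for one load balancer's instances."""
    warnings = []
    zones = {inst.zone for inst in instances}
    healthy_by_zone = Counter(inst.zone for inst in instances if inst.healthy)

    # Tip 1: instances should span at least two availability zones.
    if len(zones) < 2:
        warnings.append("instances are in fewer than two availability zones")

    # Tip 2: unhealthy instances get no traffic, so count only healthy ones.
    for zone in sorted(zones):
        count = healthy_by_zone[zone]
        if count < min_healthy_per_zone:
            warnings.append(
                f"{zone}: {count} healthy instance(s), "
                f"want at least {min_healthy_per_zone}"
            )
    return warnings
```

For instance, a fleet with one healthy and one unhealthy instance in one zone and a single unhealthy instance in a second zone would be flagged, because the second zone contributes no usable capacity even though it looks populated.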

While these are valuable tips to help you survive an outage, they are just a starting point for disaster recovery planning and are certainly not a guarantee that your cloud will still function after an outage.
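As one small piece of such planning, the "timely" part of tip 3 can be checked mechanically. The sketch below assumes you already have, for each EBS volume, the time of its most recent snapshot (or `None` if it has never been snapshotted); the function name `stale_volumes`, the volume IDs, and the one-day threshold are assumptions chosen for illustration.

```python
from datetime import datetime, timedelta, timezone


def stale_volumes(latest_snapshot, max_age, now=None):
    """Return IDs of volumes whose newest snapshot is missing or older than `max_age`.

    `latest_snapshot` maps a volume ID to a timezone-aware datetime, or None.
    """
    now = now or datetime.now(timezone.utc)
    stale = []
    for volume_id, taken_at in sorted(latest_snapshot.items()):
        if taken_at is None or now - taken_at > max_age:
            stale.append(volume_id)
    return stale


# Example with made-up data: one fresh, one old, one never snapshotted.
example_now = datetime(2020, 1, 10, tzinfo=timezone.utc)
example = {
    "vol-a": example_now - timedelta(hours=2),
    "vol-b": example_now - timedelta(days=3),
    "vol-c": None,
}
flagged = stale_volumes(example, max_age=timedelta(days=1), now=example_now)
```

Any volume this reports is one you could not restore to a recent state in another availability zone.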

  • Carradee

    I just today decided to use Amazon Cloud as a backup service—but it’s one of them. I have another, independent backup service that’s more specific and thorough (CrashPlan), and I burn data DVDs as regular on-site backups. I also have an on-site external hard drive.

    So now I have 2 off-site backup methods + 2 on-site backup methods. Ideally, I’d like to have one more of each. (Or at least a fireproof safe for my on-site backup drive.)

    Paranoid? Maybe. But I’ve had too many cases of 2 or 3 things spontaneously failing at the selfsame time. (One novel I’m trying to finish for release, I’m joking it’s jinxed, because most folks who’ve had it in their court to work on, including me, have had weird life stuff interfere.)

  • James McGregor

    I have been using AWS since 2010 and have found it totally reliable, much more so than any co-located service I have worked with in the past. Having maintained online apps since 2005 and used co-lo since 2002, I do have lots of experience with cloud services, and I find Amazon by far the best and most reliable out there.
