Five Tips to Help Survive an Amazon Cloud Outage
By Zev Laderman
The Amazon Cloud outage affected a wide variety of startups, SMBs, and enterprise-level companies. Nonetheless, many companies are still adopting cloud solutions, and Amazon’s cloud services in particular offer flexible access to infrastructure and Class-A hardware, along with a pay-as-you-go pricing model.
Unfortunately, what could be considered a small glitch on Amazon’s side can become a nightmare for DevOps teams and IT managers. Things get even more complicated when high-visibility websites and companies (think Quora) suddenly go down because of an outage, and the people responsible for their cloud start thinking about switching providers, or end up wasting time and money chasing their tails.
At Newvem Analytics we’ve monitored hundreds of Amazon Cloud accounts, some of which were affected by both the April 2011 outage and last week’s AWS outage. By monitoring our users’ AWS usage, we’ve found that more than 35% of our beta partners are operating a cloud that is highly vulnerable to an outage.
But it doesn’t need to be this way.
The good news is that you don’t need to be an Amazon AWS guru to protect your cloud from a potential outage. Our analytics team has put together this list of five must-do practices for making an AWS cloud more resilient to service outages.
1. Take advantage of multiple availability zones when using elastic load balancing. Balancing your application’s incoming traffic across multiple instances makes it more fault-tolerant, and you can enhance fault tolerance further by enabling Elastic Load Balancing (ELB) across multiple availability zones. If one availability zone goes down, the ELB keeps your application up by distributing traffic to the instances in the remaining availability zones.
2. Be aware of unhealthy instances behind ELBs. An ELB does not route traffic to instances it considers unhealthy. This means that even if you have enough instances behind an ELB (even across multiple availability zones), only the healthy ones actually serve requests; if several are unhealthy, your real capacity may be far smaller than you planned for.
3. Maintain timely snapshots of your EBS volumes. If an outage damages an EBS volume, you can recreate the volume from a snapshot, in the state it was in when the snapshot was taken. Volumes are tied to a single availability zone, but snapshots are tied only to the region. So if the availability zone where a volume lives stays down for a long time, you can restore the most recent snapshot to a new volume in another availability zone.
4. Keep critical data copies off the AWS Cloud. It’s very important to keep offsite copies of critical data. We’d also suggest that you consider a third-party offsite service to back up your data. However, use caution here, as some of these services may actually run on top of AWS.
5. Use an external tool to monitor your system. AWS CloudWatch is a handy service for monitoring your AWS resources, but AWS services depend on one another in ways that aren’t always clear, so CloudWatch itself may not be reliable during an outage. For this reason, while CloudWatch can be your routine monitoring system for AWS, you should consider an external monitoring system, separate from AWS, that will alert you to an outage independently.
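Tip 1 above (spreading an ELB across availability zones) can be checked programmatically. The following is a minimal sketch, assuming Classic Load Balancers accessed through the boto3 `elb` client; `describe_load_balancers` and `enable_availability_zones_for_load_balancer` are real calls on that client, while the helper names and the two-zone policy are hypothetical. The client is passed in rather than created inside the functions, so the audit logic can be exercised without AWS credentials.

```python
"""Sketch: find Classic ELBs that span fewer than two availability zones.

Assumes a boto3 Classic ELB client (boto3.client("elb")) is injected.
Helper names and MIN_ZONES are hypothetical, not part of any AWS API.
"""
from typing import Any, Dict, List, Sequence

MIN_ZONES = 2  # hypothetical policy: every ELB should span at least two AZs


def single_zone_load_balancers(response: Dict[str, Any]) -> List[str]:
    """Return names of load balancers with fewer than MIN_ZONES zones.

    `response` has the shape returned by describe_load_balancers().
    """
    return [
        lb["LoadBalancerName"]
        for lb in response.get("LoadBalancerDescriptions", [])
        if len(lb.get("AvailabilityZones", [])) < MIN_ZONES
    ]


def add_zones(elb_client: Any, name: str, zones: Sequence[str]) -> List[str]:
    """Enable extra AZs on a Classic ELB; returns the ELB's resulting zone list."""
    result = elb_client.enable_availability_zones_for_load_balancer(
        LoadBalancerName=name,
        AvailabilityZones=list(zones),
    )
    return result["AvailabilityZones"]
```

In use, you would pass the output of `boto3.client("elb").describe_load_balancers()` to `single_zone_load_balancers`. Adding a zone only helps if healthy instances are registered there, large accounts need to page through results using the `Marker` parameter, and load balancers inside a VPC are extended by attaching subnets (`attach_load_balancer_to_subnets`) rather than by enabling zones.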
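For tip 2, one quick way to see how much capacity is really serving traffic is to count the instances the load balancer reports as out of service. This sketch assumes the dictionary returned by the Classic ELB `describe_instance_health(LoadBalancerName=...)` call, whose `InstanceStates` entries carry a `State` of `InService`, `OutOfService` or `Unknown`; the function names and the `min_healthy` threshold are hypothetical.

```python
"""Sketch: summarize Classic ELB instance health.

Assumes the dict returned by describe_instance_health(LoadBalancerName=...).
The min_healthy threshold is a hypothetical capacity target, not an AWS default.
"""
from typing import Any, Dict, List


def unhealthy_instances(health: Dict[str, Any]) -> List[str]:
    """Instance IDs whose State is anything other than 'InService'."""
    return [
        state["InstanceId"]
        for state in health.get("InstanceStates", [])
        if state.get("State") != "InService"
    ]


def has_enough_capacity(health: Dict[str, Any], min_healthy: int) -> bool:
    """True if at least `min_healthy` instances are actually receiving traffic."""
    total = len(health.get("InstanceStates", []))
    return total - len(unhealthy_instances(health)) >= min_healthy
```

Running a check like this on a schedule makes a silent loss of capacity visible before an availability zone failure pushes the remaining healthy instances past their limits.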
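Tip 3 relies on two facts from the text: snapshots are regional, and a volume can be created from a snapshot in any availability zone of that region. The sketch below assumes a boto3 EC2 client and the `Snapshots` list returned by `describe_snapshots`; `create_volume(SnapshotId=..., AvailabilityZone=...)` is a real EC2 call, but the helper names and the 24-hour freshness limit are hypothetical policy choices.

```python
"""Sketch: pick the latest usable EBS snapshot and restore it in another AZ.

Assumes an injected boto3 EC2 client (boto3.client("ec2")) and the
"Snapshots" list from describe_snapshots(). MAX_AGE is a hypothetical policy.
"""
from datetime import datetime, timedelta, timezone
from typing import Any, Dict, List, Optional

MAX_AGE = timedelta(hours=24)  # hypothetical: "timely" means under a day old


def latest_completed(snapshots: List[Dict[str, Any]],
                     volume_id: str) -> Optional[Dict[str, Any]]:
    """Newest snapshot of `volume_id` whose State is 'completed', or None."""
    done = [s for s in snapshots
            if s.get("VolumeId") == volume_id and s.get("State") == "completed"]
    return max(done, key=lambda s: s["StartTime"], default=None)


def is_stale(snapshot: Optional[Dict[str, Any]],
             now: Optional[datetime] = None) -> bool:
    """True if there is no snapshot or the newest one is older than MAX_AGE."""
    if snapshot is None:
        return True
    now = now or datetime.now(timezone.utc)
    return now - snapshot["StartTime"] > MAX_AGE


def restore_to_zone(ec2_client: Any, snapshot_id: str, zone: str) -> str:
    """Create a new volume from `snapshot_id` in `zone`; returns its VolumeId."""
    volume = ec2_client.create_volume(SnapshotId=snapshot_id, AvailabilityZone=zone)
    return volume["VolumeId"]
```

Once the new volume exists, it can be attached (for example with the EC2 `attach_volume` call) to an instance running in that same availability zone.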
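Tip 5 only helps if the monitor itself does not share AWS’s failure modes. As a minimal sketch, assuming the check runs on a host outside AWS and that `alert` is a hypothetical callback wired to a non-AWS notification channel, the standard-library code below polls a URL and raises one alert after several consecutive failures. The URL, interval and three-failure threshold are illustrative assumptions, not recommendations from the tips above.

```python
"""Sketch: a minimal HTTP health check meant to run outside AWS.

The URL, interval, threshold and alert callback are hypothetical; the alert
should go through a provider that does not itself depend on AWS.
"""
import http.client
import time
import urllib.request
from typing import Callable, List, Sequence


def check_endpoint(url: str, timeout: float = 5.0) -> bool:
    """True if `url` answers with a 2xx/3xx status within `timeout` seconds."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 400
    # URLError, HTTPError and socket timeouts are OSError subclasses;
    # ValueError covers malformed URLs; HTTPException covers bad responses.
    except (OSError, ValueError, http.client.HTTPException):
        return False


def should_alert(history: Sequence[bool], threshold: int = 3) -> bool:
    """Alert only after `threshold` consecutive failed checks (avoids flapping)."""
    recent = list(history[-threshold:])
    return len(recent) == threshold and not any(recent)


def monitor(url: str, alert: Callable[[str], None], interval: float = 60.0,
            checks: int = 10, threshold: int = 3) -> int:
    """Poll `url` `checks` times; call `alert` once per outage. Returns alerts sent."""
    history: List[bool] = []
    alerted = False
    sent = 0
    for _ in range(checks):
        ok = check_endpoint(url)
        history.append(ok)
        if ok:
            alerted = False  # recovered: a later outage may alert again
        elif should_alert(history, threshold) and not alerted:
            alert(f"{url} failed {threshold} consecutive checks")
            alerted = True
            sent += 1
        time.sleep(interval)
    return sent
```

Requiring several consecutive failures before alerting is a common way to avoid being paged over a single dropped request.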
There are many tools and best practices that can help increase your chances of uptime during an AWS outage. These are simple steps that you can take to protect your data and enhance its chances of surviving in the event of an outage.
While these tips can help you survive an outage, they are only a starting point for disaster recovery planning and are certainly not a guarantee that your cloud will keep functioning through an outage.