Don’t Get Burned When the Cloud Goes Down

Yesterday, Amazon’s S3 cloud storage service went down for the second time this year, bringing a swathe of Web 2.0 sites down with it. Sites and apps including SmugMug, Twhirl, Twitter, SlideShare, and even SitePoint were affected by the downtime. As WebWorkerDaily notes, Amazon’s SLA guarantees 99.9% uptime, which works out to no more than about 45 minutes of downtime per month. So it’s a safe bet that Amazon will owe a lot of sites refunds for this most recent hours-long outage.
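Where does the 45-minute figure come from? A 99.9% uptime target leaves 0.1% of the month as allowable downtime. The sketch below does that arithmetic, assuming uptime is simply the fraction of minutes the service is reachable (a simplification; the SLA itself defines how Amazon measures availability):

```python
def allowed_downtime_minutes(uptime_pct: float, days_in_month: int) -> float:
    """Maximum downtime, in minutes, that still meets a monthly uptime target."""
    minutes_in_month = days_in_month * 24 * 60
    return minutes_in_month * (1 - uptime_pct / 100)

# Month length changes the budget slightly: roughly 40 to 45 minutes.
for days in (28, 30, 31):
    print(days, round(allowed_downtime_minutes(99.9, days), 1))
```

For a 31-day month the budget is about 44.6 minutes, so an outage lasting several hours blows well past it.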

For many users of Amazon’s or other cloud storage services, however, downtime is unacceptable and can cost the business more than just the price of that missed hosting. Travel guide startup Planaroo, which was also negatively affected by the S3 outage, penned a post on their company blog today outlining what they learned from the issues with Amazon. Here are their takeaways:

  • Backups are a good thing. Having a Plan B is a necessity. Spend the extra dough to keep backups of your mission-critical files. Planaroo had backups of all the photos they host on S3, so when Amazon’s service went down, they could switch to the backups simply by replacing a URL in their photo database.
  • Fix it. Don’t hope for the best. One thing Planaroo did wrong, however, was to assume Amazon would right the ship quickly. They waited 2 hours before giving up on Amazon and switching to their backups. Lesson: when things start going wrong, don’t assume your host will have them fixed quickly. Your contingency plan is there for a reason, so use it.
  • Keep backups up-to-date and ready to go. When your main host goes down and you need to switch to the backup, make sure it is up to date. If you have to spend time uploading files to your backup server during the outage, you’re losing time you can’t afford.
  • Other services could go down too. It’s not just your host. In the world of APIs and hosted libraries, many startups depend on multiple third-party services to keep things running. Planaroo, for example, uses Google’s AJAX Libraries. Google could have a problem similar to Amazon’s (Google’s App Engine had some extended downtime a month ago, in fact), so it pays to have a contingency plan for every service you rely on.
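The first two lessons (switch by changing one base URL, and don’t wait hours to do it) can be combined into a small failover helper. This is a minimal sketch, not Planaroo’s actual setup: the host URLs, the `healthcheck.txt` object, and the `PhotoHost` class are all hypothetical. It probes a known small file on the primary host and, after a set number of consecutive failed checks, builds photo URLs from the backup base instead. It only helps if the backup mirrors the primary, which is the third lesson.

```python
from dataclasses import dataclass
from typing import Callable
import urllib.request

# Hypothetical hosts: substitute your own primary (e.g. S3) and backup bases.
PRIMARY_BASE = "https://primary.example.com/photos"
BACKUP_BASE = "https://backup.example.com/photos"

def http_probe(url: str, timeout: float = 3.0) -> bool:
    """Return True if a HEAD request to url succeeds with a 2xx/3xx status."""
    req = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return 200 <= resp.status < 400
    except (OSError, ValueError):  # URLError, HTTPError, timeouts, bad URLs
        return False

@dataclass
class PhotoHost:
    primary: str
    backup: str
    probe: Callable[[str], bool] = http_probe
    health_key: str = "healthcheck.txt"  # hypothetical small object on the primary
    max_failures: int = 3                # consecutive failed checks before switching
    failures: int = 0
    using_backup: bool = False

    def check(self) -> None:
        """Run one health check against the primary and switch hosts if needed."""
        if self.probe(f"{self.primary}/{self.health_key}"):
            self.failures = 0
            self.using_backup = False  # primary is healthy again: switch back
        else:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.using_backup = True

    def photo_url(self, key: str) -> str:
        """Build a photo URL from whichever host is currently active."""
        base = self.backup if self.using_backup else self.primary
        return f"{base}/{key}"

# Usage with a stubbed probe that simulates the primary being down:
host = PhotoHost(PRIMARY_BASE, BACKUP_BASE, probe=lambda url: False)
for _ in range(3):
    host.check()
print(host.photo_url("paris/eiffel.jpg"))
```

Running `check()` on a schedule (say, once a minute) turns a two-hour wait into a few minutes. The same pattern covers the fourth lesson: a page that loads a library from Google’s hosted copy can fall back to a copy served from your own host.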

That last point highlights something important about this list: it doesn’t just apply to S3 or cloud storage services. These are good lessons that can be applied to ANY hosting environment.

For the full list of lessons that Planaroo laid out, check out their blog post.

  • Tijs

    For many services it’s the cost of storage that makes S3 attractive. Keeping a backup of all S3 data on ‘expensive’ alternatives sort of defeats the purpose. What we need is an S3 competitor. I would love to be able to keep backups in the Google cloud and switch to either depending on availability. Since both the EU and US S3 services were down, I guess keeping your data in both isn’t a solution in this case.

  • gdog

    @Tijs: Totally correct.

    I am considering moving some corporate functionality to EC2/S3 in the near future because I expect it to be more reliable than hosting my own boxes. I would expect it to be built out such that downtime is far less likely than if I slapped a few boxes into geographically dispersed data centers.

    But why bother if I can do better? Apart from scaling for spikes, I have had 99.999% uptime in the last year. Apparently, I have better uptime and redundancy than Amazon for not that much more a month.

  • mingz (http://mingz-online.com)

    It’s good to remind project managers of the importance of risk management. If you don’t manage risk, you are risking your business.