Yesterday, Amazon’s S3 cloud storage service went down for the second time this year, bringing a swathe of web 2.0 sites down with it. Sites and apps like SmugMug, Twhirl, Twitter, SlideShare, and even SitePoint were all affected by the downtime. As WebWorkerDaily notes, Amazon’s SLA guarantees 99.9% uptime, or no more than about 45 minutes of downtime per month. So it’s a safe bet that Amazon will owe a bunch of sites refunds for this most recent hours-long outage.
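As a rough check on that figure, the downtime budget implied by an uptime percentage is simply the unavailable fraction multiplied by the length of the period. Here is a minimal sketch, assuming a 30-day month; the function name is illustrative, and this is not how Amazon actually measures uptime or calculates SLA credits.

```python
def allowed_downtime_minutes(uptime_percent: float, days_in_period: int = 30) -> float:
    """Return the downtime budget, in minutes, implied by an uptime percentage.

    Assumption: the period is a whole number of 24-hour days
    (30 by default, as a stand-in for "one month").
    """
    minutes_in_period = days_in_period * 24 * 60
    unavailable_fraction = (100.0 - uptime_percent) / 100.0
    return unavailable_fraction * minutes_in_period


if __name__ == "__main__":
    budget = allowed_downtime_minutes(99.9)
    print(f"{budget:.1f} minutes")  # prints "43.2 minutes"
```

So 99.9% works out to roughly 43 minutes in a 30-day month, in line with the "about 45 minutes" quoted above. An outage lasting hours exceeds that budget several times over.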
For many users of S3 and other cloud storage services, however, downtime is unacceptable and can cost the business more than just the price of the hosting they lost. Travel guide startup Planaroo, which was also negatively affected by the S3 outage, penned a post on their company blog today outlining what they learned from the issues with Amazon. Here are their takeaways:
- Backups are a good thing. Having a Plan B is a necessity. Spend the extra dough to have backups of your mission critical files. Planaroo had backups of all the photos they host on S3, so when Amazon’s service went down, they could easily switch to the backups by just replacing a URL in their photo database.
- Fix it. Don’t hope for the best. One thing Planaroo did wrong, however, was assume Amazon would right the ship quickly. It was 2 hours before they decided to give up waiting on Amazon and switch to their backups. Lesson: when things start going wrong, don’t assume your host will have them fixed quickly. Your contingency plan is there for a reason, so use it.
- Keep backups up-to-date and ready-to-go. When your main host goes down and you need to switch to the backup, make sure it is already current. If you first have to upload files to your backup server, your outage lasts that much longer.
- Other services could go down too. It’s not just your host. In the world of APIs and hosted libraries, many startups depend on multiple third-party services to keep things running. Planaroo, for example, uses Google’s AJAX Libraries. Google could run into a problem similar to Amazon’s (Google’s App Engine had some extended downtime a month ago, in fact), so it pays to have a contingency plan for every service you rely on.
That last point highlights something important about this list: It doesn’t just apply to S3 or cloud storage services. These are good lessons that can be applied to ANY hosting environment.
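The first two lessons (keep a backup you can switch to by changing a URL, and don't wait indefinitely for the host to recover) can be sketched in a few lines. This is only an illustration, not Planaroo's actual setup: the base URLs, the 15-minute limit, and every name in it are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical values, for illustration only.
PRIMARY_BASE = "https://primary.example.com/photos/"
BACKUP_BASE = "https://backup.example.com/photos/"
MAX_WAIT_SECONDS = 15 * 60  # assumed limit on how long to wait for the primary


@dataclass
class HostState:
    """Remembers when the primary host was first seen failing."""
    failing_since: Optional[float] = None

    def record_check(self, healthy: bool, now: float) -> None:
        # A healthy check clears the failure; the first failed check starts the clock.
        if healthy:
            self.failing_since = None
        elif self.failing_since is None:
            self.failing_since = now

    def should_fail_over(self, now: float) -> bool:
        # Fail over once the primary has been down for at least MAX_WAIT_SECONDS.
        return (
            self.failing_since is not None
            and now - self.failing_since >= MAX_WAIT_SECONDS
        )


def photo_url(filename: str, state: HostState, now: float) -> str:
    """Build a photo URL, pointing at the backup host after a prolonged outage."""
    base = BACKUP_BASE if state.should_fail_over(now) else PRIMARY_BASE
    return base + filename
```

Timestamps are passed in as plain seconds, so the caller decides where health-check results come from (a cron job, a monitoring service, and so on). The point of the explicit limit is that the decision to switch is made ahead of time, not two hours into an outage. The same pattern applies to any third-party dependency, not just storage.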
For the full list of lessons that Planaroo laid out, check out their blog post.