Last month marked a huge milestone for SitePoint. Previously, most of our web pages were served up via dedicated servers rented from a hosting provider. I’m sure that back when those servers were rented, SitePoint must have been thrilled with the performance. Like all good things, however, this feeling would come to an end. In time, the infrastructure started presenting us with various limitations and at times became a cause of much frustration.
The machines were aging, and the cost of renting them was fixed regardless of age or load. As the website grew and became more complex, the limitations of not being able to scale also became a problem. Our database replication logs needed to be pruned more and more frequently, or the server would run out of space. Load continued to increase as traffic increased. When new code was uploaded that wasn’t efficient enough or a search engine would spider our pages, the site’s performance would drop significantly.
The way the operating system was originally installed also created some limitations which became more obvious as time went by. Our installation didn’t provide snapshot support, so we couldn’t take a full backup from a single point in time on demand, for example.
We were also frequently at the mercy of the hosting provider’s support staff. If we needed to restore a system image from backups, we were potentially looking at hours of downtime. We didn’t have direct access to backups; any restore had to be manually requested of the hosting provider, and occupying their staff’s time to perform a manual operation such as this would potentially have increased our monthly charges.
Over time, the software installed on the machine became a mess. Various developers had custom-compiled and installed applications without relying on the package management system. These were not installed in a consistent location: it seemed everyone had their own idea about where they should be placed.
There was also a serious lack of documentation explaining the rationale behind those customizations. In some cases, even the source code was missing. Had patches been applied to produce the build of a given program that was in use? The people with that knowledge were no longer around. Would an Apache update from Red Hat break a module that was installed without relying on the package management system? What if we forgot to incorporate the custom modifications to the startup scripts that package updates might alter?
Eventually, the GNU/Linux distribution we were using started to approach its support cutoff point, and our hosting provider urged us to upgrade it. If we continued using our servers past the OS EOL, we were warned that we could not count on any assistance from them. Red Hat would also no longer be providing security updates, and we would have to take on the task of updating vulnerable applications ourselves.
It was clear that we had no choice but to migrate to a newer operating system, and ideally a more practical infrastructure. There were a number of ways we could have approached the situation, but we decided that we would take this opportunity to break away from our hosting provider and migrate our web services to EC2.
The EC2 Solution
Amazon’s EC2 IaaS solution has assisted SitePoint in addressing all of the problems I’ve touched on so far. We were already using it for other projects with great success, so we were familiar with the benefits it brings. With EC2 we no longer have dedicated physical hardware; instead we have instances, which are virtual machines that can be created or deleted on demand. How does this help?
The price does not need to be fixed. Amazon has, on a number of occasions, reduced the cost of running instances for everyone, presumably as the hardware used reduces in value. Amazon provides different instance hardware specifications for different prices, and it’s not hard to manage a setup that allows for easy switching between instance sizes. Why pay for an expensive server if you’re not using all the cycles?
On the other hand, what happens if you need more processing power to keep up with traffic demand? Increasing an instance size can also be easy. Additionally, you could decide to have multiple application server instances behind a load-balancing proxy. That way, as traffic increases, you can bring up new instances to handle the load and reduce the number of instances if it dies back down.
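The scale-up and scale-down decision described above can be sketched as a small calculation. This is only an illustration, not a description of SitePoint’s setup: the function name, the per-instance capacity figure, and the minimum and maximum limits are all assumptions made for the example.

```python
import math


def desired_instance_count(requests_per_second, capacity_per_instance,
                           min_instances=1, max_instances=10):
    """Return how many application instances should sit behind the proxy.

    capacity_per_instance is an assumed figure: the number of requests
    per second that one instance can comfortably serve.
    """
    if capacity_per_instance <= 0:
        raise ValueError("capacity_per_instance must be positive")
    # Round up so a partial instance's worth of load still gets a machine.
    needed = math.ceil(requests_per_second / capacity_per_instance)
    # Clamp the result: never scale to zero, never exceed the budget ceiling.
    return max(min_instances, min(max_instances, needed))


# Traffic spike: 450 requests/s at an assumed 100 requests/s per instance.
print(desired_instance_count(450, 100))
# Quiet period: the count falls back to the configured minimum.
print(desired_instance_count(20, 100))
```

Whatever tool actually launches or terminates instances would act on the returned number; the clamp keeps a runaway traffic estimate from starting an unbounded number of machines.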
Depending on your application, you might choose to make an Elastic Block Store volume available to your image. These allow you to create point-in-time snapshots of all data they contain, be it databases, user-uploaded files … you name it. You can also use any snapshot to easily derive a new EBS image. Now we can manage backups ourselves. It’s easy to dictate when and how they are made, and we have complete control to perform restore operations ourselves, too.
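As a rough sketch of what managing backups ourselves can look like in code, the function below requests a snapshot of one EBS volume. It assumes a client object that exposes the `create_snapshot` call of the AWS SDK for Python (for example, `boto3.client("ec2")`); the function name, the volume ID, and the label are hypothetical and chosen only for this example.

```python
from datetime import datetime, timezone


def snapshot_volume(ec2_client, volume_id, label):
    """Request a point-in-time snapshot of one EBS volume; return its ID.

    ec2_client is assumed to behave like boto3.client("ec2"), whose
    create_snapshot call accepts VolumeId and Description.
    """
    # Timestamp the description so snapshots are easy to tell apart later.
    stamp = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%MZ")
    response = ec2_client.create_snapshot(
        VolumeId=volume_id,
        Description=f"{label} backup {stamp}",
    )
    return response["SnapshotId"]


# Hypothetical usage (requires boto3 and AWS credentials):
#   import boto3
#   snapshot_volume(boto3.client("ec2"), "vol-0123456789abcdef0", "database")
```

Run from a scheduled job, a function like this lets us decide when snapshots are taken. Passing the client in as an argument also makes it possible to exercise the logic with a stand-in object before pointing it at a real account.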
Surprise! EC2 is hosted on physical hardware, not a mystical cloud, so like any other hardware, EC2 is also susceptible to hardware failures.
Our old hosting provider was very good at handling these situations. On one occasion we were informed of a degraded RAID array on one of our servers and the support staff corrected it immediately, all before my daily RAID status check cron job emailed me to inform me of a problem. That’s service that’s hard to beat, so how does EC2 stack up?
With EC2, you don’t have knowledge of that basic hardware level, since you’re working inside virtual machines. You do need to trust that Amazon is looking after the hardware and will live-migrate instances to other hardware when required, if possible. Having said that, we have had occasions where instances have simply died.
Once an instance has died, you can’t really live-migrate it any more. Further, if Amazon experiences a power outage to its entire data center, the ability to live-migrate to somewhere else in the data center isn’t particularly beneficial. What you can do, however, is create redundant instances in different regions, or set yourself up so that you can quickly boot instances in another region as required. By contrast, should our physical servers at our hosting provider die or lose power, there’s not much we can immediately do about it.
While EC2 failures feel more common in my experience, this is offset by the ability EC2 gives us to more easily work around almost any kind of problem.
Keeping Things Organized
Forcing you to keep installations reasonably organized is perhaps a nice side-effect of EC2’s typical instance storage model. Instance storage is not persistent. Once a virtual machine is shut down, everything on the instance storage volumes is lost. At first this looks like a major limitation, but as I’ve learned, there are a number of positive aspects to it.
If your storage devices disappear when you shut down your system, what will become of your website files? You have two main options. The first is to attach an EBS volume, which will make your data persistent. In doing so, however, you will be keeping your EBS volume data quite separate from everything else, such as your operating system, which is part of the booted image and would not be backed up as part of an EBS snapshot.
A second option would be to embed scripts within your booted image that automatically set your instances up the way you require. This option enables you to make a truly scalable solution, as you can bring instances up and down whenever you need them.
In practice, hosting dynamic websites in EC2 would likely use both options: we’d use an EBS volume to store database contents, and a set of boot scripts to check the necessary codebase out of our version control system for application instances. Additionally, EC2 will allow us to easily and accurately clone our production setup so we can quickly create a real staging environment in which to test changes, and only pay for the hours we need it.
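A boot script of the kind described above could be as small as the following sketch. It assumes Git is the version control system and that the instance already holds an SSH key permitted to read the repository; the repository URL, branch name, and destination path are placeholders, not SitePoint’s real values.

```python
import subprocess


def checkout_command(repo_url, branch, destination):
    """Build the git command that fetches the codebase onto a fresh instance."""
    # --depth 1 skips history, which a throwaway application server
    # does not need and which keeps boot time short.
    return ["git", "clone", "--depth", "1", "--branch", branch,
            repo_url, destination]


def deploy_on_boot(repo_url, branch, destination, run=subprocess.run):
    """Check the code out; raises if git exits with a non-zero status."""
    command = checkout_command(repo_url, branch, destination)
    run(command, check=True)
    return command


# Placeholder values for illustration only.
print(" ".join(checkout_command(
    "git@example.com:example/site.git", "production", "/var/www/site")))
```

An init script baked into the image would call `deploy_on_boot` once networking is available. Because the instance rebuilds itself from version control, nothing on the non-persistent instance storage needs to be backed up.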
As you might imagine, updating such a large quantity of PHP code and SQL queries to be compatible with the newer software versions that modern GNU/Linux distributions include was a daunting task in itself. Additionally, implementing the number of code changes required to make the most out of EC2 was not something that was immediately feasible. Although we’d decided on the direction we wanted to head in, we still needed an immediate solution to our dedicated hosting problem.
Again, Amazon came to the rescue with a relatively new feature: bootable EBS volumes. This facility enables us to boot instances whose entire system gains the benefits that EBS brings to its data. Having a bootable EBS volume means that the entire system can be snapshotted when it’s time for a backup. While things were in a state of flux, this has helped us nail problems and back up fixes very quickly, without having to worry about creating many image revisions. When our configuration stabilizes, we can create a new operating system image and take it out of EBS.
The third phase of our strategy will be to commit all code to a VCS. We may be able to remove EBS from the picture almost entirely for our application server instances, as code can be checked out of a VCS such as Git on GitHub, securely over SSH, immediately upon boot, making new instances production-ready minutes after launch. We will continue to make things as dynamic as possible, so we can take full advantage of all that EC2 offers, such as quickly scaling to multiple application servers as required.
The hardest phase in our migration is complete–it mainly required our code to be updated. Regardless of where we ultimately decided to host our code, this step was unavoidable. While we have not yet completely achieved all of our goals, we are well on the road to meeting them. We now have more control over our data and infrastructure than ever before, so we are better prepared for wherever the future may take us.
Adam Bolte is SitePoint's systems administrator and a free software activist. He has been running various GNU/Linux distributions as his desktop of choice since 1998, and has a tendency to install the Linux kernel onto any device he owns.