Last month marked a huge milestone for SitePoint. Previously, most of our web pages were served up via dedicated servers rented from a hosting provider. I’m sure that back when those servers were rented, SitePoint must have been thrilled with the performance. Like all good things, however, this feeling would come to an end. In time, the infrastructure started presenting us with various limitations and at times became a cause of much frustration.

The machines were aging, and the cost of renting them was fixed regardless of age or load. As the website grew and became more complex, the limitations of not being able to scale also became a problem. Our database replication logs needed to be pruned more and more frequently, or the server would run out of space. Load continued to increase as traffic increased. When new code was uploaded that wasn’t efficient enough or a search engine would spider our pages, the site’s performance would drop significantly.

The way the operating system was originally installed also created some limitations which became more obvious as time went by. Our installation didn’t provide snapshot support, so we couldn’t take a full backup from a single point in time on demand, for example.

We were also frequently at the mercy of the hosting provider’s support staff. If we needed to restore a system image from backups, we were potentially looking at hours of downtime. We didn’t have direct access to backups; any restore had to be manually requested of the hosting provider, and occupying their staff’s time to perform a manual operation such as this would potentially have increased our monthly charges.

Over time, the software installed on the machine became a mess. Various developers had custom-compiled and installed applications without relying on the package management system. These were not installed in a consistent location: it seemed everyone had their own idea about where they should be placed. There was also a serious lack of documentation explaining the rationale behind those customizations. In some cases, even the source code was missing. Had patches been applied to produce the build of a given program that was in use? The people with that knowledge were no longer around. Would an Apache update from RedHat break a module that was installed without relying on the package management system? What if we forgot to incorporate the custom modifications to the startup scripts that package updates might alter?

Eventually, the GNU/Linux distribution we were using started to approach its support cutoff point, and our hosting provider urged us to upgrade it. If we continued using our servers past the OS EOL, we were warned that we could not count on any assistance from them. RedHat also would no longer be providing security updates, and we would have to take on the task of updating vulnerable applications ourselves.

It was clear that we had no choice but to migrate to a newer operating system, and ideally more practical infrastructure. There were a number of ways we could have approached the situation, but we decided that we would take this opportunity to break away from our hosting provider and migrate our web services to EC2.

The EC2 Solution

Amazon’s EC2 IaaS solution has assisted SitePoint in addressing all of the problems I’ve touched on so far. We were already using it for other projects with great success, so we were familiar with the benefits it brings. With EC2 we no longer have dedicated physical hardware; instead we have instances, which are virtual machines that can be created or deleted on demand. How does this help?

Price

The price does not need to be fixed. Amazon has, on a number of occasions, reduced the cost of running instances for everyone, presumably as the hardware used reduces in value. Amazon provides different instance hardware specifications for different prices, and it’s not hard to manage a setup that allows for easy switching between instance sizes. Why pay for an expensive server if you’re not using all the cycles?

Scalability

On the other hand, what happens if you need more processing power to keep up with traffic demand? Increasing an instance size can also be easy. Additionally, you could decide to have multiple application server instances behind a load-balancing proxy. That way, as traffic increases, you can bring up new instances to handle the load and reduce the number of instances if it dies back down.
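
As a rough illustration of scaling the number of application instances with traffic, here is a minimal Python sketch. The function name, the per-instance capacity figure, and the minimum and maximum limits are all assumptions invented for this example; they are not part of SitePoint's setup or of any AWS API.

```python
import math


def desired_instance_count(requests_per_second, capacity_per_instance,
                           min_instances=1, max_instances=10):
    """Return how many application instances to run behind the proxy.

    All parameters are hypothetical: capacity_per_instance is the number
    of requests per second one instance is assumed to handle comfortably.
    """
    if capacity_per_instance <= 0:
        raise ValueError("capacity_per_instance must be positive")
    # Round up so that a partially loaded instance still gets started.
    needed = math.ceil(requests_per_second / capacity_per_instance)
    # Never drop below the minimum or grow past the budgeted maximum.
    return max(min_instances, min(max_instances, needed))


if __name__ == "__main__":
    for load in (0, 450, 5000):
        print(load, "req/s ->", desired_instance_count(load, 100), "instances")
```

Something would still need to start and stop the instances and register them with the load balancer; the sketch only covers the decision about how many to run.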

Backups

Depending on your application, you might choose to attach an Elastic Block Store (EBS) volume to your instance. EBS volumes allow you to create point-in-time snapshots of all the data they contain, be it databases, user-uploaded files … you name it. You can also use any snapshot to easily derive a new EBS volume. Now we can manage backups ourselves. It’s easy to dictate when and how they are made, and we have complete control to perform restore operations ourselves, too.
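
Once you control your own snapshots, you also decide how long to keep them. The following Python sketch shows one possible retention rule; the function name and the seven-day default are assumptions made for illustration, and the code works on plain timestamps rather than calling any AWS API.

```python
from datetime import datetime, timedelta


def snapshots_to_delete(snapshot_times, now, keep_days=7):
    """Return the snapshot timestamps that are older than keep_days.

    The newest snapshot is always kept, even if it is older than the
    cutoff, so that at least one restore point survives.
    """
    if not snapshot_times:
        return []
    newest = max(snapshot_times)
    cutoff = now - timedelta(days=keep_days)
    return sorted(t for t in snapshot_times if t < cutoff and t != newest)


if __name__ == "__main__":
    now = datetime(2010, 6, 30)
    taken = [datetime(2010, 6, 1), datetime(2010, 6, 20), datetime(2010, 6, 29)]
    for stamp in snapshots_to_delete(taken, now):
        print("would delete snapshot from", stamp.date())
```

A real job would list the snapshots of a volume, pass their creation times to a function like this, and then delete whatever it returns.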

Hardware Failures

Surprise! EC2 is hosted on physical hardware, not a mystical cloud, so like any other hardware it is susceptible to failures.

Our old hosting provider was very good at handling these situations. On one occasion we were informed of a degraded RAID array on one of our servers and the support staff corrected it immediately, all before my daily RAID status check cron job emailed me to inform me of a problem. That’s service that’s hard to beat, so how does EC2 stack up?

With EC2, you don’t have knowledge of that basic hardware level, since you’re working inside virtual machines. You do need to trust that Amazon is looking after the hardware and will live-migrate instances to other hardware when required, if possible. Having said that, we have had occasions where instances have simply died.

Once an instance has died, you can’t really live-migrate it any more. Further, if Amazon experiences a power outage to its entire data center, the ability to live-migrate to somewhere else in the data center isn’t particularly beneficial. What you can do, however, is create redundant instances in different regions, or set yourself up so that you can quickly boot instances in another region as required. By contrast, should our physical servers at our hosting provider die or lose power, there’s not much we can immediately do about it.

While EC2 failures feel more common in my experience, this is offset by how much more easily EC2 lets us work around almost any kind of problem.
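
To make the idea of booting in another region more concrete, here is a small Python sketch of the selection step. The function name is hypothetical, and how you determine which regions are currently healthy (your own monitoring, for instance) is assumed to happen elsewhere.

```python
def choose_region(preferred_regions, healthy_regions):
    """Return the first preferred region that is currently healthy.

    preferred_regions is an ordered list, most preferred first.
    healthy_regions is whatever set your own monitoring reports as usable.
    Returns None when no preferred region is available.
    """
    for region in preferred_regions:
        if region in healthy_regions:
            return region
    return None


if __name__ == "__main__":
    preferred = ["us-east-1", "us-west-1", "eu-west-1"]
    # Pretend monitoring says the first choice is down.
    print(choose_region(preferred, {"us-west-1", "eu-west-1"}))
```

The hard part in practice is having images and data ready in the fallback region, not choosing it.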

Keeping Things Organized

Forcing you to keep installations reasonably organized is perhaps a nice side-effect of EC2’s typical instance storage model. Instance storage is not persistent. Once a virtual machine is shut down, everything on the instance storage volumes is lost. At first this looks like a major limitation, but as I’ve learned, there are a number of positive aspects to it.

If your storage devices disappear when you shut down your system, what will become of your website files? You have two main options. The first is to attach an EBS volume, which will make your data persistent. In doing so, however, you will be keeping your EBS volume data quite separate from everything else, such as your operating system–something which will be part of the booted image and would not be backed up as part of an EBS snapshot.

A second option would be to embed scripts within your booted image that automatically set your instances up the way you require. This option enables you to make a truly scalable solution, as you can bring instances up and down whenever you need them.

In practice, hosting dynamic websites in EC2 would likely use both options–we’d use an EBS volume to store database contents, and a set of boot scripts to check the necessary codebase out of our version control system for application instances. Additionally, EC2 will allow us to easily and accurately clone our production setup so we can quickly create a real staging environment in which to test the changes–and only pay for the hours we need it.
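
The split between the two options can be sketched in Python as a function that decides what an instance should do at boot, depending on its role. Everything named here is an assumption for illustration only: the role names, the repository URL, the device name, and the directories are hypothetical and do not describe SitePoint's actual scripts.

```python
import shlex


def boot_commands(role,
                  repo_url="git@example.com:site/app.git",  # hypothetical repository
                  app_dir="/var/www/app",                   # hypothetical checkout path
                  ebs_device="/dev/sdf",                    # hypothetical EBS device
                  data_dir="/mnt/data"):                    # hypothetical mount point
    """Return the shell commands a freshly booted instance would run."""
    if role == "application":
        # Application instances hold nothing persistent: fetch the code.
        return ["git clone {} {}".format(shlex.quote(repo_url),
                                         shlex.quote(app_dir))]
    if role == "database":
        # Database instances keep their contents on an attached EBS volume.
        return ["mkdir -p {}".format(shlex.quote(data_dir)),
                "mount {} {}".format(shlex.quote(ebs_device),
                                     shlex.quote(data_dir))]
    raise ValueError("unknown role: {!r}".format(role))


if __name__ == "__main__":
    for role in ("application", "database"):
        print(role)
        for command in boot_commands(role):
            print("  " + command)
```

The function only builds the command list; a boot script embedded in the image would be responsible for running it.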

Baby Steps

As you might imagine, updating such a large quantity of PHP code and SQL queries to be compatible with the newer software versions that modern GNU/Linux distributions include was a daunting task in itself. Additionally, implementing the number of code changes required to make the most of EC2 was not immediately feasible. Although we’d decided on the direction we wanted to head in, we still needed an immediate solution to our dedicated hosting problem.

Again, Amazon came to the rescue with a relatively new feature–bootable EBS volumes. This facility enables us to boot instances whose data, operating system included, all gains the benefits that EBS brings. Having a bootable EBS volume means that the entire system can be snapshotted when it’s time for a backup. While things were in a state of flux, this has helped us nail problems and back up fixes very quickly, without having to worry about creating many image revisions. When our configuration stabilizes, we can create a new operating system image and take it out of EBS.

The third phase of our strategy will be to entirely commit all code to a VCS. We may be able to remove EBS from the picture almost entirely for our application server instances, as code can be checked out of a VCS such as Git on GitHub–securely over SSH–immediately upon boot, making new instances production-ready minutes after launch. We will continue to make things as dynamic as possible, so we can take full advantage of all that EC2 offers, such as quickly scaling to multiple application servers as required.

Conclusion

The hardest phase in our migration is complete–it mainly required our code to be updated. Regardless of where we ultimately decided to host our code, this step was unavoidable. While we have not yet completely achieved all of our goals, we are well on the road to meeting them. We now have more control over our data and infrastructure than ever before, so we are better prepared for wherever the future may take us.


Adam Bolte is SitePoint's systems administrator and free software activist. He has been running various GNU/Linux distributions as his desktop of choice since 1998, and has a tendency to install the Linux kernel onto any device he owns.


  • GoPoint

    Congratulations!

  • Bob Smith

    I would be interested in a full breakdown of your setup including how you handle issues and backups if you’re willing?

    • Adam Bolte

      That would make for a very long and boring blog^H^H^H^Hbook, I assure you. :)

      I think a lot of that would be very specific to our website needs, and probably wouldn’t be very useful outside of SitePoint. We have different setups for different parts of our website for example, so the way we deal with issues may depend on where it is specifically.

      Documenting our backup system also likely wouldn’t be much help, as everyone will have different requirements. Backup frequency may depend on the importance and frequency of the data changing, but even this is uncertain perhaps due to a company internal policy or budget restrictions. Then there’s the question of retention – again this will vary from company to company as required.

      Even the way backups are created will often differ. In many cases there will be a need for a snapshot-style backup where everything is in a consistent state. In other cases that might not be a requirement, so a different easier/cheaper system could be used. Different applications may also prefer to be backed up in specific ways to make restores easier. Again, company policy may dictate that a specific backup program or system is used.

      Sorry Bob – I just can’t see a breakdown of our setup being that useful to others. Perhaps a post on *general* EC2 management considerations would be in order?

  • Tim

    I’ve had my head buried in EC2 for the last 12 months, and it certainly does present some interesting challenges. The biggest changes that need to be made are purely in your own head. Once you get the idea that EVERYTHING is expendable, it forces you to think about things in a different way. For example, servers might simply drop off the air for no reason … make sure you have good disaster recovery procedures. Amazon may have network outages … don’t rely on their API being 100% available, and certainly don’t automatically fail over systems unless you can first be sure it’s not their problem. Need to test something? Fire up a new server, test, and then destroy it!

    Our biggest issue was actually our database and making sure we always had a master server we could write data to. While we were developing a solution that would automatically fail over to hot standby servers etc, Amazon came out with the latest incarnation of multi-zone RDS instances. These beasts are awesome from an admin point of view (we don’t have any dedicated DBAs in house). Point in time recovery up to 8 days (then whatever custom snapshotting you do outside of that), version control, patch management. Only annoyance is you can’t turn one into the slave of an existing MySQL server as part of your data migration.

    The real power behind the AWS systems is how all their systems are interlinked and leverage the features of each system to solve the tricky problems in the cloud.

    All we need now is a clustered file system/NAS/SAN that can be shared across multiple instances (EBS volumes are only attached to one server at a time)

    • Adam Bolte

      > All we need now is a clustered file system/NAS/SAN that can be shared
      > across multiple instances (EBS volumes are only attached to one
      > server at a time)
      Agreed. There are work-arounds such as glusterfs for this kind of thing, but it would be great if Amazon provided this kind of functionality as part of AWS. I wouldn’t be surprised if we see such a feature added in the near future.

  • http://www.wavepointmedia.com ramprage

    An interesting read however if you were to keep things organized from the start you wouldn’t have needed to move to EC2 in the first place.

    It sounds like you needed a decent system admin on your team instead you have multiple developers doing whatever they felt was right at the time while never documenting anything they did while never thinking about growth.

    IMO you guys would have been better off investing some energy into finding a good consultant, purchasing your own hardware and getting some colo space and create your own custom infrastructure. Today hardware is cheap, storage is cheap. You could have easily setup your own VM infrastructure (VMWare, Citrix are 2 great ones) with a SAN and had way more processing power than EC2 – probably for less. A much better ROI, and total control.

    I’m actually disappointed to see Sitepoint using EC2.

    • Adam Bolte

      Certainly if things were better organized it would have made migrating much easier, but we still needed to switch out the OS to something supported.

      The thing to understand is that some of this code is ancient. There was a lot more to the situation than just slapping Apache, mod_php and a DB on a new installation somewhere. There were mail, log and monitoring functions and other back-end functionality that customers won’t see to help ease management. It appears that a huge amount of code was written by people who probably left years ago – and probably in such early days when it wasn’t feasible to have a dedicated admin for the box.

      I’m very surprised to hear that you believe purchasing all our own hardware, upgrading and maintaining it, having it hosted, etc. could possibly be cheaper! It certainly wouldn’t be as flexible. EC2 instances start from $0.02 per hour – and you can turn them off when not needed.

      I don’t see how you have “total control” – or even more control for that matter – of your own hardware when you need to have it physically located some place with lots of bandwidth and a proper server environment. What happens if that physical location you cannot control is jeopardized? This happened in my city just this year:

      http://www.itnews.com.au/News/169054,datacom-data-centre-flooded-by-melbourne-storm.aspx

      I note that there is nothing specific about AWS that you have criticized… I get the impression you haven’t tried it.

  • http://www.wavepointmedia.com ramprage

    I’m not sure of your setup and requirements but I would have done it differently. It’s interesting to hear about moving to EC2. My final thoughts are if you ever need to move away from it, it won’t be easy. The cost per hour at the moment is low, as the service isn’t that old. However over time they can nickel and dime their customers and you have no control over that. Since your entire architecture changes to work with EC2, moving away becomes problematic and costs a lot of money.

    My comment about total control is just that. You can select low or high end hardware – you can select your bandwidth, you can select how many servers you need. With EC2 – you pay their rate – or you don’t. You have no control. If your usage is high – you pay more. With your own equipment your costs won’t change as long as your setup can handle it and you plan it to scale. All your data is also on US soil – which might be an issue for some.

    Datacenters aren’t perfect, I never said they were. I would just advise doing your due diligence when selecting one. Just remember that because you’re using a “cloud” like EC2 you’re not immune either. Try googling EC2 downtime and you’ll see. Also be careful of DDoS attacks, bitbucket got into some trouble not long ago – http://searchcloudcomputing.techtarget.com/news/article/0,289142,sid201_gci1371090,00.html

    • http://www.wavepointmedia.com ramprage

      Meh just delete my last

    • Adam Bolte

      You do bring up good points though about vendor lock-in. If you have your own servers, in theory you can collect those from your hosting provider and move them elsewhere without too much trouble (but with some downtime). If Amazon said “We’re not going to support AWS any more” or they were bought out by another company that didn’t run it as well for whatever reason, we are basically stuck with them…

      Or are we? There is the Eucalyptus project that looks like it might ease the pain of migrating away. Looking into it is actually something I’m planning on doing when I have spare time… I’ve heard both good and bad things about it.

      There is also concern that the Amazon API is becoming a “standard” which is potentially dangerous since Amazon is in full control of it. In the event this starts to become a problem, there are alternative ways to manage instances such as the OpenStack project that is backed by RackSpace. In theory, this will increase competition and reduce the likelihood of being locked in.

      When you say “All your data is also on US soil”, you probably mean “looked after by an American company” – which may possibly be just as bad if you’re concerned about such things.

      • http://m2i3.com myrdhrin

        Well outside the APIs from Amazon what they’ve forced you to do is to properly design your system and separate the Operating System from the actual Data… and that, even if you moved to something hosted in-house would be to your benefit…
