Introduction to Affordable, Scalable Websites
One morning you wake up, grab some coffee, and check how many people visited your site yesterday. (C’mon, admit it! I know I do it every day!) You look at the numbers and they just don’t look right. Maybe your coffee isn’t strong enough: there has to be a decimal point in there somewhere! They’re all too big! You received ten thousand new visitors yesterday… and the number is climbing rapidly. You’re finally making it big!
Next, you check your email, and find a note from your ISP telling you that you need to either upgrade to a more expensive plan, or pay a fortune for bandwidth. Oh no! What to do? Well, if you designed your site properly, you can smile with the knowledge that you have everything under control.
You’ve reached the point where you need multiple servers. It’s time to figure out how to keep your site up and running without breaking the bank. This article is relevant even if you don’t have a dedicated server: it’s often better to have several smaller accounts than one large one.
This article first looks at an example of a budget scalable Website, then briefly covers the proper way to build a scalable site. Unfortunately, the proper approach can be prohibitively expensive for small Webmasters, but it is important to know. Next, we’ll discuss how you can design a Website so that it will be easy to scale in the future, and won’t force you to radically revamp your current way of working. Finally, I’ll talk about how to convert an existing Website into a scalable site, and we’ll explore some of the technical hurdles that need to be overcome.
An Introduction to Scalable Servers
Why does one need to specifically design a site to be scalable? Well, let me give you an example.
I’m sure you’ve heard of the ever-popular “Am I Hot or Not” Website? Two college students had some free time, and one day they came up with a great idea: wouldn’t it be cool to make a site where visitors can upload their own picture and have other people vote on whether they’re hot or not? It’s an addictive combination, and it’s no surprise that, virtually overnight, the site went from receiving no hits to several million an hour. Suddenly, they faced the challenge of keeping their site running.
They quickly made a key observation: every page view is independent of every other. So, if you’re viewing one page of the site, it makes no difference from which server the next page is served. This is because there are only two parts to the Website: viewing pages and recording votes. This situation greatly simplified matters for these particular Webmasters. They simply used one main SQL database, and had the other servers call it for each page view. This way, they could distribute the load without having to significantly change the way their program worked.
So they upgraded to seven servers in seven days and, not surprisingly, their database server started choking. They noticed that almost every page simply made a SELECT query against the database, and each vote was written right back to it. However, they decided there was no real reason for votes to be recorded right away – after all, who would notice if the number of votes displayed was a few hours old? So, at first they simply cached the vote data and updated it periodically.
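The article does not say how the vote caching was implemented, so the following is only a minimal sketch of the general idea: hold votes in memory and write them out in batches. The class name, the `flush_every` threshold, and the in-memory `stored` counter standing in for the real database table are all assumptions made for illustration.

```python
from collections import Counter


class VoteBuffer:
    """Collects votes in memory and writes them to storage in batches."""

    def __init__(self, flush_every=100):
        self.flush_every = flush_every  # hypothetical batch size
        self.pending = Counter()        # (photo_id, score) -> votes not yet stored
        self.pending_count = 0
        self.stored = Counter()         # stand-in for the real database table

    def record(self, photo_id, score):
        """Remember a vote; flush once enough votes have piled up."""
        self.pending[(photo_id, score)] += 1
        self.pending_count += 1
        if self.pending_count >= self.flush_every:
            self.flush()

    def flush(self):
        """One batched write instead of one database write per vote."""
        self.stored.update(self.pending)
        self.pending.clear()
        self.pending_count = 0


buffer = VoteBuffer(flush_every=3)
buffer.record("photo-1", 8)
buffer.record("photo-1", 8)
before_flush = buffer.stored[("photo-1", 8)]  # nothing has been written yet
buffer.record("photo-2", 5)                   # third vote triggers the flush
after_flush = buffer.stored[("photo-1", 8)]
print(before_flush, after_flush)
```

The trade-off is the one described above: displayed totals lag behind reality, and votes still in memory are lost if the process dies before a flush.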
Then, after a few more days, their server started choking again, and they upgraded to running multiple database servers. From that point, they were able to scale to any size they needed, eventually reaching more than 17 servers! Now let’s see how you can do the same, on an even smaller budget.
The Ideal Scalable Website Architecture
Building a scalable Website can be a very tricky proposition. It involves a lot of redundancy, load balancing, multiple Webservers, a separate database server, and backup servers. For example, in the figure below, all connections to the Internet first go through a load balancer.
A load balancer will distribute traffic according to a procedure that you, as Webmaster, set up. For example, it may send most of the traffic to the server with the lowest load average. This helps keep response times nice and low, even if your site is linked to from CNN.com. Multiple servers are used to help keep the system fault-tolerant. If one server crashes, the traffic can be diverted to the other servers. The same applies to the database – there is a replicated backup in case the main database goes offline.
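As a rough illustration of the lowest-load rule, here is a small sketch. Real load balancers are dedicated hardware or software; the server names and load figures below are made up, and the `down` set is an assumed way of marking crashed servers.

```python
def pick_server(load_averages, down=()):
    """Return the healthy server with the lowest load average.

    load_averages: dict mapping server name -> current load average
    down: names of servers known to have crashed (skipped entirely)
    """
    candidates = {
        name: load for name, load in load_averages.items() if name not in down
    }
    if not candidates:
        raise RuntimeError("no healthy servers available")
    return min(candidates, key=candidates.get)


# Hypothetical snapshot of three Web servers.
loads = {"web1": 0.9, "web2": 0.3, "web3": 1.7}

normal_choice = pick_server(loads)                   # least loaded server
failover_choice = pick_server(loads, down={"web2"})  # web2 crashed, so divert
print(normal_choice, failover_choice)
```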
As you’ve probably guessed by now, such a setup can be very expensive. Nonetheless, I urge you to look into such a system if you have the time, money, and ambition. The exact details of a complete architecture are beyond the scope of this article, but there are numerous books on this topic.
Fortunately for the rest of us, there’s a higher-risk, but cheaper and easier method.
Budget Scalable Architecture
For those of us who can barely afford a single server, much less seven, there is an incremental approach you can follow. Simply design your site so it can run on one or more servers. If you’re doing well, why not rent a new server, set up your software on it, and tell your old servers about the new one? While this is not as easy as it sounds, it’s quite doable. Simply keep a few key ideas in mind when you design your site, and should it ever dramatically grow in popularity, you won’t even break a sweat (unless it’s from a victory dance).
As we’re on a budget, there are a few constraints to keep in mind. We can’t afford to run servers from our office — instead we’re going to use the many inexpensive dedicated Web hosts out there. We want to be able to add or remove servers at will, depending on traffic, and we want to minimize bandwidth usage to save costs.
Unfortunately, most budget dedicated Internet hosts are not very helpful when it comes to setting up the kind of tiered server setup discussed above; they will charge you a lot of money to connect several servers together. So, we want to avoid having a separate database server if at all possible. Please note that if only occasional lookups or writes are made to the database, it’s perfectly reasonable to send the requests over the Internet – this scenario won’t use up a lot of bandwidth.
Master and Slave Sites
First off, there are many different ways to design a scalable site. The easiest, though not necessarily the best, is simply to create a master site, and several slave sites: one main site from which all the others are replicated. Everything is copied – the database, HTML pages, everything (though you may want to use one database and have all the servers link to it in order to maintain consistency).
For example, on sitepoint.com it does not matter from which server an article is served. You may view page one of this article from a server in Sydney, and the second from a server in Detroit, and you’d never know the difference – nor does it particularly matter. However, they have a nifty little voting mechanism at the bottom of the article (you want to press the button below the number ten on this article to see what happens!). Clicking this button could hit the master server instead of a slave and update the database appropriately. Then, if you use multiple databases, once a day (or instantly if you use database replication) the data will be copied to each of the slave servers.
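A minimal sketch of this routing rule follows: anything that changes data goes to the master, and everything else is spread across the slaves. The host names and the set of write actions are hypothetical, and cycling through the slaves in order is just one simple way to spread the reads.

```python
import itertools

# Hypothetical host names; substitute your own servers.
MASTER = "master.example.com"
SLAVES = ["slave1.example.com", "slave2.example.com"]

# Actions that modify the database must reach the master.
WRITE_ACTIONS = {"vote", "post_comment"}

_slave_cycle = itertools.cycle(SLAVES)


def route(action):
    """Return the host that should handle the given action."""
    if action in WRITE_ACTIONS:
        return MASTER
    # Read-only pages can come from any replica, so take turns.
    return next(_slave_cycle)


first_read = route("view_article")
second_read = route("view_article")
vote_host = route("vote")
print(first_read, second_read, vote_host)
```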
One Server Per Visit
A variation of this method is to use multiple servers, but have the user view pages served from the same server during their entire visit. You could have www.mysite.com, www2.mysite.com, www3.mysite.com, and so on. When a user visits your site, they are randomly sent to one server (or, if you store the user’s data on a particular server, you’d always send them to that particular server).
This is very simple, and requires virtually no changes to your site. It works well with persistent data and is easy to understand and implement. But it has a very bad Achilles heel: what if everyone starts linking to a page on the www3 server? Or what if www3 crashes? That one server could be overloaded or down while the other servers sit idle. Then again, this method is extraordinarily simple, very effective, and works well on a budget.
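Both assignment rules can be sketched in a few lines. The host list mirrors the example names above; hashing the user ID to pick a host is an assumption for illustration, and any stable mapping (such as a column in your user table) would serve equally well.

```python
import hashlib
import random

HOSTS = ["www.mysite.com", "www2.mysite.com", "www3.mysite.com"]


def host_for_new_visitor(hosts=HOSTS):
    """Anonymous visitors are sent to a randomly chosen server."""
    return random.choice(hosts)


def host_for_user(user_id, hosts=HOSTS):
    """Known users always land on the same server.

    A hash of the user ID gives a stable index into the host list, so
    the server holding this user's data is found without a lookup.
    """
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    index = int.from_bytes(digest[:4], "big") % len(hosts)
    return hosts[index]


alice_host = host_for_user("alice")
print(alice_host, host_for_new_visitor())
```

Note that with a hash-based mapping, adding or removing a host changes where many users land, so stored per-user data would have to be moved.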
Serve Content By Type
Another method is to serve one type of content from one server, and a different type from another server. A good example of this might be to put graphics on one server, and dynamic content on another. This allows you to use a slow server and a high bandwidth provider for the graphics, and a higher quality but more expensive server for the content. This works well when you have outgrown the bandwidth allocation of your regular Web hosting account. The hitch? If one server goes down, your entire Website would be, for all practical purposes, dead in the water.
Caching is also an excellent way to scale Websites. Instead of having the user directly view your main site, they view a cached copy on a different server. In this situation, instead of serving dynamic data, you’re serving static data. One or more main servers generate all the pages and push them out to the other servers, from which users view your site (for a more automatic and advanced method of doing this, look into using the excellent open-source Squid caching program).
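One way to keep this split manageable is to build every URL through a single helper that chooses the host from the file type. This is only a sketch: the two host names and the list of image extensions are assumptions.

```python
import posixpath

# Hypothetical hosts: a cheap high-bandwidth box for graphics,
# and a better server for dynamic pages.
HOST_BY_TYPE = {
    "image": "http://images.example.com",
    "page": "http://www.example.com",
}
IMAGE_EXTENSIONS = {".gif", ".jpg", ".jpeg", ".png"}


def url_for(path):
    """Return the full URL for a site-relative path, picking the host by type."""
    extension = posixpath.splitext(path)[1].lower()
    kind = "image" if extension in IMAGE_EXTENSIONS else "page"
    return HOST_BY_TYPE[kind] + "/" + path.lstrip("/")


logo_url = url_for("/img/logo.GIF")
vote_url = url_for("vote.php")
print(logo_url)
print(vote_url)
```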
This solution requires only one or two expensive servers, along with several inexpensive auxiliary servers. This technique is simple and effective, though it’s not suitable for sites that are customized for each user.
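The generation step on the main server can be sketched as rendering every page to a static file, ready to be copied to the auxiliary servers. The article data, the `render_article` template, and the output layout are all invented for illustration; the copy step itself is left out.

```python
import html
import tempfile
from pathlib import Path

# Stand-in for content that would normally come from the database.
ARTICLES = [
    {"slug": "scaling", "title": "Scaling on a Budget", "body": "Plan early."},
    {"slug": "about", "title": "About", "body": "Who we are & why."},
]


def render_article(article):
    """Turn one article record into a complete static HTML page."""
    title = html.escape(article["title"])
    body = html.escape(article["body"])
    return f"<html><body><h1>{title}</h1><p>{body}</p></body></html>"


def publish(articles, output_dir):
    """Write one .html file per article and return the paths written."""
    out = Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    for article in articles:
        target = out / f"{article['slug']}.html"
        target.write_text(render_article(article), encoding="utf-8")
        written.append(target)
    return written


with tempfile.TemporaryDirectory() as tmp:
    files = publish(ARTICLES, tmp)
    names = sorted(path.name for path in files)
    first_page = files[0].read_text(encoding="utf-8")
print(names)
```

Re-running the generation on a schedule, or whenever content changes, keeps the static copies reasonably fresh.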
The Site Design
So how does all this help you design your site? Well, some of the lessons learned by exploring each of these alternatives include:
- Keep your site segmented into sections. For example, keep static content in one directory, and dynamic content in another, to allow easy splitting into multiple servers later.
- Try to allow for pages to only use SELECT SQL queries, or session-less pages, to allow for round-robin type Web serving.
- Never hardcode links to other pages on your site. This way, you can easily update your site if you change its location. Personally, I keep all my internal links in one file and then include them (e.g. <a href="$global_mysiteURL/$home_url">Click here</a>).
- Use caching whenever possible. It’ll reduce the load on your server and help balance things.
- Try to avoid server side sessions, but if you need them, be sure to set them up so that either the data can be shared with the other servers, or the user uses only one server.
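The advice about keeping internal links in one place might look like this in practice. The base URL and page table are hypothetical; the point is only that moving the site means editing one constant rather than every page.

```python
# Single place where the site's location and page paths are defined.
SITE_URL = "http://www.mysite.com"  # hypothetical base URL
PAGES = {
    "home": "index.html",
    "vote": "vote.php",
}


def link(page, text):
    """Build an anchor tag for a named internal page."""
    return f'<a href="{SITE_URL}/{PAGES[page]}">{text}</a>'


home_link = link("home", "Click here")
print(home_link)
```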
One last note: a useful tool for synchronizing your Website between multiple servers is rsync, or, even better, cvs. If you make a change on the master server, simply commit your changes on that server, then run cvs update on the others. cvs merges each update into any customizations made on a slave server, so they are normally kept intact; conflicting changes are flagged for you to resolve by hand.
Converting a Site
“Okay,” you say, “I already have my site… and I haven’t followed any of your suggestions! What should I do?” Don’t worry, there’s still a lot you can do to convert your site to run on multiple servers.
First, let’s get the easy cases out of the way. If your site is completely made of static HTML files, then you’re set. All you need to do is copy the Website to two different servers and set up round robin DNS. If your site is almost completely static but has a few dynamic pages, simply put all the static material on one server and the dynamic material on another.
Now the hard cases: if you have a complicated user-input driven Website, some more work is necessary on your part. First, analyze your site. In particular, look at all the SQL code. Can you divide your site into distinct pieces, such as parts that can use slightly out-of-date (or read-only) data, and parts that write to the database? If so, you can simply set up master and slave servers. Put the pages that write on one server, and the pages that read on another. This way, you can balance the load and reduce database traffic.
If none of the above suggestions works, look into setting up multiple distinct servers and sending visitors to the server from which they signed up. In my opinion this is the easiest, though not necessarily the best, way to go.
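With round robin DNS, one host name has several address records, and the DNS server rotates the order in which it returns them so that successive visitors tend to connect to different machines. The toy model below only simulates that rotation; it is not a DNS server, and the addresses come from a range reserved for documentation.

```python
from collections import deque


class RoundRobinRecords:
    """Simulates a name with several A records served in rotating order."""

    def __init__(self, addresses):
        self._addresses = deque(addresses)

    def answer(self):
        """Return the current record order, then rotate for the next query."""
        result = list(self._addresses)
        self._addresses.rotate(-1)  # move the first address to the end
        return result


records = RoundRobinRecords(["192.0.2.10", "192.0.2.11"])

# Clients typically try the first address they are given.
first_picks = [records.answer()[0] for _ in range(4)]
print(first_picks)
```

Because resolvers cache answers, the real-world split is only approximately even, and a crashed server keeps receiving its share of visitors until its record is removed.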
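A first pass at that analysis can be automated. The sketch below sorts pages into read-only and writing groups by looking at the first keyword of each query. The page names and queries are invented, and the check is deliberately crude: statements such as SELECT ... FOR UPDATE, or queries hidden behind stored procedures, would need a closer look by hand.

```python
def is_read_only(sql):
    """Crude test: treat a statement as read-only if it starts with SELECT."""
    words = sql.split()
    return bool(words) and words[0].upper() == "SELECT"


def split_pages(page_queries):
    """Return (read_only_pages, writing_pages) for a page -> queries mapping."""
    readers, writers = [], []
    for page, queries in page_queries.items():
        if all(is_read_only(query) for query in queries):
            readers.append(page)
        else:
            writers.append(page)
    return readers, writers


# Hypothetical inventory of the SQL each page runs.
PAGE_QUERIES = {
    "view.php": ["SELECT * FROM photos WHERE id = ?"],
    "top10.php": ["select id, score from photos order by score desc limit 10"],
    "vote.php": [
        "SELECT id FROM photos WHERE id = ?",
        "INSERT INTO votes (photo_id, score) VALUES (?, ?)",
    ],
}

read_pages, write_pages = split_pages(PAGE_QUERIES)
print(read_pages, write_pages)
```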
If none of the above options is successful, running several Web servers and a database server is your best bet. This will allow your Website to run virtually unmodified. Just be sure to store sessions in the database, and not in files, to allow them to be accessed from any server. Be sure to talk to your ISP to get a direct connection between the servers (or to obtain your own rack in a co-location facility) to avoid racking up huge bandwidth charges.
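Storing sessions in the database can be sketched as follows. SQLite stands in here for the shared database server purely so the example runs on its own; the table layout and function names are assumptions, and in a real deployment every Web server would connect to the same central database.

```python
import json
import sqlite3
import uuid


def open_store(path=":memory:"):
    """Open the session store and create its table if needed."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sessions "
        "(id TEXT PRIMARY KEY, data TEXT NOT NULL)"
    )
    return conn


def save_session(conn, data, session_id=None):
    """Insert or overwrite a session; returns the session ID."""
    session_id = session_id or uuid.uuid4().hex
    conn.execute(
        "INSERT OR REPLACE INTO sessions (id, data) VALUES (?, ?)",
        (session_id, json.dumps(data)),
    )
    conn.commit()
    return session_id


def load_session(conn, session_id):
    """Return the stored session data, or None if the ID is unknown."""
    row = conn.execute(
        "SELECT data FROM sessions WHERE id = ?", (session_id,)
    ).fetchone()
    return json.loads(row[0]) if row else None


conn = open_store()
sid = save_session(conn, {"user": "alice", "cart": [3, 7]})
restored = load_session(conn, sid)
print(restored)
```

The session ID travels in a cookie as usual; since the data lives in the database, whichever Web server receives the next request can load it.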
When your site suddenly starts growing like a beanstalk, it is both a joyous and worrisome event. If you have a big budget, no worries, but even with a minuscule budget you can keep your site running. If you start thinking about how to design your site to scale before problems start to crop up, you can prevent any downtime from occurring. In fact, there’s very little that needs to be done to prepare a site to scale — with a properly designed site, you can simply drop in your new servers and be up and running in no time. Good luck, and here’s to your site!
- ArsDigital Systems Journal: Building a Scalable eBusiness Solution. This is a good overview of how to build a proper scalable site.
- CVS tutorial. You should most definitely use CVS whenever possible. It’s your friend!
- rsync homepage
- Read more about the “Am I Hot or Not” story
- Free round robin DNS from mydomain.com
- A great description of round robin Web servers
- MySQL database replication. Automatically keep many slave databases in sync with a master database.