Why has tunetribe.com crashed under promotional giveaway demand?

Martin Lewis of Money Saving Expert teamed up with tunetribe.com to promote his new TuneChecker service, offering £5 of free music downloads from 1PM yesterday.

The site crashed and is still down, returning proxy errors.

What do you think went wrong here? What can one do to make sure a server can handle a surge in demand? I know with a VPS you can just rent more capacity!

I know with a VPS you can just rent more capacity!

It’s not quite so easy, and there’s usually a limit to the size of the upgrade, at which point things get a lot more complicated.

Supposedly, cloud hosting offerings could be a solution, scaling up and down on an as-needed basis.

I agree with Dan, but a cloud solution isn't a cheap one. If you're on shared web hosting now, I believe a dedicated server could be a workable step up for the time being.

If the site crashed (as opposed to becoming unresponsive) then it's a fairly common scenario (if it's a single-server operation) that it ran out of memory due to excess Apache processes. Once you run out of memory, processes will be killed by the OOM killer, and the parent Apache process (or mysqld) will get hit sooner or later, taking everything down.
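A quick way to check that theory is to scan the kernel log for OOM-killer activity. A minimal sketch, assuming a Linux box with a Debian-style log path (adjust for your distro):

```python
# Minimal sketch: scan the kernel log for OOM-killer activity.
# Assumes a Linux host logging to /var/log/kern.log (Debian-style);
# on Red Hat-style systems try /var/log/messages instead.
import re

LOG_PATH = "/var/log/kern.log"  # assumption: adjust for your distro

with open(LOG_PATH, errors="replace") as log:
    for line in log:
        # The kernel logs lines like "Out of memory: Kill process 1234 (apache2) ..."
        if re.search(r"oom-killer|Out of memory", line, re.IGNORECASE):
            print(line.rstrip())
```

If apache2 or mysqld show up as victims there, the memory-exhaustion explanation fits.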

It should be noted that a crashed server and an unresponsive one aren't necessarily the same thing - a crashed one won't come back online once traffic subsides, whereas one that's merely unresponsive due to no spare capacity will.

If a server is correctly configured, so that MaxClients will not allow an out-of-memory situation or heavy swapping in conjunction with other processes (e.g. mysqld), then the server will recover from overload far better.

Unfortunately many hosts don't bother tuning httpd.conf, so it keeps default settings that aren't optimal for the particular server environment and will allow more processes to be spawned than the machine can sustain.
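As a rough back-of-the-envelope check, MaxClients should be sized so the worst case still fits in RAM alongside mysqld and the OS. A sketch with illustrative figures (all three numbers are assumptions - measure your own):

```python
# Rough sizing sketch for Apache prefork MaxClients.
# All figures here are illustrative assumptions -- measure your own
# server's average Apache child RSS with ps or top before relying on this.
total_ram_mb = 2048          # physical RAM on the box
reserved_mb = 768            # headroom for mysqld, OS caches, cron, etc.
apache_rss_mb = 25           # average resident size of one Apache child

max_clients = (total_ram_mb - reserved_mb) // apache_rss_mb
print(f"Safe MaxClients upper bound: ~{max_clients}")
# => Safe MaxClients upper bound: ~51
```

The point is simply that the default MaxClients (150 or 256 depending on build) can be several times what a small VPS can actually hold in memory.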

There are a number of things that can be done to improve capacity given a set amount of server resources, but a lot depends on the backend architecture.

It'd be useful to analyse the server error logs to ascertain the exact cause of failure (it might be the webserver, but it could also be the database); see the sketch below.
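For instance, a minimal sketch that tallies the most frequent messages in the Apache error log (the path is an assumption - adjust to your setup) makes the dominant failure mode obvious at a glance:

```python
# Minimal sketch: count the most common Apache error-log messages.
# Assumes the Debian-style path; adjust ERROR_LOG for your setup.
from collections import Counter
import re

ERROR_LOG = "/var/log/apache2/error.log"  # assumption

counts = Counter()
with open(ERROR_LOG, errors="replace") as log:
    for line in log:
        # Strip leading "[timestamp] [level] [client x.x.x.x]" prefixes
        # so identical messages group together.
        message = re.sub(r"^(\[[^\]]*\]\s*)+", "", line).strip()
        counts[message] += 1

for message, n in counts.most_common(10):
    print(f"{n:6d}  {message[:120]}")
```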

A load balancer on the front end (e.g. Squid/HAProxy/Pound) provides a useful starting point for backend redundancy and expandability. Offloading static assets to a CDN reduces load on the webserver. If the database is the point of failure, checking for poorly performing queries and optimising cache sizes will help.
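On the database side, one way to find the poorly performing queries is to enable the MySQL slow query log and tally which statements dominate it. A rough sketch, assuming the classic file-based slow-log format (the path varies with your my.cnf's slow_query_log_file setting):

```python
# Rough sketch: group MySQL slow-query-log entries by normalised query.
# Assumes the classic file-based slow log format; the path is an
# assumption taken from a typical Debian my.cnf.
from collections import Counter
import re

SLOW_LOG = "/var/log/mysql/mysql-slow.log"  # assumption

counts = Counter()
with open(SLOW_LOG, errors="replace") as log:
    for line in log:
        if line.startswith(("#", "SET timestamp", "use ")):
            continue  # skip metadata lines
        # Normalise literals so variants of the same query group together.
        fingerprint = re.sub(r"\b\d+\b", "?", line.strip())
        fingerprint = re.sub(r"'[^']*'", "?", fingerprint)
        counts[fingerprint] += 1

for query, n in counts.most_common(5):
    print(f"{n:5d}  {query[:120]}")
```

The worst offenders are then candidates for indexing or rewriting, and running EXPLAIN on them shows why they're slow.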

If they can replicate the site offline, then running Apache ab or siege load tests against it will let them monitor behaviour, pinpoint likely points of failure, and quantify the effectiveness of any measures taken to improve them.
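ab and siege give far better numbers, but if neither is to hand, even a crude concurrent fetcher gives a first-order picture of error rate and latency under load. A self-contained sketch (the URL and concurrency figures are placeholders - point it at the offline replica, never the live site):

```python
# Crude load-test sketch: N concurrent workers hammering one URL,
# reporting error rate and rough latency. Placeholder URL and numbers;
# ab/siege remain the proper tools for real measurements.
import time
import urllib.request
from concurrent.futures import ThreadPoolExecutor

URL = "http://staging.example.com/"  # placeholder: the offline replica
WORKERS = 20
REQUESTS = 200

def fetch(_):
    start = time.monotonic()
    try:
        with urllib.request.urlopen(URL, timeout=10) as resp:
            resp.read()
        return time.monotonic() - start, None
    except Exception as exc:
        return time.monotonic() - start, exc

with ThreadPoolExecutor(max_workers=WORKERS) as pool:
    results = list(pool.map(fetch, range(REQUESTS)))

latencies = sorted(t for t, err in results if err is None)
errors = sum(1 for _, err in results if err is not None)
if latencies:
    print(f"ok={len(latencies)} errors={errors} "
          f"median={latencies[len(latencies)//2]:.3f}s max={latencies[-1]:.3f}s")
else:
    print(f"all {errors} requests failed")
```

Ramping WORKERS up until errors or latency spike gives a rough idea of where the replica falls over, which is exactly the point of failure worth fixing before the next promotion.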