By Lucas Chan

Client-side Load Balancing Web 2.0 Apps is Voodoo

Digital Web recently published an article about “Client Side Load Balancing for Web 2.0 Applications”. I wanted to take a moment to explain why I think this load balancing technique is a bad idea. But first, here’s the concept in brief:

  1. Your web site is deployed in an identical fashion across a number of web servers.
  2. Your customer’s browser retrieves a list of web app servers from your server, say in XML format.
  3. The browser then “randomly selects servers to call until one responds”, and “has a preset timeout for each call. If the call takes greater than the preset time, the client randomly selects another server until it finds one that responds”.
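To make the three steps concrete, here is a minimal sketch of what such client-side logic might look like. It is not the Digital Web article's code: `callWithFailover`, `Fetcher`, and `shuffle` are hypothetical names, and this version assumes each server is tried at most once in random order rather than re-drawn indefinitely.

```typescript
// Hypothetical type: any function that fetches a URL and honours an abort signal.
type Fetcher = (url: string, signal: AbortSignal) => Promise<string>;

// Fisher-Yates shuffle on a copy, so each server is tried at most once.
function shuffle<T>(items: readonly T[], rand: () => number = Math.random): T[] {
  const out = [...items];
  for (let i = out.length - 1; i > 0; i--) {
    const j = Math.floor(rand() * (i + 1));
    [out[i], out[j]] = [out[j], out[i]];
  }
  return out;
}

// Try servers in random order; give each one timeoutMs before moving on.
async function callWithFailover(
  servers: readonly string[],
  path: string,
  timeoutMs: number,
  fetcher: Fetcher,
): Promise<{ server: string; body: string; attempts: number }> {
  let attempts = 0;
  for (const server of shuffle(servers)) {
    attempts++;
    const controller = new AbortController();
    const timer = setTimeout(() => controller.abort(), timeoutMs);
    try {
      const body = await fetcher(server + path, controller.signal);
      return { server, body, attempts };
    } catch {
      // A dead server, a timeout, or plain congestion: the client cannot tell which.
    } finally {
      clearTimeout(timer);
    }
  }
  throw new Error("no server responded");
}
```

In a browser, the `fetcher` argument could wrap `fetch(url, { signal })`. Note that in the worst case the customer waits roughly `timeoutMs` multiplied by the number of unresponsive servers before seeing a result or an error.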

No matter how long and hard I think about this concept, I can’t convince myself that it even sounds good in theory. Here’s why:

  1. We still have a single point of failure. What happens if our web application is not able to retrieve a valid list of servers?
  2. Correctly failing over is difficult and ungraceful. How much “preset time” should the client allow before trying another server? Is this waiting period acceptable to our customers? Can the application accurately tell the difference between a server being offline and plain old network congestion?
  3. Now that our client side code contains this load balancing logic we have a lot more testing to do to ensure it works in every browser on every platform.
  4. Load is distributed amongst the servers in a completely random way. The browser has no way of knowing that it’s just sent a request to a server that’s already busy.

Server-side hardware load balancing offers many advantages over this client-side method that are too great to ignore. Some other things you should consider are:

  1. A hardware load balancer can distribute requests to the servers with the least load. It can also quickly detect an outage in your web cluster and redirect traffic as appropriate. Why make our customers wait while their browser detects this outage for us, and then decides what action is appropriate? The quality of this detection is application (developer) dependent and hardly guaranteed.
  2. Eliminate single points of failure. A redundant load balancer setup greatly reduces the chance of an outage. There are reliable ways of doing this that won’t burn a hole in your pocket.
  3. You can deploy updates to your site in a way that won’t confuse your customers, using the load balancer to hide servers from the customer until your site updates are deployed and tested.
  4. The load balancer can also act as a caching layer for static resources. This reduces the load on your web servers and delivers content to your customers faster!
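As a rough illustration of the first point, the sketch below shows a "least connections" choice, one common policy a balancer might apply. The `Backend` shape and `pickLeastLoaded` are hypothetical and do not describe any particular appliance; the key difference from the client-side approach is that the balancer already knows each server's health and in-flight request count.

```typescript
// Hypothetical view of a backend as the balancer sees it.
interface Backend {
  name: string;
  healthy: boolean;       // result of the balancer's own health checks
  activeRequests: number; // requests currently in flight
}

// Least-connections: pick the healthy backend with the fewest in-flight requests.
function pickLeastLoaded(backends: readonly Backend[]): Backend | undefined {
  let best: Backend | undefined;
  for (const b of backends) {
    if (!b.healthy) continue; // servers that are down never receive traffic
    if (best === undefined || b.activeRequests < best.activeRequests) best = b;
  }
  return best;
}

const pool: Backend[] = [
  { name: "web1", healthy: true, activeRequests: 12 },
  { name: "web2", healthy: false, activeRequests: 0 },
  { name: "web3", healthy: true, activeRequests: 3 },
];
console.log(pickLeastLoaded(pool)?.name); // web3
```

A browser picking at random has neither the `healthy` flag nor the `activeRequests` count available, which is why it can land on a busy or dead server.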

Lastly, I think if you have gone to the trouble of building your application so that it runs across multiple nodes and you’ve signed the cheques for 3 or 4 (or more) servers, it doesn’t seem like much of a stretch to put at least one load balancer in front of the whole lot (2, if you can afford and manage it). Implementing client side load balancing after you’ve come this far seems like a blemish on all your hard work.

My next post will explore scaling and load balancing a little more. Until then… :)

  • The only time this would be a good idea, which is also stated in the mentioned article, is when you don’t have a choice. Some people just don’t have access to physical infrastructure because they are outsourcing it to hosting companies and data centres like S3/EC2.

    I can see it being a useful option in that situation, but it is naturally subject to the problems you mention, such as telling network latency apart from server downtime. But it is just that, an option. My preference would be for the hardware load balancer over this complication. It would be a nightmare to program, but it could be used to direct traffic to the fastest-responding server. Hardware load balancing has the problem that it only knows about the cabinet it is hosted in and doesn’t take client latency into account. You’d need to implement global load balancing for that, targeting the service based on geo-IP data.

  • BTW, you are both approaching it with different hats on. Yours is the systems administrator hat, his is the software developer. I could imagine it makes for a great sales pitch if you can sell your application with native load balancing without the need for additional hardware appliances.

  • Load balancing across EC2 instances is easier than you think. ;)

    I still think that if you’re able to provision 3 or 4 web servers, adding a load balancer (even if it means reducing the number of web servers by one) is just a small progression from this.

    In fact, the intelligent way load balancers distribute traffic means you can probably reduce the size of your web app tier by at least 1 anyway. :)

  • I absolutely agree with you on the hardware side of things. Just saying that I can see how client side load balancing could be an option for some people. Strange how the two main arguments put forward by the author aren’t really that big a deal. Cost can be an issue, but as you mentioned, there are cheap ways to do it, say through a two-machine HA-Linux cluster. The other argument was related to concurrent requests. If your load balancer can’t cope with the level of requests, then there is a good chance a single app server is going to fall in a big heap too. No amount of client balancing will fix that.

  • There are things that client-side is good for, and things that server-side is good for. Load balancing most definitely falls into the latter category.

    I think it’s counterproductive to think in terms of doing things on the client simply because you don’t have easy access to the resources for doing it on the server; if that’s the case, then the problem is missing resources. The client should not be used to make up for a lack of server capability; the server capability should be upgraded.

  • Federico

    Load balancers add an extra layer of complexity to the system, so my advice is to spend some time planning and designing the architecture of the system. Consider using the database when working with sessions and web services, and another server as your main file repository.

  • Ren

    Heh, this reminds me of the SuperProxyScript that’s been about since 1996.

    That used client-side hashing of URLs to determine which proxy server to make a request to.
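    For readers unfamiliar with the idea, here is a minimal sketch of hashing a URL to choose a proxy. It is not the SuperProxyScript's actual code: `pickProxy`, the proxy names, and the FNV-1a-style hash are all assumptions chosen for illustration.

```typescript
// FNV-1a-style 32-bit hash over the string's UTF-16 code units (illustrative choice).
function fnv1a(text: string): number {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193);
  }
  return hash >>> 0; // force an unsigned 32-bit result
}

// The same URL always maps to the same proxy, so each proxy's cache stays warm.
function pickProxy(url: string, proxies: readonly string[]): string {
  if (proxies.length === 0) throw new Error("no proxies configured");
  return proxies[fnv1a(url) % proxies.length];
}

// Hypothetical proxy names.
const proxies = ["proxy1:8080", "proxy2:8080", "proxy3:8080"];
console.log(pickProxy("http://example.com/index.html", proxies));
```

    Unlike the random selection discussed in the post, this mapping is deterministic, but it still knows nothing about whether the chosen proxy is up or busy.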

  • Hardware load balancers are cheap and effective.

    This sounds very much to me like a solution looking for a problem.

  • @Federico It may be true that they add an extra layer of complexity, but there comes a point when software design alone will not solve your stability problems. If you have a “very high” volume site, or require real-time redundancy in your application/site, then load balancing is pretty much essential.

  • Lucas is right in pointing out the demerits of client-side load balancing. It is simply true that forcing your client to wait multiple times on multiple servers is annoying, and the well-established server-side counterpart, i.e. server-side load balancing, is the one that makes the path clear.

  • leforge

    Hmmm… how would passing form data from page to page work with this? It seems like the article says that at any time, the client-side scripting can push a browser from one server to another. Wouldn’t the servers need to keep track of various form values/properties and send that info to the other servers?

    Seems like it not only complicates the client-side work, but also the server-side as well. I don’t know… am I having a brain-fart on this or am I right?

  • leforge: You are right, this can be tricky. When storing form data in a user session variable you will run into problems if you’re using file-based sessions. The way to combat this is to make your PHP session handler store data in a database or on some sort of shared file system (same goes for file uploads). For what it’s worth this is an issue for both methods of load balancing.

  • leforge

    Lucas Chan: Good point about it being an issue for both methods. I’ve gotten so used to the server-side method having tools to handle all that for you that I forgot it does, indeed, need handling. I do a little PHP, but most of my work is done in .Net, and there’s a simple setting that you have to flip and a database that you have to create that takes care of it. All in all it takes just a few minutes to do. Hmmm… if client-side load balancing catches on, it seems that it would be a good idea for M$ and the PHP people to come up with a simple (flip-a-switch) solution to this. Thanks for the response. I’ll have to crack open my PHP books and take a look. :)
