Horizontal Scaling of PHP Apps, Part 1

This entry is part 1 of 2 in the series Horizontal Scaling of PHP Apps

You’ve built a website. It was fun, and it feels rewarding to see all those visitors pour in. The traffic increases slowly, until one day, someone posts a link to your app to Reddit and Hacker News, the planets somehow align, GitHub is down or something and for some reason, people notice the post and storm in, breaking all barriers of reason and logic.

Your server chokes, and everything dies. Instead of getting new customers or regular visitors during this epic peak (epeak?), you’re now left with a blank page, scrambling about as you try to get it up and running again, to no avail – even after restarting, the server can do nothing differently to survive the load. You lost traffic – for good.

No one can anticipate these traffic spikes. Very few of us plan so far in advance, unless we’re setting up to build a highly funded project that’s expected to do very well in a fixed time frame. How, then, does one avoid these problems? Two aspects need to be considered: optimization and scaling.

Optimization

We’ve written about this before, and a more advanced article is coming up next week, but the usual advice applies – upgrade to the latest version of PHP (5.5 at the time of writing, which ships with a built-in OPcache), index your database, cache your static content (seldom-changed pages like About, FAQ and similar), and so on.

One particular optimization goes beyond caching static resources: serve anything static through a non-Apache server like Nginx, which is optimized for serving static content. You put a layer of Nginx in front of Apache, tell it to intercept requests for static resources (e.g. *.jpg, *.png, *.mp4, *.html…) and serve them directly instead of letting the request move on to the Apache application layer. Such a setup is called a reverse proxy (also sometimes identified with a software load balancer – see below), and you can find out more about implementing it here.

That said, there’s nothing like scaling.

Scale

There are two types of scaling – horizontal and vertical.

We say that a website is scalable when it can manage increases in traffic without needing software changes.

Vertical scaling

Imagine having a server serving the web app in question. The server has 4GB of RAM, an i5 CPU, and a 1TB HDD. It performs its function well, but to better tolerate a higher influx of traffic, you decide to replace the 4GB of RAM with 16GB, you put in an i7 CPU, and you add a PCIe SSD/HDD hybrid drive. The server is now much more powerful and can handle a higher load. This is known as vertical scaling, or “scaling up” – you improved the machine to make it more powerful.

Horizontal scaling

On the other side of the spectrum, we have horizontal scaling. In the example above, the upgrade itself will likely cost as much as, if not more than, the starting machine on its own. This is costly, and often doesn’t produce the benefits we need – most scaling problems are related to concurrency, and if there aren’t enough cores to handle all the simultaneous requests, then no matter how strong the CPU, the server will grind to a halt and force some visitors to wait.

Horizontal scaling is when you build a cluster of (often weaker) machines linked together to serve the website. In this case, a load balancer is used – a machine or program whose only role is determining which machine should receive each request it intercepts. The workload is thus divided among the machines in the cluster without them even being aware of one another, and your site’s traffic capacity grows with every machine you add. This is also known as “scaling out”.
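To make the load balancer’s job concrete, here is a minimal sketch of the simplest balancing strategy, round-robin, in plain PHP. The class name RoundRobinBalancer and the node names web1 to web3 are made up for illustration, and the code assumes PHP 8.0 or newer; real balancers such as Nginx do this at the network level and add health checks and weighting on top.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical round-robin balancer: hands out nodes in strict rotation.
 * The nodes themselves never learn about one another.
 */
final class RoundRobinBalancer
{
    private int $next = 0;

    /** @var list<string> */
    private array $nodes;

    /** @param array<string> $nodes names or addresses of the cluster machines */
    public function __construct(array $nodes)
    {
        if ($nodes === []) {
            throw new InvalidArgumentException('At least one node is required.');
        }
        $this->nodes = array_values($nodes);
    }

    /** Return the node that should receive the next request. */
    public function pick(): string
    {
        $node = $this->nodes[$this->next];
        $this->next = ($this->next + 1) % count($this->nodes);

        return $node;
    }
}

$lb = new RoundRobinBalancer(['web1', 'web2', 'web3']);

$targets = [];
for ($i = 0; $i < 5; $i++) {
    $targets[] = $lb->pick();
}
// $targets is now: web1, web2, web3, web1, web2
```

Adding capacity in this model means appending another node to the list; nothing in the application code has to change.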

There are two main types of load balancers – hardware and software. Software load balancers are installed on a regular machine and accept all traffic, routing it to the appropriate handler. Nginx can be one such load balancer in the case above under “Optimization” – it intercepts requests for static files, and serves them on its own, without burdening Apache with them. Another popular piece of software that can fill this role is Squid, a caching proxy I’ve personally used in my company extensively and one which provides truly deep control of all aspects via a user-friendly interface.

Hardware balancers are dedicated machines with the sole purpose of being load balancers – no other software is usually installed on them. Some of the most popular ones designed for handling immense amounts of traffic can be read about in this list.

Note that the two are not mutually exclusive – you can scale up a machine (also called a node) in a scaled out system, too. In this article, we’ll be focusing on horizontal scaling, as it’s generally the better (both cheaper and more efficient) choice, albeit more difficult to implement.

Challenges with data sharing

There are several tricky issues to overcome when scaling PHP applications. One such issue is database bottlenecking (something we’ll cover in Part 2) and another is managing session data – surely if you log in on one machine, you’ll be logged out if the load balancer redirects you to another machine in your next request, right? There’s a way around this – you can either share the local data between machines, or you can use a persistent load balancer.

Persistent load balancer

A persistent load balancer (a setup often called “sticky sessions”) remembers where it previously redirected the client, and does the same thing on their next request. So if I visit SitePoint and log in, the load balancer redirects me to, say, Server1, remembers me, and my next click after logging in will also be redirected to Server1. Naturally, this all happens transparently. What if Server1 goes down, though? Yes, all session data is lost – I’m logged out, and I need to start over on another server. This is a needless interruption of the user experience. What’s more, the load balancer has so much to do now (not only redirecting hundreds of thousands of people to various servers but also remembering where it sent each of them) that it has become the bottleneck and might benefit from some scaling out of its own. But if one LB then crashes, is the remembered data about clients and the servers they were sent to lost as well? WHO WATCHES THE WATCHMEN? The situation reeks of an obvious catch-22.
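The behaviour described above, including the failure case, can be sketched in a few lines of PHP. This is only an illustration under stated assumptions: the StickyBalancer class, the client ID “alice” and the server names are hypothetical, the code assumes PHP 8.0 or newer, and a real balancer would typically identify clients by a cookie or IP address rather than a plain string.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical "persistent" balancer: remembers which node each client
 * was sent to and keeps sending them there while that node is alive.
 */
final class StickyBalancer
{
    /** @var array<string, string> client ID => node name */
    private array $assignments = [];

    private int $next = 0;

    /** @var list<string> */
    private array $nodes;

    /** @param array<string> $nodes must not be empty */
    public function __construct(array $nodes)
    {
        $this->nodes = array_values($nodes);
    }

    public function pick(string $clientId): string
    {
        $known = $this->assignments[$clientId] ?? null;
        if ($known !== null && in_array($known, $this->nodes, true)) {
            return $known; // same node as last time
        }

        // New client, or the remembered node is gone: take the next node in rotation.
        $node = $this->nodes[$this->next % count($this->nodes)];
        $this->next++;

        return $this->assignments[$clientId] = $node;
    }

    /** Simulate a node dropping out of the cluster. */
    public function removeNode(string $node): void
    {
        $this->nodes = array_values(
            array_filter($this->nodes, fn (string $n): bool => $n !== $node)
        );
    }
}

$lb = new StickyBalancer(['Server1', 'Server2']);

$first  = $lb->pick('alice'); // Server1
$second = $lb->pick('alice'); // Server1 again, as remembered

$lb->removeNode('Server1');   // Server1 dies
$third  = $lb->pick('alice'); // Server2: the session files on Server1 are out of reach
```

Note that the $assignments array lives inside one balancer instance. That is exactly the state that is lost if the balancer itself crashes, and that grows with every visitor.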

Sharing local data

Sharing the session data across the entire cluster definitely seems like the go-to approach, and while it does require some architecture changes in your app, it’s well worth it – there is no bottleneck, and the entire cluster is fault tolerant – one server’s demise is completely irrelevant to the rest and isn’t even noticed (by the machines, of course – the humans in charge of them hastily replace the hardware as soon as the fault occurs).

Now, we know session data is accessed through the $_SESSION superglobal in PHP, and we know that, by default, PHP reads and writes that data from and to a file on disk. If said disk is in one server, though, it’s obvious that other servers have no access to it. How, then, do we make it available across several machines?

First, note that session handlers can be overridden in PHP – you can define your own class/function to handle session management. For more information on how that’s done, please see the docs.

Using a database

Using a custom session handler, we can make sure the session data is always stored in a database. The database should be on a separate server (or cluster of its own!), so the servers being load balanced from the original story are serving just the business logic. While this approach often works well, under truly high traffic the database not only becomes the single point of failure (lose that, and you lose everything), it also suffers significant connection overhead, because every server in the cluster is connecting to it and writing session data at all times. It becomes the new bottleneck, and could use some scaling out, which is another problem when using traditional databases like MySQL, PostgreSQL and similar (covered in Part 2).
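As a rough sketch of what such a handler might look like, the class below implements PHP’s built-in SessionHandlerInterface on top of PDO. Everything specific here is an assumption made for illustration: the class name PdoSessionHandler, the sessions table with its id, data and last_access columns, and the use of REPLACE INTO, which works on MySQL and SQLite but not on PostgreSQL. It also assumes PHP 8.0 or newer and a PDO connection that throws exceptions on errors.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical database-backed session handler.
 *
 * Assumed table (names are illustrative, not prescribed by PHP):
 *   CREATE TABLE sessions (id VARCHAR(128) PRIMARY KEY, data BLOB, last_access INTEGER)
 */
final class PdoSessionHandler implements SessionHandlerInterface
{
    private PDO $pdo;

    public function __construct(PDO $pdo)
    {
        $this->pdo = $pdo;
    }

    public function open(string $path, string $name): bool
    {
        return true; // the PDO connection is already open
    }

    public function close(): bool
    {
        return true;
    }

    public function read(string $id): string
    {
        $stmt = $this->pdo->prepare('SELECT data FROM sessions WHERE id = ?');
        $stmt->execute([$id]);
        $data = $stmt->fetchColumn();

        // PHP expects an empty string for a session that does not exist yet.
        return $data === false ? '' : (string) $data;
    }

    public function write(string $id, string $data): bool
    {
        // REPLACE INTO is MySQL/SQLite syntax; other databases need their own upsert.
        $stmt = $this->pdo->prepare(
            'REPLACE INTO sessions (id, data, last_access) VALUES (?, ?, ?)'
        );

        return $stmt->execute([$id, $data, time()]);
    }

    public function destroy(string $id): bool
    {
        $stmt = $this->pdo->prepare('DELETE FROM sessions WHERE id = ?');

        return $stmt->execute([$id]);
    }

    /** Delete sessions idle for longer than $max_lifetime seconds. */
    public function gc(int $max_lifetime): int
    {
        $stmt = $this->pdo->prepare('DELETE FROM sessions WHERE last_access < ?');
        $stmt->execute([time() - $max_lifetime]);

        return $stmt->rowCount();
    }
}
```

Each web server would register the handler before starting the session, by calling session_set_save_handler(new PdoSessionHandler($pdo), true) followed by session_start(). After that, $_SESSION is used exactly as before. Every request now costs at least one extra read and one extra write against the database, which is where the connection overhead mentioned above comes from.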

Using a shared file system

You might be tempted to set up a network file system to which all servers can write their session data. Don’t. This is the absolute worst approach: it’s prone to corruption and data drops, and it’s extremely slow. It’s also a single point of failure, much like the database approach above. Activating it is as simple as changing the session.save_path value in php.ini, but it’s highly recommended you use a different approach. If you really insist on using a shared file system, it’s much better to use a solution like GlusterFS.

Memcached

You can use memcached to store session data in RAM. This is arguably unsafe because data in memcached gets evicted as space runs out, and there’s no persistence – remembering someone’s login will only last for as long as the memcached server is running and has room to remember it. You might be wondering – but isn’t RAM separate on each machine? How does that apply to a cluster? Memcached lets you pool the available RAM of multiple machines into one large virtual whole: the client library spreads the keys across the servers, which don’t need to know about one another.

[Diagram: memcached pooling RAM from several machines into one cache. Courtesy of memcached.org]

The more machines you have, the bigger the pool gets as you dedicate extra RAM to it. You don’t have to give a machine’s RAM to the pool, but you can, and you can give arbitrary amounts from each. So a good chunk of RAM remains on the machine for regular use, while the rest is donated to cache, helping you not only store session data cluster-wide, but also allowing you to cache other content as you see fit – as long as there’s room. Memcached is a great solution, and this approach has widespread adoption.
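The “pool” is less magical than it sounds: the client decides which server owns a given key, usually by hashing the key. The sketch below shows the naive version of that idea in plain PHP. The function name serverForKey and the server addresses are hypothetical, and the code assumes 64-bit PHP, where crc32() never returns a negative number. Real client libraries generally offer consistent hashing instead, so that adding or removing a server remaps only a fraction of the keys.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical key-to-server mapping: hash the key, then take the
 * remainder by the number of servers. Same key, same server, every time.
 *
 * @param list<string> $servers non-empty list of "host:port" strings
 */
function serverForKey(string $key, array $servers): string
{
    return $servers[crc32($key) % count($servers)];
}

// Illustrative addresses only.
$servers = ['10.0.0.1:11211', '10.0.0.2:11211', '10.0.0.3:11211'];

// Any web server in the cluster computes the same owner for a session key,
// so it does not matter which machine handles the request.
$ownerSeenByWeb1 = serverForKey('sess_abc123', $servers);
$ownerSeenByWeb2 = serverForKey('sess_abc123', $servers);
```

This also shows why the approach is not fault tolerant by itself: if the server that owns a key goes away, the data stored under that key goes with it.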

Usage in PHP apps is as simple as changing some php.ini values:

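; Uses the PECL "memcache" extension; host and port below are placeholders.
; With the newer "memcached" extension, the handler is "memcached" and the path is "host:port".
session.save_handler = memcache
session.save_path = "tcp://path.to.memcached.server:port"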

Redis Cluster

Redis is an in-memory NoSQL data store, much like Memcached, but it supports persistence and more complex data types than just string-based key => value pairs. At the time of writing it doesn’t have stable cluster support, though, so using it in a horizontally scaled setup is not as straightforward as one might think, but it’s getting there. In fact, an alpha version of its clustering solution is already out and can be used: http://redis.io/topics/cluster-tutorial. For a more in-depth comparison between Memcached and Redis, see this StackOverflow answer. Compared to a typical caching solution like Memcached, Redis is more like a Memcached-turned-proper-database.

Other solutions

  • ZSCM from Zend is an alternative, but requires Zend Server on every node in the cluster.
  • Other NoSQL stores and caching systems would work – try solutions like Scache, Cassandra or Couchbase, all blazingly fast and reliable.

Conclusion

As you can see, horizontally scaling PHP web apps is no picnic. There are various hurdles to overcome, and the solutions are not easily interchangeable – more often than not it’s all about picking one and sticking with it for better or worse, because by the time the traffic rolls in, it’s too late to make a smooth transition to something else.

I hope this short guide helped you decide on the best approach for your company, and if you’ve got alternative solutions or suggestions, we’d love to hear about them in the comments below. In Part 2, we’ll cover database scaling.

  • Peter Nijssen

    Very nice article. I believe that within a year, I am working on horizontal scaling, so nice already to read some things about it. Looking forward to part 2

    • http://www.bitfalls.com/ Bruno Skvorc

      Cheers, will be up next Saturday!

  • http://brunopaz.net/ Bruno Paz

    Very nice article. Looking forward to part 2.

    • http://www.bitfalls.com/ Bruno Skvorc

      Thanks, coming soon!

  • http://www.bitfalls.com/ Bruno Skvorc

    There is no _best_ way, it depends on what’s most approachable for you. The most popular, right now, that I know of is memcached, but if I was implementing a HZ scaled app from scratch _today_, I would go with the alpha version of the Redis Cluster – I like playing around with cutting edge technologies and don’t mind some risks.

  • Adnane

    Thanks for the article, very interesting. Would it be efficient to use memcached or Redis on the LB to make a node crash irrelevant and store session information, but also use it on the web cluster so as not to lose the sessions and information on that side too?

    • http://www.bitfalls.com/ Bruno Skvorc

      The less your LB has to do, the more efficient it is. As such, it’s better to have it just do the balancing, no remembering anything.

      • Adnane

        Thank you, Looking forward for part2.

  • WooDzu

    I’ve written quite a long comment for this article but apparently something went wrong. Is it possible that it’s still waiting for approval or should I try posting it again?

    • http://www.bitfalls.com/ Bruno Skvorc

      Nothing in the queue, sorry :(

    • OphelieLechat

      It went in our spam queue for some reason. I’ve rescued it and it’s now below.

  • http://www.bitfalls.com/ Bruno Skvorc

    Excellent feedback, thank you very much, nice to hear another perspective on the NFS approach! Agreed that load testing is a very important topic in this case, but it probably won’t fit into this series. Maybe if there’s enough interest, I could cover it in part 3, we’ll see.

  • Jamie Devine

    I’ve personally never worked on an app or website that required more than one server, I’ve always wondered how this sort of thing worked for larger apps, this is a nice insight. I’d love to know how a massive website like Facebook (which I assume has thousands of servers) deals with this sort of thing, it’s a whole different world from the type of sites I work on.

    • http://www.bitfalls.com/ Bruno Skvorc

      They do the exact same things as mentioned here, more or less, but their servers span thousands, yes. For more information on how a site like Facebook manages its architecture, you should follow their development blog (https://developers.facebook.com/blog/) and their Facebook engineering page (https://www.facebook.com/Engineering), there’s some mad info there.

      • Jamie Devine

        Nice! Thanks for the links, I’ll check those out.

        This is probably a silly question but it just occurred to me – would every server in the cluster have a duplicate of the website’s codebase on it?

        Anyway, looking forwards to the article on databases!

        • http://www.bitfalls.com/ Bruno Skvorc

          Depends on the setup but generally, yes, every server in the application layer would have a copy of the application. Deploying a code update to a huge cluster has challenges of its own, but when you’re dealing with under 50 servers it’s usually near-instant if set up properly and only takes a push to the production branch to be pulled automatically into all instances.

  • Szabolcs Zajdó

    Perfect summarizing article! I would have mentioned Riak as an alternative to Memcached and Redis, which supports persistence and already has cluster support.

    • http://www.bitfalls.com/ Bruno Skvorc

      Excellent pointer, thanks!

  • frostymarvelous

    Bruno, for a site that needs this kind of scale, it must be heavily dynamic right? Each server is already connecting to the database on almost (except when cached) page load. Doesn’t this make the session issue negligible?

    • http://www.bitfalls.com/ Bruno Skvorc

      Not really, when it’s a separate connection that needs to be made. It would be fallacy to retrieve session data with every database connection (at that scale, all that extra session data would quickly accumulate to some serious bandwidth), and two calls to the database instead of one quickly become a problem when dealing with several million visitors per day.

      • frostymarvelous

        This is actually really good info. I just started some AWS work and scaling is high on my list of priorities to implement. I have actually been looking for a great one on session management. Thanks.

        • http://www.bitfalls.com/ Bruno Skvorc

          My pleasure!

  • denis sorn

    Bravo Bruno, and thanks. Nice article, easy to read (somehow I always find PHP articles about stuff like design patterns and similar easier to read/understand than, say, the Java ones).
    I appreciate different possibilities you mentioned. The only one for session sharing I already knew, is Zend server. Since I only wanted to play with this, for learning purposes, Zend server was clearly not an option. So I was thinking about how to implement session sharing in php. That was some years ago :-). In the last time I am busy with other things, but will try to find some time to mess again with php and Zend framework. I’ll probably try to port my application to Zend 2 framework, and to add support for session sharing.

    • http://www.bitfalls.com/ Bruno Skvorc

      Thank you, glad you liked it!