Horizontal Scaling of PHP Apps, Part 2

This entry is part 2 of 2 in the series Horizontal Scaling of PHP Apps


In part 1, we discussed the horizontal scaling of the application layer – scaling out servers to handle the PHP code concurrently via load balancers and other means. We covered sharing of local data, touched on potential optimization, and explained the basics of load balancers.

The application layer, however, isn’t the only thing that needs scaling. With a bigger demand for our app, the demand for higher read/write operations on the database also surfaces. In this part, we’ll look at scaling the database, explain replication, and cover some common pitfalls you might want to avoid.

Optimization

As in Part 1, we need to mention optimization first. Indexing your database properly, keeping only the essential columns in each table and moving secondary information into others (users_basic + users_additional + users_alt_contacts, etc. – a form of vertical partitioning; its row-based cousin, sharding, is a complex topic warranting its own article), and running small, atomic queries instead of large on-the-fly calculations – all of these can help you speed up your databases and avoid bottlenecks. There's one aspect that can help even more, though: the query cache.

Database servers typically cache the results of recently executed SELECT queries. This allows a client (our app) to receive the data from the cache instead of waiting for the query to execute again. Such an approach saves CPU cycles, produces results faster, and frees the servers from doing unnecessary work. But the query cache is limited in size, and some data sets change more often than others – caching all of our data would be wasteful, because entries for fast-changing data are invalidated almost as soon as they're written. While the cache can be fine-tuned, depending on the database vendor, there's an approach that lends itself quite nicely to the query cache: contextual grouping of servers.
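For instance, in MySQL up to 5.7 and in MariaDB (MySQL 8.0 removed the query cache entirely), an individual query can opt out of the cache when its data changes too quickly to be worth caching. This is only a sketch – the connection details, table, and column below are invented:

```php
<?php
// Hypothetical connection; SQL_NO_CACHE tells MySQL/MariaDB to skip
// the query cache for this fast-changing value.
$pdo = new PDO('mysql:dbname=app;host=db-1', 'user', 'password');
$stmt = $pdo->query('SELECT SQL_NO_CACHE `balance` FROM `accounts` WHERE `id` = 1');
```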

When you group servers, you dedicate each group to a specific part of your app. Say we're making an online gambling website – we could have one group for in-game chats, one for ongoing games, and one for user accounts. The user accounts group doesn't take that many reads or writes – at least not as many as the chats or games do – so we can keep it small, at 5 servers. The chats group can have 10 servers, because real-time communication between millions of users is important for a high-quality gambling experience. But no aspect is as important as the actual games – our bread and butter – so those get an exceptionally large group, maybe 20 servers.

This ensures all your user-related reads and writes happen in one group, without touching the others. The games need to be fast, so the largest group serves them (both reads and writes), while the account page isn't all that important to people who are already logged in and changing settings – we've already captured their attention and don't need to impress them with super-fast responsiveness on the non-business-critical parts of our app.
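In application config, such grouping might be sketched as a map from each context to its own pool of hosts, with a helper that picks a connection from the right pool. All the names and group sizes here are hypothetical:

```php
<?php
// Hypothetical grouping config: each part of the app gets its own pool of DB hosts
$dbGroups = [
    'users' => ['users-db-1', 'users-db-2', 'users-db-3', 'users-db-4', 'users-db-5'],
    'chat'  => ['chat-db-1', 'chat-db-2', 'chat-db-3' /* ... up to 10 */],
    'games' => ['games-db-1', 'games-db-2', 'games-db-3' /* ... up to 20 */],
];

// Return a random host from the group serving the given context
function pickHost(array $groups, string $context): string
{
    if (empty($groups[$context])) {
        throw new InvalidArgumentException("No DB group for context: $context");
    }
    return $groups[$context][array_rand($groups[$context])];
}
```

The app would then connect to, say, `pickHost($dbGroups, 'games')` when serving a game, and never burden the chat or account pools with that traffic.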

All groups share the same total data (see MSR below), so you can move servers from one group to another – say there's a huge poker championship and millions of users are watching the game – the games group could borrow 4-5 servers from the chat group to make sure it can handle the load. That said, there are built-in solutions for clustered databases, and one such solution is Master-Slave Replication.

Master-Slave Replication

Master-Slave replication (MSR) is a very common feature of modern databases, often built in. Check your database vendor's documentation to see whether it's natively supported. MSR is the process of sending all database write operations from one server (the master) to one or more others (the slaves). The slaves then replay the queries and thus replicate the master's data. Integrated with a web app, this is how it works, step by step:

  • a visitor performs a write operation on the database. For example, they change some profile information.
  • the application sends the query to the master database server, as usual
  • the master executes this query and forwards it to all the slaves
  • the slaves execute the query, and the same data is now in both master and slave machines
  • further read operations are performed on the slaves

The final step is what's important here – we perform read operations on the slaves. This by itself is division of labor between machines – with the master free to handle only writes (generally, there are far fewer writes than reads in a web app, so one master is often enough) and an army of slaves available for fetching, no single server is overburdened.

MSR support is built into today's installations of MariaDB and MySQL, but it isn't active out of the box – you need to enable binary logging on the master, give each server a unique server ID, and point the slaves at the master.
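As an illustration of pointing a fresh slave at a master in MySQL/MariaDB, assuming binary logging and a replication user already exist on the master – all hosts, credentials, and log coordinates below are invented:

```php
<?php
// Tell a fresh slave where its master is and start replicating.
// Host names, credentials, and log coordinates are hypothetical.
$slave = new PDO('mysql:host=slave-db-1', 'root', 'secret');
$slave->exec("CHANGE MASTER TO
    MASTER_HOST = 'master-db-1',
    MASTER_USER = 'replication',
    MASTER_PASSWORD = 'repl-secret',
    MASTER_LOG_FILE = 'mysql-bin.000001',
    MASTER_LOG_POS = 4");
$slave->exec('START SLAVE');
```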

Separating reads and writes

This is where a slight modification to your architecture is required – how complex it is depends on the quality of your code and whether you use a framework or a good project structure. To separate reads and writes, you'll need separate connections: for writes, connect to a "write DB" (the master), and for reads, connect to one of the slaves. This can, of course, be done manually. For writing, you would connect to a specifically defined master configuration:

<?php
$config = $this->getService('configProvider');

$master = new PDO(
    'mysql:dbname='.$config->db->master->name.';host='.$config->db->master->host, 
    $config->db->master->user, 
    $config->db->master->password
);
$statement = $master->prepare('SOME QUERY FOR UPDATING');
//...

While for reading, a new connection would be required, through a different set of config values:

<?php
$config = $this->getService('configProvider');

$slave = new PDO(
    'mysql:dbname='.$config->db->slave->name.';host='.$config->db->slave->host, 
    $config->db->slave->user, 
    $config->db->slave->password
);
$results = $slave->query('SOME QUERY FOR READING');
//...

If we have multiple slaves, and we know their host addresses, we can do some randomization:

<?php
$config = $this->getService('configProvider');

// Pick random slave from list of possible slaves
$slaveConfig = $config->db->slaves[array_rand($config->db->slaves)];

$slave = new PDO(
    'mysql:dbname='.$slaveConfig->name.';host='.$slaveConfig->host, 
    $slaveConfig->user, 
    $slaveConfig->password
);
$results = $slave->query('SOME QUERY FOR READING');
//...

It doesn't take long to realize this is awkward to do every time you need to read or write. A better solution is to pre-initialize a master DB connection in a service container of some sort, and then just use that one for writes, without re-establishing the connection each time. For slaves, the same approach applies – build the randomization into a "slave" DB adapter, and call something like $this->getService('db')->slave, which automatically returns a randomly selected slave. This way, you never have to worry about selecting them manually again, and as you add more slaves to your cluster, you just include their host addresses in the configuration file.
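A minimal sketch of such a service might look like this – the config layout (an array with 'master' and 'slaves' entries) is hypothetical:

```php
<?php
// A minimal sketch of the DB service described above.
// The config layout ('master' and 'slaves' keys) is hypothetical.
class DbService
{
    private $master = null;
    private $config;

    public function __construct(array $config)
    {
        $this->config = $config;
    }

    // One shared master connection, re-used for all writes
    public function master()
    {
        if ($this->master === null) {
            $m = $this->config['master'];
            $this->master = new PDO(
                'mysql:dbname=' . $m['name'] . ';host=' . $m['host'],
                $m['user'],
                $m['password']
            );
        }
        return $this->master;
    }

    // A randomly selected slave connection for reads
    public function slave()
    {
        $s = $this->config['slaves'][array_rand($this->config['slaves'])];
        return new PDO(
            'mysql:dbname=' . $s['name'] . ';host=' . $s['host'],
            $s['user'],
            $s['password']
        );
    }
}
```

Reads then become one-liners along the lines of $db->slave()->query('SOME QUERY FOR READING'), and adding a slave is a pure configuration change.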

Want to take it up another notch? Have the slave-fetching service select only slaves that aren't overtaxed or down at the moment – make the slaves report their CPU/RAM usage somewhere regularly, and have the slave-fetching class always select the one under the least load. This selector class should also connect to another slave if the one it originally tried is down – database reads should remain transparent to the end user, but you still want the admins to know there's a problem with one of the slaves:

<?php

// ... some class, some connect() method

$config = $this->getService('configProvider');
$mailer = $this->getService('mailer');
$validSlaves = $config->db->slaves;
$bConnectionSuccessful = false;

while (!empty($validSlaves) && !$bConnectionSuccessful) {
    $randomSlaveKey = array_rand($validSlaves);
    $randomSlave = $validSlaves[$randomSlaveKey];
    try {
        $slave = new PDO(
            'mysql:dbname='.$randomSlave->name.';host='.$randomSlave->host, 
            $randomSlave->user, 
            $randomSlave->password
        );
        $bConnectionSuccessful = true;
    } catch (\PDOException $e) {
        unset($validSlaves[$randomSlaveKey]);
        $mailer->reportError( ... ); // Some kind of email sent to the admins, or the master log, etc
    } catch (\Exception $e) {
        unset($validSlaves[$randomSlaveKey]);
        $mailer->reportError( ... ); // Some kind of different email sent to the admins, or the master log, etc
    }
}

if ($bConnectionSuccessful) {
    return $slave;
} 

$mailer->reportError( ... ); // Report that the entire attempt to connect to any slave failed
throw new \Exception( ... ); // or report via an exception with details in the message

This is only pseudocode, but you get the gist – try to connect to a randomly selected slave, if it fails, remove that slave from the array of valid slaves, report the problem, and try again on another slave for as long as there are more to try.

Read/write sync delay

One thing that might pose a data consistency threat is the sync delay between master and slaves. For small writes, replication to the slaves will be near instant, while bigger queries naturally take longer to execute – and the delay also increases as the slave army grows, because the master has more query copies to disperse, so some slaves will be ready sooner than others. No matter the quality of your cluster's network, the speed of the individual machines, or the size of the queries, no setup in the world can accurately do the following:

<?php

$db = $this->getService('db');
$master = $db->get('master');
$slave = $db->get('slave');

// value is currently 1
$master->exec('UPDATE `some_table` SET `value` = `value` + 1 WHERE `id` = 1'); // executes
$slave->query('SELECT `value` FROM `some_table` WHERE `id` = 1'); // value is still 1

These statements are executed too close to each other, and there is simply no way for the slave to get the update in time to accurately reflect it.

There are various workarounds for this, though none are foolproof – this inconsistency is something you just have to accept. In most cases, data that was just written doesn't need to be read back immediately. When it does, it's often best to approximate: if you expect the value to have increased by 1, read the original value from the slave, add 1 to the result, and display that to the user, while the master propagates the actual update in the background.
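That approximation could be sketched as follows – the counters table, column names, and connections are all invented for illustration:

```php
<?php
// Increment on the master, read from a slave, and compensate for
// replication lag by applying the known +1 locally.
// Table and column names are hypothetical.
function incrementAndRead(PDO $master, PDO $slave, int $id): int
{
    $master->exec("UPDATE `counters` SET `value` = `value` + 1 WHERE `id` = $id");

    // The slave may not have replayed the UPDATE yet...
    $stale = (int) $slave
        ->query("SELECT `value` FROM `counters` WHERE `id` = $id")
        ->fetchColumn();

    // ...so show the user the expected result instead of the stale one,
    // while replication catches up in the background.
    return $stale + 1;
}
```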

Master fail

What if a master fails, though? Does the whole system grind to a halt? Well, it shouldn’t. Solutions exist for master failovers, and they usually solve the problem by turning a slave into a master if the master fails. Additional architecture changes are needed, however, to make this possible. The procedure is as follows:

  • A master fails and can no longer accept writes
  • Our master-fetching script recognizes this, reports an error, and selects a random available slave
  • This slave is given the signal to become the master
  • All other slaves are given the signal to switch to the new master
  • A small data loss is possible if the master died mid-write and didn't have time to propagate the change to all slaves, or if the chosen slave hadn't executed the query before being promoted
  • When the old master is brought back online, it should be wiped and set up from scratch as a slave of the new master – doing anything else would be pointless, because catching up on the queries missed during the downtime would be a fool's errand. If the failed master is your most powerful machine and you really want it to remain the master, additional safety steps are needed, which will probably warrant a short downtime while the rebooted master catches up

In MySQL and MariaDB in particular, you can make a slave switch to a new master with the CHANGE MASTER TO command, while promoting a slave to master is done via STOP SLAVE and RESET MASTER. For more information on master switching, see the docs.
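Sketched in code (all hosts and credentials below are invented), promoting slave-db-3 and repointing the remaining slaves might look like this:

```php
<?php
// Promote slave-db-3 to master (MySQL/MariaDB); all names are hypothetical.
$promoted = new PDO('mysql:host=slave-db-3', 'root', 'secret');
$promoted->exec('STOP SLAVE');
$promoted->exec('RESET MASTER'); // slave-db-3 now accepts writes as the master

// Point every remaining slave at the new master
foreach (['slave-db-1', 'slave-db-2'] as $host) {
    $slave = new PDO("mysql:host=$host", 'root', 'secret');
    $slave->exec('STOP SLAVE');
    $slave->exec("CHANGE MASTER TO
        MASTER_HOST = 'slave-db-3',
        MASTER_USER = 'replication',
        MASTER_PASSWORD = 'repl-secret'");
    $slave->exec('START SLAVE');
}
```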

Conclusion

In Part 2, we covered database replication and clustering of databases. With both these parts now behind you, hopefully you’ve gathered enough initial knowledge to proceed on your own and build an excellent scalable environment. Do you have any suggestions you think are crucial in horizontal scaling? Would you like to see a more complex part 3? Have we missed some critical aspects? Let us know in the comments below!



  • gazugafan

    Excellent writeup on scaling a SQL database! Definitely clears up some confusion. What about NoSQL databases, though? What are the pros/cons? Should a PHP developer engrained in SQL consider a move to NoSQL for scalability reasons, or is this an over-hyped solution?

    • http://www.bitfalls.com/ Bruno Skvorc

      I have to admit I have no actual experience with scaling NoSQL. This post seems interesting (http://jamesgolick.com/2010/3/29/most-nosql-dbs-are-not-scalable.html) and it looks like each NoSQL database has its own solution, in a way (http://basho.com/riak/ is HZ scalable by default and supports even multi-data-center-replication out of the box).

      If I manage to find out more by playing around, I’ll definitely make a followup article.

    • Mxx

      It’s not an overhyped solution, but it’s also not a silver bullet for every problem. Facebook, probably the largest website on the internet right now, uses good old relational MySQL as its primary database storage. Granted, it’s not a single server, and they have hordes of experts to make it do what they need, but still.
      NoSQL/NewSQL is not THE answer. Neither are relational DBs. It’s a different approach to solving various problems.

      Every competent developer should learn or at the very least be properly aware of new and relevant technologies. Ignoring nosql/newsql is dumb, but so is abandoning current relational databases. Use right tool for the right job.

  • Wojtek Wu

    Another great article! I have some question about mysql query caching. Could you say something about recommended settings for mysql caching? My server : Intel(R) Pentium(R) 4 CPU 2.80GHz , System Memory 4 GB, mysql ver. 5.1.73

    • http://www.bitfalls.com/ Bruno Skvorc

      I can’t really say, every environment has its own optimal setup. Yours seems more than OK with a 63 hit rate, but then again it all depends on the requirements of the app.

      • Wojtek Wu

        Thanks for your reply.

  • rhpaiva

    What’s your opinion on using MySQL Cluster (NDB) with MySQL Proxy? Do you prefer MSR over MMR?

    • http://www.bitfalls.com/ Bruno Skvorc

      I haven’t had much field experience with MySQL cluster, so I don’t feel qualified to answer. It’s on my todo list, but so are many things. Have you had experience with it? What are your thoughts?

      • rhpaiva

        Neither have I so far, but I’m considering using it for a new project. I’m searching for opinions on it and why/where it could be better than MSR.

  • LL

    Yes, very interested to see a more complex part 3.

  • Vladimir Nenov

    Great article! It’s really good to see a summary of the most common solutions and explained in an easy to understand way. Looking forward to Part 3 :)

    • http://www.bitfalls.com/ Bruno Skvorc

      Thanks :) Not sure when/if there’ll be a part 3 though.

      • Vladimir Nenov

        Well, based on my experience, applications have something else that needs to scale as well – application file storage. In the early days of web applications, we all used localhost to store assets like user-uploaded images, user-uploaded public files, and private files that need some access restrictions. Then of course came the problems with scaling those systems up, while in recent years we have more and more cloud services that may just be the solution. We also have CDNs and other nice “limitless storage” single-entry-point / single-asset-address services that take away some of the problems with scaling public/private files. I think covering some thoughts in that direction may turn out to be a good Part 3, but it is just a suggestion of course :)

        • http://www.bitfalls.com/ Bruno Skvorc

          Excellent point! I’ll see what I can do

  • rhpaiva

    Thanks for the suggestion! I would like to know what you think that I think about Mysql Cluster :)