This article was sponsored by Monitis. Thank you for supporting the sponsors who make SitePoint possible.
For better or for worse, our jobs as developers don’t end with that last line of code, the final commit or by hitting the “deploy” button.
Even the best-engineered web application isn’t bullet-proof, and the most expensive hosting environments still aren’t one hundred percent reliable. Ultimately, there’s always something that can go wrong.
We can plan for failure, have processes in place in the event of a problem arising and even contingencies for a genuine disaster, but in between we can monitor.
Monitoring allows us to be reactive, taking action in the event of a problem, as well as proactive, taking preventative action before an issue arises.
In this article we’re going to take a look at monitoring, in the context of a website or a web application. Along the way we’ll be taking a detailed look at Monitis, an all-in-one monitoring platform that is one of the leaders in its field, and how it can help make sure that once you’ve launched your app, it remains running and keeps performing.
What can go wrong?
In order to know how and what we should monitor, it helps to have an understanding of what could potentially go wrong. The short answer is probably “well, a lot”, and as such it’s a really tough question to answer definitively. At the same time, though, there is a range of things that we can anticipate might go wrong.
Broadly speaking, we can divide these issues into a number of categories:
- Hitting “hard” limits: for example a full disk, a physical memory limit or the maximum number of processes
- Network issues: for example a site becoming unreachable, high packet loss, server connections going down or DNS failures
- Component or service failures: perhaps your database server has gone down
- Problems with third-party services: your S3 bucket is unreachable, your mail provider is experiencing issues, or your CDN has gone down
- Problems with your applications: errors and exceptions, inconsistencies in your data or even bugs in your code
- Out-of-date third-party code or operating system components: in particular, missing security patches or service packs
- Human error, which, alas, does still happen: for example forgetting to renew an SSL certificate
Once you have an idea of what can go wrong, you start to get a feel for what to monitor.
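To make one of those failure modes concrete, here is a minimal sketch of a certificate expiry check. It assumes you already have the certificate’s expiry timestamp in the text format Python’s ssl module reports; the function names and the 30-day warning window are illustrative choices, not anything prescribed by a particular monitoring service.

```python
import ssl
import time


def days_until_expiry(not_after, now=None):
    """Return whole days until a certificate's notAfter timestamp.

    `not_after` uses the format found in the dict returned by
    ssl.SSLSocket.getpeercert(), e.g. "Jan  5 09:34:43 2030 GMT".
    `now` is a Unix timestamp; it defaults to the current time.
    """
    expires = ssl.cert_time_to_seconds(not_after)
    now = time.time() if now is None else now
    return int((expires - now) // 86400)


def needs_renewal(not_after, warn_days=30, now=None):
    """True when the certificate expires in fewer than `warn_days` days.

    The 30-day default is an arbitrary, illustrative warning window.
    """
    return days_until_expiry(not_after, now) < warn_days
```

Run on a schedule, a check like this turns a forgotten renewal into a routine reminder rather than an outage.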
Important Things to Consider
Whilst a lot of monitoring deals in metrics such as server response times, the amount of available memory or the level your CPU is running at, there’s nothing quite as important as the experience your users are having. In all likelihood there will be some correlation between the raw data from your server or application and the wait times your users face (for example, high network latency probably means slow response times, and therefore users waiting around for your application to load), but it’s always worth seeing your service through their eyes. To that end, there are a number of tools you can use. But first, let’s look at a few things we need to consider.
There is a range of factors which can make a website or application perform differently depending on the user, their device and their circumstances.
Are they using a desktop / laptop or a mobile?
It probably goes without saying that mobile users often face a very different experience to those on fixed or desktop machines. Nevertheless, it’s worth monitoring the extent to which the performance does vary.
What kind of network connection do they have?
Mobiles generally perform worse than desktops because of hardware and network constraints, but network connections vary enormously whatever the device: not everyone has a high-speed fiber-optic broadband connection, for example.
Where are they?
At some stage in a project, you’ll probably need to decide where your server (or servers) will be physically located. Whilst some websites or web applications target a very specific geographical location (for example a UK-only online store, a US high street chain or a Tokyo transportation hub), the Web is global, which means people can and will access it from all over the world. As a rather oversimplified rule of thumb, the further a visitor is from your servers, the longer data will take to travel “over the wires”.
It’s worth taking a look at where your visitors are based — the results might surprise you. For example technology blogs are often hugely popular in India — but have you considered how, say, a US-based server copes with the distance?
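To put a rough number on the distance rule of thumb: light travels through optical fiber at roughly two thirds of its speed in a vacuum, or about 200,000 km per second. The sketch below uses that figure to estimate a lower bound on round-trip time. The 13,000 km path is purely an illustrative figure, and real latency will be higher once routing, queuing and server processing are added.

```python
# Approximate speed of light in optical fiber (about two thirds of c).
FIBER_KM_PER_SECOND = 200_000


def min_round_trip_ms(distance_km):
    """Lower bound on round-trip time, in milliseconds, over a fiber path
    of the given one-way length. Ignores routing, queuing and processing."""
    return 2 * distance_km * 1000 / FIBER_KM_PER_SECOND


# Illustrative only: a hypothetical 13,000 km path between visitor and server.
print(min_round_trip_ms(13_000))  # 130.0
```

Since a single page view usually involves several round trips, even this best-case figure adds up quickly for distant visitors.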
Monitoring, Data and Alerts
There are two aspects to monitoring: viewing data, be it historical (“Show me average response times and HTTP 4xx status codes for the last 24 hours”) or real-time, and receiving alerts.
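As a minimal sketch of what that kind of historical question looks like in code, suppose your request log is available as a list of (timestamp, status code, response time) tuples. That record shape and the function name are assumptions made for illustration; a monitoring service would answer the same question from its own stored data.

```python
from datetime import datetime, timedelta, timezone
from statistics import mean


def summarise(requests, now, window=timedelta(hours=24)):
    """Average response time and number of 4xx responses inside the window.

    `requests` is an iterable of (timestamp, status_code, response_ms)
    tuples, a hypothetical record shape used for this example.
    """
    recent = [r for r in requests if now - window <= r[0] <= now]
    if not recent:
        return {"average_ms": None, "client_errors": 0}
    return {
        "average_ms": mean(r[2] for r in recent),
        "client_errors": sum(1 for r in recent if 400 <= r[1] < 500),
    }


# Usage with a few made-up log entries.
now = datetime.now(timezone.utc)
log = [
    (now - timedelta(hours=1), 200, 120),
    (now - timedelta(hours=3), 404, 80),
    (now - timedelta(days=2), 500, 900),  # outside the 24-hour window
]
print(summarise(log, now))
```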
For example, you might opt to receive an email if the amount of free disk space on one of your servers drops below a certain threshold, so that you can take proactive action before it becomes full. Alternatively, you can set up an alert for when something goes wrong. For example, you could opt to receive a push notification or an alert via Zapier if your website becomes inaccessible.
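The free-disk-space alert can be sketched in a few lines using Python’s standard library. The 10% threshold and the function names here are illustrative assumptions, and how the message is delivered (email, push notification and so on) is deliberately left out.

```python
import shutil


def free_fraction(path="/"):
    """Fraction of the disk containing `path` that is currently free."""
    usage = shutil.disk_usage(path)
    return usage.free / usage.total


def disk_alert(free, threshold=0.10):
    """Return an alert message when `free` is below `threshold`, else None.

    The 10% default is an arbitrary example value.
    """
    if free < threshold:
        return f"Low disk space: {free:.0%} free (threshold {threshold:.0%})"
    return None


message = disk_alert(free_fraction("."))
if message:
    print(message)
```

Keeping the measurement separate from the decision makes the threshold easy to tune without touching the code that reads the disk.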
Deciding the parameters for alerts is a tricky business. How do you know, for example, how high your CPU load should reach before it’s worth sending you an e-mail?
A good way to answer this might be to cross-reference various parameters. For example, by monitoring response times and CPU load, you might find that response times become unacceptably high if your CPU load goes above 80%.
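One way to do that cross-referencing, sketched under the assumption that you can export paired samples of CPU load and response time: group the samples into CPU bands and find the lowest band in which the typical response time is no longer acceptable. The band size, the use of the median and the function name are all illustrative choices.

```python
from collections import defaultdict
from statistics import median


def cpu_alert_threshold(samples, max_response_ms, bucket_size=10):
    """Return the lowest CPU band (in percent) whose median response time
    exceeds `max_response_ms`, or None if no band does.

    `samples` is an iterable of (cpu_percent, response_ms) pairs.
    """
    buckets = defaultdict(list)
    for cpu, response_ms in samples:
        # Round CPU load down to the start of its band, e.g. 83 -> 80.
        bucket = int(cpu // bucket_size) * bucket_size
        buckets[bucket].append(response_ms)
    for bucket in sorted(buckets):
        if median(buckets[bucket]) > max_response_ms:
            return bucket
    return None


# Made-up samples: response times degrade sharply in the 80% band.
samples = [(35, 120), (42, 130), (61, 180), (72, 240),
           (83, 900), (88, 1100), (95, 2500)]
print(cpu_alert_threshold(samples, max_response_ms=500))  # 80
```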
There is a difficult balance to be struck. If alarms are too frequent then it’s easy to become de-sensitized to them. Too few, and chances are you have users somewhere who have become frustrated, servers which have gone down or a site that’s become inaccessible and you just don’t know about it. Learning to achieve this balance is a skill in itself.
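One common way to cut down on noisy alerts, offered here as a general technique rather than a description of any particular product, is to alert only when a metric stays above its threshold for several consecutive readings. A minimal sketch, with an illustrative default of three readings:

```python
def should_alert(readings, threshold, consecutive=3):
    """True when the most recent `consecutive` readings all exceed `threshold`.

    A single spike is ignored; a sustained breach triggers the alert.
    """
    if len(readings) < consecutive:
        return False
    return all(value > threshold for value in readings[-consecutive:])


print(should_alert([50, 95, 60, 70], threshold=80))  # False: a one-off spike
print(should_alert([50, 85, 90, 95], threshold=80))  # True: a sustained breach
```

The trade-off is a slower reaction: the larger the number of consecutive readings required, the longer a genuine problem goes unreported.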
Monitis is a service for capturing all sorts of data about the health of your servers, the performance of your web and database servers, the responsiveness of your web application and much more.
It provides a real-time view of this data through a web interface, your dashboard.
Additionally, it allows you to assess the performance of a website or web application from the viewpoint of your users, whatever device they’re using and wherever they are in the world.
You can set up alerts to let you know if something goes wrong. Let’s take a quick look at the options.
Server Monitoring with the Monitis Smart Agent
In order to monitor the health of a server with Monitis, we need to install the agent software. This is a small program which runs in the background as a daemon, monitoring the general health of the server and reporting that data back to Monitis so that we can make use of it. See the Linux or Windows documentation for instructions on how to install it.
If you’d like to try it out quickly, I’d recommend creating a new Ubuntu droplet with Digital Ocean. You should find that it works out-of-the-box, with all of its dependencies already installed for you. Alternatively you may wish to check out this Chef cookbook.
Once the agent software is installed and running, you’ll need to set up monitoring via the Monitis dashboard on the Web. We’ll look at that next.
Monitis also allows you to monitor the health of your applications.
Monitis provides SDKs for most popular languages (Java, Perl, Python, PHP, Ruby and C#) and supports popular web and application servers such as Apache, Nginx, IIS, Tomcat and Node.js, various database servers such as MySQL, Postgres, MongoDB and SQL Server, and more. You’ll find a comprehensive list here.
Exploring the Monitis Dashboard
The Monitis dashboard is made up of a series of panels showing various types of data depending on what monitoring and alerts you’ve set up. The first time you access your dashboard, it provides a number of prompts and some inline help. Setting up monitoring for data like CPU load, remaining disk space, free memory and system load, as well as for alerts, takes just a few minutes.
Once you’re set up, you can manipulate the panels by dragging and dropping to arrange them, resizing them or indeed showing and hiding them. Using the drop-down (context) menu at the top-right of each panel you can also maximize them for a better view.
From a panel’s context menu, you can also export monitoring data to a number of formats such as PDF or Comma Separated Values (CSV), as well as print it or send it via e-mail.
The settings menu item allows you to tweak the parameters, which vary according to the specific metric being monitored. For example you can set various thresholds for monitoring CPU levels, the level of free memory, the frequency of HTTP requests to attempt, and so on.
You can also create new “pages” (tabs), allowing you to divide up your monitors by whatever criteria are most appropriate to your needs. For example, you could set up a tab per server, with key metrics across all of your services on one “front page”, or summary tab.
Other Tools for Monitoring
There are a number of tools you can use to further understand how your web application is performing or how it might react to certain events.
Full Page Load Monitors
Many web monitoring systems will provide you with response times; for example, “how long does a particular GET request take?”. As useful as that might be, this doesn’t provide the whole picture.
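For reference, timing a single GET request yourself takes only a few lines with Python’s standard library. This is a minimal sketch, and the function name is an illustrative choice. Note that urllib raises an exception for 4xx and 5xx responses, so a real monitor would need to catch urllib.error.HTTPError and record the failure.

```python
import time
import urllib.request


def time_get(url, timeout=10):
    """Return (status_code, elapsed_seconds) for a single GET request.

    Raises urllib.error.HTTPError for 4xx/5xx responses and
    urllib.error.URLError if the server cannot be reached.
    """
    start = time.perf_counter()
    with urllib.request.urlopen(url, timeout=timeout) as response:
        response.read()  # include the time taken to download the body
        status = response.status
    return status, time.perf_counter() - start
```

A measurement like this covers only one HTTP request for one resource: it knows nothing about the images, stylesheets and scripts a browser would go on to fetch, or the time taken to render them.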
Full Page Load Monitors go one step further, giving you an indication of how long a page takes not just to download, but to render in a real browser. This gives you a better understanding of how your website or application is performing from a user’s perspective, not just in terms of HTTP response times.
To help you understand how performance affects the user experience, full page monitors often provide Apdex (Application Performance Index) scores. Although they produce a numeric value between 0 and 1, these are less concerned with hard numbers than with a qualitative measure of the experience of real users: for example “Satisfied”, “Tolerating” or “Frustrated”.
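The standard Apdex calculation is simple enough to sketch. You choose a target response time T; samples at or below T count as satisfied, samples between T and 4T count as tolerating (and contribute half), and anything slower counts as frustrated (and contributes nothing). The sample data and the 0.5-second target below are made up for illustration.

```python
def apdex(response_times, target):
    """Apdex score for a list of response times (same unit as `target`).

    satisfied:  t <= target
    tolerating: target < t <= 4 * target (counts as half)
    frustrated: t > 4 * target (counts as zero)
    """
    if not response_times:
        raise ValueError("at least one sample is required")
    satisfied = sum(1 for t in response_times if t <= target)
    tolerating = sum(1 for t in response_times if target < t <= 4 * target)
    return (satisfied + tolerating / 2) / len(response_times)


# Two satisfied, two tolerating and one frustrated sample, with T = 0.5s.
print(apdex([0.2, 0.4, 0.9, 1.5, 3.0], target=0.5))  # 0.6
```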
Web Stress Testing
Web stress testers allow you to simulate a large and configurable number of requests over a specific duration, to see how your web server copes. This might be useful for trying to determine how your server might handle “The Slashdot effect” (which you might also refer to as the Twitter effect, or the Reddit effect); in other words, how might your web server cope if you suddenly receive a high volume of visits over a short period of time? To give you an idea of what you can expect, you’ll find a sample report right here.
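To illustrate the basic idea, here is a minimal sketch of a stress test that fires a configurable number of calls across a pool of threads and records how long each one takes. It is a toy rather than a replacement for a dedicated tool: request_fn is a hypothetical function you would supply yourself (for example, one that fetches a page), and you should only ever point something like this at servers you own.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def stress(request_fn, total_requests=100, concurrency=10):
    """Call `request_fn` `total_requests` times across `concurrency` threads.

    Returns a (timings, failures) tuple: the elapsed seconds of each
    successful call, and the number of calls that raised an exception.
    """
    def timed_call(_):
        start = time.perf_counter()
        try:
            request_fn()
        except Exception:
            return None  # count any error as a failed request
        return time.perf_counter() - start

    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        results = list(pool.map(timed_call, range(total_requests)))

    timings = [r for r in results if r is not None]
    return timings, len(results) - len(timings)


# Usage with a stand-in for a real request.
timings, failures = stress(lambda: time.sleep(0.01), total_requests=20, concurrency=5)
print(f"{len(timings)} succeeded, {failures} failed, slowest {max(timings):.3f}s")
```

Comparing the slowest timings and the failure count as you raise the concurrency gives a first impression of where a server starts to struggle.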
Website Security Scans
It’s also vitally important to monitor for security vulnerabilities and, in extreme cases, for the injection of malware. You’ll find the option to run a vulnerability scan in the Tools tab, and you’ll find a sample of the output here.
When you have a crisis, you need to have the right information to fix the problem. But even when things are going well, a steady stream of useful data is necessary for understanding what can go wrong with your system and infrastructure, and so where your development priorities should be.
The only way to take the right action is with relevant, useful and accurate data, and for that you need a strong monitoring platform like Monitis. The service supports a large range of useful metrics for monitoring the health of your servers, application and network. It allows you to see the state of your services in real-time and examine historical data to help understand them, and it alerts you when it’s time to take action.
Keen to give Monitis a try? Sign up for their 15-day trial today.