Being On-call Sucks: The Proof
By Tim Evko
This article was sponsored by VictorOps. Thanks for supporting the sponsors who make SitePoint possible.
Working in devops can sometimes mean hours, days, and entire weekends on-call, in case something goes wrong with your application during a critical time period. While some devops teams have managed to successfully adapt to the on-call process, most find that being on-call is a strain on their team.
VictorOps is on a mission to solve the toughest challenges faced by software teams who require their members to be on-call. They’ve recently surveyed over 500 people in the devops world, and asked them about how they’re handling the on-call process.
What They Found
Being On-call is Stressful for Some…
Worried your team is finding it difficult to maintain peak efficiency while on-call? You’re definitely not alone. In fact, one of the most common findings in the VictorOps survey was that being on-call is a slow-developing and often inefficient process. Some participants even went so far as to say that their on-call problems were not improving, but actually getting worse. What do these on-call difficulties go on to cause? Some respondents cited tension in their marriages, families, and home lives, which speaks to just how difficult the on-call process can be.
And a Breeze for Others
Other participants seemed to have no trouble at all during their experience being on-call. For them, helping the customer and the excitement of the job seemed to make everything worth it.
Automation Is on the Rise
Of the 500+ surveyed, more than half are using infrastructure automation tools, and many find that those tools help ease the pain when it comes to being on-call.
Homegrown May Not Be the Best…
If you’re running into difficulties with the tools and processes you rely on while on-call, it might be time to move your team over to a SaaS solution. According to the report, 70% of respondents use homegrown tools to solve on-call problems, and almost 50% have no idea what they’re paying for their current solution. The majority of respondents were not happy with the homebuilt systems they use on-call. While new tools and solutions are becoming available, most on-call teams have yet to switch over. This may be one of the biggest reasons for the challenges faced by those involved in the on-call process.
But Alert Fatigue Is Even Worse
Most of us have been in a situation where we were inundated with monitoring alerts, most of which were false alarms. In fact, of those surveyed, 64% believed that up to a quarter of all alerts received were false. Aside from being annoying to deal with, false alarms can be expensive. Just how expensive, you ask? VictorOps data shows that in just one year, false alerts could be costing your organization upwards of $100,000.
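To get a rough feel for how false alerts add up, here is a minimal Python sketch of a back-of-the-envelope estimate. It is not the method VictorOps used for its figure; the function name, its parameters, and every input value in the example call are hypothetical placeholders that you would replace with your own team's numbers.

```python
def false_alert_cost(alerts_per_week, false_fraction, minutes_per_alert,
                     hourly_rate, weeks=52):
    """Estimate the yearly cost of time spent triaging false alerts.

    All parameters are assumptions supplied by the caller; none of them
    come from the VictorOps report.
    """
    # Number of false alerts over the whole period
    false_alerts = alerts_per_week * false_fraction * weeks
    # Engineer hours spent looking into those alerts
    hours_lost = false_alerts * minutes_per_alert / 60
    # Cost of that time at the given hourly rate
    return hours_lost * hourly_rate


# Hypothetical inputs: 100 alerts a week, a quarter of them false,
# 30 minutes of engineer time per false alert, at $80 per hour.
yearly_cost = false_alert_cost(100, 0.25, 30, 80)
print(f"Estimated yearly cost of false alerts: ${yearly_cost:,.0f}")
```

This only counts direct triage time. Interrupted sleep, context switching, and slower responses to real alerts would all push the real figure higher.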
Problems Are Solved in the Chatroom
When a real alert comes through, every software team has its own way of finding a resolution. There are similarities, however, and most teams will likely solve the issue with 1 to 5 participants communicating over a chat platform.
When the Smoke Clears
What happens after the issue is solved? According to the recent survey from VictorOps: in most cases, nothing. Sadly, 75% of those surveyed will only conduct a retrospective after significant outages and other serious errors. Want to avoid the downtime? VictorOps data shows that companies who conduct retrospectives after every issue, no matter how insignificant, are more likely to have less downtime and fewer false alarms.
There’s plenty to learn in this insightful report from VictorOps, especially if you’re currently struggling with the on-call process. Download the full report from VictorOps here!