4 Agile Ways to Handle Bugs in Production

Key Takeaways

Agile development is about adapting to reality, including the inevitability of bugs. When bugs are discovered in production mid-sprint, teams must choose the best approach based on their company structure, the bug’s severity, and customer needs.
The four approaches are: The Minimal Impact Option, where the production server is rolled back to its pre-update state; The Deep Exploration Option, where the bug’s scope is thoroughly assessed and planned for; The Urgent Effort Option, where a critical bug necessitates a fix during the current sprint; and The Nuclear Option, where a critical bug requires stopping and restarting the sprint.
To limit the damage when bugs do slip through, consider using canary servers to test changes on a subset of users, and develop code with toggles to turn new features on or off as needed. These measures can help identify and manage potential issues without impacting the entire customer base.
Always discuss bugs and their handling in retrospectives to ensure team understanding and improve future responses. The product owner is responsible for making these decisions, and the engineering team should work in a way that allows for efficient bug handling.

In a perfect world, every time we rolled out code at the end of a sprint, it would work perfectly in production. There would never be any bugs, and there would never be any issues that forced us to roll back code that has already been deployed.

Of course, we don’t live in a perfect world. That’s one of the reasons why we have agile in the first place. Agile isn’t about pretending that your world is perfect. It’s about adapting to reality, and iterating to improve your processes and your flexibility so that when problems arise you’re able to deal with them.

One of the problems that comes up frequently for teams is the discovery of a new bug in production right in the middle of a sprint. Your team has finished deploying, all the tests passed, and everything has been pushed out to production so customers can start using it.

But maybe an edge case that wasn’t considered comes up. Maybe some aspect of the code that wasn’t fully tested comes to the surface, and starts causing problems for users. How’s your agile team supposed to respond to that?

There are many different approaches to dealing with bugs in production that come up during a sprint. Choosing the one that works best for your team is dependent on how your company is structured, how critical the bug is, and what matters most to your product owner and your customer.

The Minimal Impact Option

If a bug in production is the result of a previous sprint’s work, and it’s having a negative effect on users, the simplest response is to roll back the production server, whenever possible, to the state it was in before the last sprint’s update. At the very least, this will minimize the impact of the bug on new users.

Doing this requires having a production deployment system set up to support clean rollbacks. An agile team with the ability to push code into production should ideally be working in an environment that supports continuous deployment, or at the very least deployment tags that allow you to roll back your production servers to a previous state. It’s times like this that you really appreciate having strong deployment or DevOps engineers on the team.
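As a rough illustration of what deployment tags make possible, here is a minimal sketch in Python. The Release record, the tag names, and the rollback_target helper are all hypothetical, invented for this example; a real deployment system would track this history itself and perform the actual rollback.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Release:
    tag: str       # hypothetical deployment tag, one per sprint release
    healthy: bool  # whether this release was considered stable in production


def rollback_target(history, current_tag):
    """Return the most recent healthy release deployed before current_tag.

    history is assumed to be ordered from oldest to newest.
    """
    tags = [release.tag for release in history]
    if current_tag not in tags:
        raise ValueError(f"unknown tag: {current_tag}")
    earlier = history[:tags.index(current_tag)]
    for release in reversed(earlier):
        if release.healthy:
            return release
    raise LookupError("no healthy release to roll back to")


history = [
    Release("sprint-40", healthy=True),
    Release("sprint-41", healthy=True),
    Release("sprint-42", healthy=False),  # the release that shipped the bug
]

print(rollback_target(history, "sprint-42").tag)  # prints sprint-41
```

The point of the sketch is only that a rollback needs a known-good state to return to. If releases aren’t tagged, there is nothing for a function like this to find.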

If it’s possible to solve the problem that simply, the product owner may choose to write a bug story to be worked on in the next sprint. That will prevent this current sprint from being interrupted, and reduce the impact on the team’s velocity. Handling bugs this way also allows the team to consider more carefully the potential impact of the bug, and the best way to fix it.

The Deep Exploration Option

Sometimes fixing a bug in production isn’t as simple as it sounds. For example, the bug could have had an effect on the data being entered into the application, or the bug may actually exist in the data layer. In this case, database recovery may be necessary, which introduces a whole range of other difficulties.

Recognizing the potential scope of a bug is the responsibility of the product owner in concert with the engineering team. When a bug is discovered, it may be necessary for the product owner to pull one or more engineers into meetings to discuss the depth of the impact and make a plan of action. Of course, the team’s velocity in the sprint will likely be reduced merely because of the need to assess the extent of the damage and propose a viable solution.

If the bug is urgent enough and the prognosis is uncertain, it may be necessary to introduce a new spike within the current sprint, and have somebody on the engineering team start looking ahead toward what’s going to be necessary to fix the bug in the next sprint. Bugs can be difficult to estimate because of their unknown nature, and it’s usually a good idea not to assign points to a bug for that reason. However, having one engineer take away a little bit of effort from the current sprint can pay off in the long run, without holding back the whole team.

The Urgent Effort Option

It’s not always possible to put off a bug fix until the next sprint. Sometimes a bug is so critical, and affects such an important aspect of the product, that it’s necessary to implement a fix during the current sprint. Ideally, this effort won’t require the entire development team. It’s the product owner’s responsibility to assess the scope of the damage, and decide whether it’s worth introducing a new story in the middle of a sprint to address a critical bug.

Introducing new stories in the middle of a sprint is rarely a good idea. A good scrum master should work with the product owner to try to limit changes to a sprint that’s in progress. But that doesn’t mean that it’s never necessary, and a good scrum master should also be able to communicate clearly to the team when and why it’s important to adjust the backlog if that’s the best option.

The goal in this case is to have as small an impact on the sprint as possible. Perhaps the developers who worked on the section of code that is causing the problems can be pulled off of the stories they’re working on, and temporarily assigned to fix the bug. Of course, any stories they’re working on will suffer, and there won’t be any points earned in the sprint for work done on a bug from a previous sprint.

The Nuclear Option

Sometimes a critical bug in production code is causing serious problems, and more than half of the development team is needed to work in concert to fix it. In that case, the only thing to do may be to stop the sprint and start a new one.

This should always be the last resort for a product owner. While the product owner always has the option of stopping a sprint, doing so means losing the continuity of any work in progress, along with the velocity calculations related to that sprint. For planning purposes, all the work already done within that sprint should be considered forfeit.

Of course, that doesn’t mean that the work is actually discarded. But from an engineering standpoint, stopping work on a story and then resuming it later can have such a negative impact on the focus and concentration needed from an engineering team that this option should only be used in the most extreme of cases.

If your team goes for this option, don’t make the mistake of trying to create a mini bug-sprint that is shorter than a typical sprint. If you think the bug can be fixed in less than the time that it takes to complete a single sprint, keep the backlog from the interrupted sprint and let the engineers start working on those stories again once the bug is fixed. Your overall velocity should account for the effort needed to fix bugs created by your team, not pretend that it doesn’t exist. Creating sprints of varying lengths is a serious scrum anti-pattern that will destroy your ability to track your team’s real velocity.
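The four options described so far can be summarized as a rough decision sketch. This is a simplification and not a rule: the BugReport fields and the choose_response function are hypothetical names made up for this example, and in practice the product owner makes the call with input from the engineering team.

```python
from dataclasses import dataclass
from enum import Enum


class Response(Enum):
    MINIMAL_IMPACT = "roll back production, write a bug story for the next sprint"
    DEEP_EXPLORATION = "assess the scope, possibly with a spike, and plan the fix"
    URGENT_EFFORT = "add a bug story to the current sprint for a few engineers"
    NUCLEAR = "stop the sprint and start a new one"


@dataclass
class BugReport:
    critical: bool           # does it affect an important part of the product?
    rollback_fixes_it: bool  # would a clean rollback remove the problem?
    scope_known: bool        # is the extent of the damage understood?
    engineers_needed: int    # estimated engineers required for the fix
    team_size: int


def choose_response(bug: BugReport) -> Response:
    """Illustrative mapping from a bug's characteristics to the four options."""
    if bug.rollback_fixes_it:
        return Response.MINIMAL_IMPACT
    if not bug.critical or not bug.scope_known:
        return Response.DEEP_EXPLORATION
    # Critical and understood: the question is how much of the team it takes.
    if bug.engineers_needed * 2 > bug.team_size:  # more than half the team
        return Response.NUCLEAR
    return Response.URGENT_EFFORT


bug = BugReport(critical=True, rollback_fixes_it=False, scope_known=True,
                engineers_needed=2, team_size=6)
print(choose_response(bug).name)  # prints URGENT_EFFORT
```

The ordering reflects the preference expressed above: try the least disruptive option first, and treat stopping the sprint as the last resort.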

An Ounce of Prevention

Depending on the kind of code your company is deploying and the size of your user base, it’s often a good idea to use canary servers to test the impact of any changes in production on a subset of your users. This allows you to ferret out possible edge cases quickly, and limit the impact on the overall customer base.
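One common way to pick that subset of users is to hash each user ID into a stable bucket, so that the same user always lands on the same side. The sketch below assumes this hashing approach. The function names, the salt value, and the pool names are hypothetical, and many teams do this routing in a load balancer and not in application code.

```python
import hashlib


def in_canary(user_id: str, percent: int, salt: str = "canary-v1") -> bool:
    """Deterministically place roughly `percent` percent of users in the canary group."""
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    digest = hashlib.sha256(f"{salt}:{user_id}".encode("utf-8")).hexdigest()
    bucket = int(digest, 16) % 100  # stable bucket in the range 0..99
    return bucket < percent


def pick_server_pool(user_id: str, percent: int) -> str:
    """Route a user to the canary pool or the stable pool."""
    return "canary" if in_canary(user_id, percent) else "stable"


# Send about 5% of users to the servers running the new release.
users = [f"user-{n}" for n in range(10_000)]
canary_users = [u for u in users if pick_server_pool(u, 5) == "canary"]
print(len(canary_users))  # close to 500, the exact count depends on the hash
```

Because the bucket depends only on the user ID and the salt, a user who hits a bug on the canary servers keeps hitting the canary servers, which makes the problem easier to reproduce and to contain.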

Even if canary servers aren’t an option, it’s always a good idea to develop your code with toggles that allow you to turn new features on or off at will. Scrum is about developing and deploying full top-to-bottom features each sprint. If a new feature is added this way, and it turns out to have a bug in it that doesn’t have broader implications, toggling it off in production may allow the team to continue moving forward while minimizing the impact on the users.
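A feature toggle can be as simple as a named flag checked at the point where the new code path begins. Here is a minimal in-memory sketch: the FeatureToggles class, the new_checkout flag, and the checkout functions are hypothetical, and a production setup would more likely read flags from configuration or a dedicated flag service so they can be changed without a deploy.

```python
class FeatureToggles:
    """A tiny in-memory toggle store. Unknown features default to off."""

    def __init__(self, defaults=None):
        self._flags = dict(defaults or {})

    def is_enabled(self, name: str) -> bool:
        return self._flags.get(name, False)

    def set(self, name: str, enabled: bool) -> None:
        self._flags[name] = bool(enabled)


toggles = FeatureToggles({"new_checkout": True})


def legacy_checkout(total: float) -> str:
    return f"legacy checkout: {total:.2f}"


def new_checkout(total: float) -> str:
    return f"new checkout: {total:.2f}"


def checkout(total: float) -> str:
    # The toggle is checked on every call, so flipping it takes effect immediately.
    if toggles.is_enabled("new_checkout"):
        return new_checkout(total)
    return legacy_checkout(total)


print(checkout(19.99))               # prints new checkout: 19.99
toggles.set("new_checkout", False)   # a bug is found, so switch the feature off
print(checkout(19.99))               # prints legacy checkout: 19.99
```

Keeping the old code path alive behind the toggle is what makes this work. Once the new feature has proven itself in production, the toggle and the legacy path can be removed.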

Don’t Forget the Retrospective

Of course, issues such as these should always be discussed at the retrospective, to make sure that everybody is on the same page about what happened, and how to prevent it or deal with it more effectively in the future.

Ultimately the product owner is responsible for making these calls. The agile engineering team needs to be working in a way that allows for the most productive and efficient response in the case of a bug in production.

Frequently Asked Questions (FAQs) on Agile Ways to Handle Bugs in Production

What are the key principles of Agile bug handling in production?

Agile bug handling in production is based on several key principles. First, it emphasizes the importance of continuous integration and delivery, which allows for frequent releases and immediate feedback. This approach helps to identify and fix bugs early, reducing their impact on the production environment. Second, Agile encourages cross-functional teams, where everyone is responsible for quality and bug handling. This shared responsibility fosters a culture of collective ownership and accountability. Lastly, Agile promotes transparency and communication, ensuring that all stakeholders are aware of the bugs and their status.

How does Agile methodology help in managing bugs in production?

Agile methodology is highly effective in managing bugs in production due to its iterative and incremental approach. It allows for frequent releases, which means bugs can be identified and fixed quickly. Agile also encourages regular communication and collaboration among team members, which helps in faster resolution of bugs. Moreover, Agile teams use tools and techniques like automated testing and continuous integration to prevent bugs from reaching the production environment.

What are some Agile practices for handling bugs in production?

Some Agile practices for handling bugs in production include using a bug tracking system, prioritizing bugs based on their severity and impact, and incorporating bug fixes into the regular sprint cycle. Agile teams also use techniques like pair programming and code reviews to prevent bugs. Additionally, they conduct retrospectives after each sprint to learn from their mistakes and improve their bug handling process.

How does Agile handle production support?

In Agile, production support is handled as part of the regular sprint cycle. The team prioritizes production issues along with new features and enhancements. They use a bug tracking system to manage and monitor these issues. The team also collaborates closely with the operations team to ensure smooth deployment and maintenance of the software.

How does Agile ensure quality and prevent bugs in production?

Agile ensures quality and prevents bugs in production through practices like Test-Driven Development (TDD), Continuous Integration (CI), and Automated Testing. TDD involves writing tests before the actual code, which helps in identifying and fixing bugs early. CI involves integrating code changes frequently and testing them immediately, which helps in detecting bugs quickly. Automated Testing reduces the chances of human error and ensures that all parts of the software are tested thoroughly.

How does Agile prioritize bugs in production?

Agile prioritizes bugs in production based on their severity and impact on the users. High-priority bugs are those that significantly affect the functionality of the software or pose a security risk. These bugs are fixed immediately. Lower-priority bugs are those that have a minor impact on the software and can be fixed in the regular sprint cycle.

How does Agile handle bug reporting and tracking?

Agile teams use bug tracking systems to manage and monitor bugs. These systems allow the team to log bugs, assign them to team members, track their status, and document their resolution. The team also uses dashboards and reports to visualize the bug data and make informed decisions.

How does Agile handle regression testing?

In Agile, regression testing is conducted frequently to ensure that new changes have not introduced any bugs in the existing functionality. The team uses automated testing tools to perform regression testing, which saves time and ensures thorough testing.

How does Agile handle bug triage?

Bug triage in Agile involves reviewing and prioritizing bugs based on their severity and impact. The team conducts bug triage meetings regularly to discuss the bugs and decide on their priority and resolution.
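As a small illustration of prioritizing by severity and impact, the sketch below orders a list of bugs so that the most severe come first, with the number of affected users as a tie-breaker. The Bug fields, the three severity levels, and the sample bugs are assumptions made for this example and not a standard scheme.

```python
from dataclasses import dataclass

# Lower rank means higher priority. The three levels are an assumption.
SEVERITY_RANK = {"critical": 0, "major": 1, "minor": 2}


@dataclass
class Bug:
    title: str
    severity: str        # one of the keys in SEVERITY_RANK
    users_affected: int  # rough estimate of impact


def triage(bugs):
    """Order bugs by severity first, then by number of users affected."""
    return sorted(bugs, key=lambda b: (SEVERITY_RANK[b.severity], -b.users_affected))


backlog = [
    Bug("Typo on settings page", "minor", 5000),
    Bug("Payments fail for some cards", "critical", 300),
    Bug("Export is slow", "major", 40),
    Bug("Login loop on mobile", "critical", 1200),
]

for bug in triage(backlog):
    print(bug.severity, "-", bug.title)
```

A sort like this only produces a starting order for the triage meeting. The team still decides which bugs are urgent enough to interrupt the sprint.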

How does Agile handle post-production bugs?

In Agile, critical post-production bugs are treated as high-priority items and are fixed as soon as possible, while less severe ones can be scheduled into a regular sprint. The team collaborates closely with the operations team to resolve these bugs and ensure smooth operation of the software. They also conduct a root cause analysis to prevent similar bugs in the future.