As we close out 2022, we at DevOps.com wanted to highlight the most popular articles of the year. Following is the latest in our series of the Best of 2022.
A technology company’s most valuable assets are its people and data, especially data about the organization itself. By knowing what data to track over time, engineering leaders can measure how efficiently their DevOps teams are operating and enable them to maximize their value stream to deliver the best possible product to end users.
After years of research, Google’s DevOps Research and Assessment (DORA) team identified four key metrics for evaluating a team’s performance:
- Lead time for changes
- Deployment frequency
- Mean time to recovery
- Change failure rate
DORA metrics have now become the standard for gauging the efficacy of software development teams and can provide crucial insights into areas for growth. These metrics are essential for organizations looking to modernize and those looking to gain an edge against competitors. Below, we’ll dive into each metric and discuss what they can reveal about development teams.
Lead Time for Changes
Lead time for changes (LTC) is the time between a commit and production. LTC indicates how agile a team is—it not only tells you how long it takes to implement changes but how responsive the team is to the ever-evolving demands and needs of users. The DORA team identified these benchmarks for performance in their Accelerate State of DevOps 2021 report:
Elite performers: Less than one hour
High performers: One day to one week
Medium performers: One month to six months
Low performers: More than six months
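As a sketch of how LTC might be computed and bucketed in practice, the snippet below averages commit-to-deploy gaps and maps the result onto the 2021 benchmarks. The pairing of commits to deployments and the exact band cutoffs are simplifying assumptions (DORA's bands leave gaps between them), not an official tool:

```python
from datetime import datetime, timedelta

def lead_time_for_changes(commit_times, deploy_times):
    """Average time from commit to its production deployment.

    commit_times / deploy_times: parallel lists of datetimes, one pair per
    change -- a simplification; real pipelines must map each commit to the
    deployment that actually shipped it.
    """
    deltas = [d - c for c, d in zip(commit_times, deploy_times)]
    return sum(deltas, timedelta()) / len(deltas)

def ltc_band(ltc):
    """Classify an LTC value against the 2021 DORA benchmarks
    (thresholds simplified to be contiguous)."""
    if ltc < timedelta(hours=1):
        return "elite"
    if ltc <= timedelta(weeks=1):
        return "high"
    if ltc <= timedelta(days=180):
        return "medium"
    return "low"

commits = [datetime(2022, 5, 2, 9, 0), datetime(2022, 5, 3, 14, 0)]
deploys = [datetime(2022, 5, 4, 9, 0), datetime(2022, 5, 5, 14, 0)]
ltc = lead_time_for_changes(commits, deploys)
print(ltc, ltc_band(ltc))  # two days from commit to deploy -> "high"
```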
LTC can reveal symptoms of poor DevOps practices: If it takes teams weeks or months to release code into production, there are inefficiencies in their process. One can minimize their LTC through continuous integration and continuous delivery (CI/CD). Encourage testers and developers to work closely together so everyone has a comprehensive understanding of the software. Consider building automated tests to save more time and improve the CI/CD pipeline.
Because there are several phases between the initiation and deployment of a change, it’s wise to define each step of the process and track how long each takes. Examine the cycle time for a thorough picture of how the team functions and further insight into where they can save time.
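One way to instrument those phases is to timestamp each stage transition and report the gap between consecutive stages. A minimal sketch, where the stage names and timestamps are illustrative assumptions:

```python
from datetime import datetime

# Timestamps for one change moving through an assumed four-stage pipeline
# (stage names are illustrative, not a prescribed process).
stages = {
    "commit":       datetime(2022, 5, 2, 9, 0),
    "review done":  datetime(2022, 5, 2, 15, 0),
    "tests passed": datetime(2022, 5, 2, 16, 30),
    "deployed":     datetime(2022, 5, 3, 10, 0),
}

# Duration of each phase = gap between consecutive timestamps.
names = list(stages)
for earlier, later in zip(names, names[1:]):
    print(f"{earlier} -> {later}: {stages[later] - stages[earlier]}")
```

A breakdown like this shows at a glance which phase (review, testing, or release) is eating most of the lead time.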
One should be careful not to let the quality of their software delivery suffer in a quest for faster changes. While a low LTC may indicate that a team is efficient, if they can’t support the changes they’re implementing or they’re moving at an unsustainable pace, they risk sacrificing the user experience. Rather than compare the team’s lead time for changes to other teams’ or organizations’ LTC, one should evaluate this metric over time and consider it an indication of growth (or stagnation).
Deployment Frequency
Deployment frequency (DF) is how often you ship changes and how consistent your software delivery is. This metric is useful for determining whether a team is meeting its goals for continuous delivery. According to the DORA team, these are the benchmarks for deployment frequency:
Elite performers: Multiple times a day
High performers: Once a week to once a month
Medium performers: Once a month to once every six months
Low performers: Less than once every six months
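A hedged sketch of measuring DF from a deployment log: count deployments over an observation window and map the daily rate onto the bands above. The numeric cutoffs are our own simplification of "multiple times a day", "once a week to once a month" and so on:

```python
from datetime import date

def deployment_frequency(deploy_dates, period_days):
    """Deployments per day over an observation window."""
    return len(deploy_dates) / period_days

def df_band(deploys_per_day):
    """Rough mapping onto the 2021 DORA bands (illustrative thresholds)."""
    if deploys_per_day >= 1:
        return "elite"            # at least daily
    if deploys_per_day >= 1 / 30:
        return "high"             # at least monthly
    if deploys_per_day >= 1 / 180:
        return "medium"           # at least every six months
    return "low"

deploys = [date(2022, 6, d) for d in (1, 8, 15, 22, 29)]  # weekly releases
df = deployment_frequency(deploys, period_days=30)
print(f"{df:.3f} deploys/day -> {df_band(df)}")  # weekly cadence -> "high"
```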
The best way to improve DF is to ship small changes frequently, which has a few upsides: shipping often means the team is constantly refining their service and, if there is a problem with the code, it’s easier to find and remedy the issue. If deployment frequency is low, it may reveal bottlenecks in the development process or indicate that projects are too complex.
If a team is large, shipping many small changes may not be feasible. Instead, consider building release trains and shipping at regular intervals. This approach allows the team to deploy more often without overwhelming its members.
Mean Time to Recovery
Mean time to recovery (MTTR) is the average amount of time it takes your team to restore service when there’s a disruption like an outage. This metric offers a look into the stability of your software, as well as the agility of your team in the face of a challenge. These are the benchmarks identified in the State of DevOps report:
Elite performers: Less than one hour
High performers: Less than one day
Medium performers: One day to one week
Low performers: Over six months
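Computing MTTR itself is simple arithmetic over incident records; the sketch below assumes each incident is a (start, resolved) pair of timestamps, which is an illustrative schema:

```python
from datetime import datetime, timedelta

def mean_time_to_recovery(incidents):
    """Average restore time over (start, resolved) datetime pairs."""
    downtimes = [resolved - start for start, resolved in incidents]
    return sum(downtimes, timedelta()) / len(downtimes)

incidents = [
    (datetime(2022, 3, 1, 10, 0), datetime(2022, 3, 1, 10, 45)),  # 45 min
    (datetime(2022, 3, 9, 2, 0), datetime(2022, 3, 9, 5, 0)),     # 3 hours
]
mttr = mean_time_to_recovery(incidents)
print(mttr)  # average of 45 min and 3 h = 1:52:30
```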
To minimize the impact of degraded service on your value stream, there should be as little downtime as possible. If it’s taking your team more than a day to restore services, you should consider utilizing feature flags so you can quickly disable a change without causing too much disruption. If you ship in small batches, it should also be easier to discover and resolve problems.
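To make the feature-flag idea concrete, here is a minimal in-process sketch of a flag acting as a kill switch. Real deployments typically read flags from a flag service or config store so they can be flipped without redeploying; the flag and function names here are purely illustrative:

```python
# Illustrative in-memory flag store; in production this would be an
# external config source that can be changed at runtime.
FLAGS = {"new_checkout_flow": True}

def is_enabled(flag_name):
    return FLAGS.get(flag_name, False)

def checkout(cart):
    # Route between the new and legacy code paths behind the flag.
    if is_enabled("new_checkout_flow"):
        return f"new flow: {len(cart)} items"
    return f"legacy flow: {len(cart)} items"

print(checkout(["a", "b"]))          # new flow
FLAGS["new_checkout_flow"] = False   # kill switch: disable without a rollback
print(checkout(["a", "b"]))          # legacy flow
```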
Although mean time to discover (MTTD) is different from mean time to recovery, the amount of time it takes your team to detect an issue will impact your MTTR—the faster your team can spot an issue, the more quickly service can be restored.
As with lead time for changes, you don’t want to implement sudden changes at the expense of a quality solution. Rather than deploy a quick fix, make sure that the change you’re shipping is durable and comprehensive. You should track MTTR over time to see how your team is improving and aim for steady, stable growth.
Change Failure Rate
Change failure rate (CFR) is the percentage of releases that result in downtime, degraded service or rollbacks, which can tell you how effective a team is at implementing changes. As you can see, there is not much distinction between performance benchmarks for CFR:
Elite performers: 0-15%
High, medium and low performers: 16-30%
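CFR is a simple ratio; the sketch below assumes each deployment record carries a boolean `failed` field (an assumed schema) marking releases that caused degraded service or a rollback:

```python
def change_failure_rate(deployments):
    """Percentage of deployments that caused degraded service or a rollback.

    deployments: list of dicts with a boolean "failed" field -- an assumed
    schema for illustration.
    """
    failures = sum(1 for d in deployments if d["failed"])
    return 100 * failures / len(deployments)

deployments = [{"failed": False}] * 18 + [{"failed": True}] * 2
cfr = change_failure_rate(deployments)
print(f"{cfr:.0f}%")  # 2 of 20 deployments failed -> 10%, in the elite band
```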
Change failure rate is a particularly valuable metric because it can prevent a team from being misled by the total number of failures they encounter. Teams who aren’t implementing many changes will see fewer failures, but that doesn’t necessarily mean they’re more successful with the changes they do deploy. Those following CI/CD practices may see a higher number of failures, but if CFR is low, these teams will have an edge because of the speed of their deployments and their overall success rate.
This rate can also have significant implications for the value stream: It can indicate how much time is spent remedying problems instead of developing new projects. Because high, medium and low performers all fall within the same range, it’s best to set goals based on the team and the particular business rather than compare to other organizations.
Putting it All Together With DORA Metrics
As with any data, DORA metrics need context, and one should consider the story that all four of these metrics tell together. Lead time for changes and deployment frequency provide insight into the velocity of a team and how quickly they respond to the ever-changing needs of users. On the other hand, mean time to recovery and change failure rate indicate the stability of a service and how responsive the team is to service outages or failures.
By comparing all four key metrics, one can evaluate how well their organization balances speed and stability. If the LTC is within a week with weekly deployments but the change failure rate is high, then teams may be rushing out changes before they’re ready, or they may not be able to support the changes they’re deploying. If they are deploying once a month, on the other hand, and their MTTR and CFR are high, then the team may be spending more time correcting code than improving the product.
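As a rough sketch, the two imbalances described above could even be flagged programmatically. The thresholds below are illustrative simplifications of the DORA bands, not official cutoffs:

```python
from datetime import timedelta

def speed_vs_stability(ltc, deploys_per_day, mttr, cfr_percent):
    """Flag speed/stability imbalances. Thresholds are illustrative
    simplifications of the DORA bands, not official cutoffs."""
    warnings = []
    # Fast delivery but a high failure rate: changes may be rushed.
    fast = ltc <= timedelta(weeks=1) and deploys_per_day >= 1 / 7
    if fast and cfr_percent > 15:
        warnings.append("shipping fast but changes often fail: "
                        "possibly rushing releases")
    # Slow delivery and still unstable: time goes to fixes, not features.
    slow = deploys_per_day < 1 / 30
    if slow and (mttr > timedelta(days=1) or cfr_percent > 15):
        warnings.append("shipping slowly and still unstable: "
                        "time may be going to fixes, not the product")
    return warnings

# Weekly deployments, two-day lead time, but a 30% failure rate.
warnings = speed_vs_stability(timedelta(days=2), 1 / 7,
                              timedelta(hours=4), 30)
print(warnings)
```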
Because DORA metrics provide a high-level view of a team’s performance, they can be beneficial for organizations trying to modernize—DORA metrics can help identify exactly where and how to improve. Over time, teams can measure where they have grown and which areas have stagnated.
Those who fall into the elite categories can leverage DORA metrics to continue improving services and gain an edge over competitors. As the State of DevOps report reveals, the group of elite performers is rapidly growing (from 7% in 2018 to 26% in 2021), so DORA metrics can provide valuable insights for this group.
But remember that data will only get you so far. To get the most out of DORA metrics, engineering leads must know their organization and teams and harness that knowledge to guide their goals and determine how to effectively invest their resources.