As enterprises move to adopt DevOps practices, there is a desire to measure where teams are on their journey. This sometimes leads to the development of maturity models used to assess teams’ levels and progress. While I am not going to pass judgment on what might be useful for certain organizations, I believe that the real key is to drive positive DevOps practices, which include continuous improvement guided by data such as lead time, deployment frequency and mean time to resolution (MTTR), i.e., the key metrics from the State of DevOps Reports.
Any static maturity model will become outdated. What’s more important is that teams are taught concepts that can help them continuously evaluate what is stopping them from delivering more quickly so that they identify areas for improvement. Key to this is the lean concept of value stream analysis. If you walk up to a team and ask them why they can’t go faster, they will typically say they are waiting for something. They could be waiting for more work to flow into their backlog. They could be waiting for infrastructure needed for development or testing. Or, they could be waiting for another team to develop a service they need to consume.
Many teams don’t spend time in retrospectives talking about how to overcome these types of blockers or wait states and developing countermeasures as continuous improvement initiatives. One reason for this is the lack of objective data that can be analyzed to provide insight into them. This is where the lead time metric comes into play. Historically, lead time has been defined as the time from (last) code commit to deployment into production. While this is important, it only tells part of the story, because accelerated delivery and feedback really start with a customer concept, and what matters is measuring how long that concept takes to be delivered into production so feedback can be gathered.
One challenge to providing this kind of end-to-end metric, which spans the entire delivery value stream, is the lack of an integrated delivery pipeline from which data can be automatically collected. A customer concept starts in the portfolio space, which might result in data being entered in some type of PPM (portfolio management) system. Once approved, this results in features being created, which eventually should flow into a product backlog. Stories are then created that can be pulled by the agile team supporting that product into an iteration. This information typically lives in some agile management system (e.g. Jira or IBM Rational Team Concert). Then the team develops and builds the stories, tests them and deploys them into production via a deployment tool like UrbanCode. So this flow of work passes through various tools and generally is not visible or easily traceable across the value stream.
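One way to recover that end-to-end visibility is to join exports from each tool on a shared work item ID. The sketch below is a minimal, hypothetical illustration: the record shapes, field names and IDs are invented for the example, not the schema of any real PPM, agile or deployment tool.

```python
# Hypothetical exports from three tools in the pipeline: a PPM system,
# an agile management system and a deployment tool. Joining them by a
# shared story ID rebuilds a simple end-to-end trace for a feature.
ppm_features = [{"feature_id": "F-1", "stories": ["S-1", "S-2"]}]
agile_stories = [
    {"story_id": "S-1", "status": "done"},
    {"story_id": "S-2", "status": "in test"},
]
deployments = [{"story_id": "S-1", "deployed": "2018-01-24"}]

def trace(feature_id):
    """Return, for each story under a feature, its status and deploy date."""
    feature = next(f for f in ppm_features if f["feature_id"] == feature_id)
    status_by_story = {s["story_id"]: s["status"] for s in agile_stories}
    deployed_by_story = {d["story_id"]: d["deployed"] for d in deployments}
    return [
        {
            "story": sid,
            "status": status_by_story.get(sid),
            "deployed": deployed_by_story.get(sid),  # None = not yet in production
        }
        for sid in feature["stories"]
    ]

print(trace("F-1"))
```

In practice an integration tool does this correlation continuously rather than via ad hoc exports, but the principle is the same: a common identifier carried across tools is what makes the value stream traceable.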
Our initial goal in measuring lead time was to shift left and measure the time from when a story entered the backlog until the story was released into production. This would then give us four discrete measures that would provide more insight into the time spent in different parts of the value stream.
- Time spent in the backlog
- Time spent developing the story
- Time spent testing the story
- Time spent once the story was completed (marked “done”) until it was deployed
We mapped the first and last categories into wait times and the second and third into process times to provide a baseline for this metric. While we have just started providing this information to teams, we are finding that scrum masters believe this type of data can be used in retrospectives to better understand why some lead times are high. They also believe it will help the team better understand when blockers are encountered during development and testing and what contributed to them. We also plan to make time spent in a blocked state visible as part of our metrics. Tooling like Tasktop Integration Hub can help integrate your pipeline tooling to provide this visibility and insight.
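As a sketch of how the four measures and the wait/process split can be derived, the snippet below computes them from story timestamps. The field names and dates are hypothetical stand-ins for whatever status-transition timestamps your agile management tool exports.

```python
from datetime import datetime

# Hypothetical status-transition timestamps for one story, as they might
# be exported from an agile management tool (field names are illustrative).
story = {
    "entered_backlog": datetime(2018, 1, 2),
    "dev_started":     datetime(2018, 1, 10),
    "test_started":    datetime(2018, 1, 16),
    "marked_done":     datetime(2018, 1, 19),
    "deployed":        datetime(2018, 1, 24),
}

def lead_time_breakdown(s):
    """Split end-to-end lead time into the four discrete measures."""
    return {
        "backlog_wait": s["dev_started"]  - s["entered_backlog"],
        "develop":      s["test_started"] - s["dev_started"],
        "test":         s["marked_done"]  - s["test_started"],
        "deploy_wait":  s["deployed"]     - s["marked_done"],
    }

breakdown = lead_time_breakdown(story)
# First and last measures are wait time; the middle two are process time.
wait_time = breakdown["backlog_wait"] + breakdown["deploy_wait"]
process_time = breakdown["develop"] + breakdown["test"]

print(f"wait: {wait_time.days} days, process: {process_time.days} days")
# → wait: 13 days, process: 9 days
```

Plotting these four numbers per story gives a retrospective something concrete to discuss: a large backlog or deploy wait points at flow problems outside the team, while long develop or test times point at work inside it.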
Once teams gain more insight into what is slowing them down, they can identify countermeasures that can be applied. Categories of wait-states include:
- Waiting for a dependent area to provide functionality necessary to complete a feature
- SLAs associated with a service that is provided by a centralized organization (e.g. performance testing or security testing)
- SLAs associated with infrastructure needed for an environment
- Contention for a shared test environment
Countermeasures associated with these types of wait-states are:
- Architecture decoupling
- Dark Launching / Feature Toggling
- Expanding cross-functional teams to enable them to do self-service performance and security testing (at least some portion of it)
- Automated Provisioning Capability for environment to avoid wait-states and contention
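To make the dark launching / feature toggling countermeasure concrete, here is a minimal sketch of a toggle guarding a call to a dependent team’s not-yet-ready service. The flag store and all names are hypothetical, not a specific toggle library; the point is that the consuming team can keep shipping behind the flag instead of waiting.

```python
# Minimal feature-toggle sketch: code that depends on another team's
# service ships dark (behind a flag) with a safe fallback, so the team
# is not blocked waiting for the dependency. Names are illustrative.

FLAGS = {"use_new_pricing_service": False}  # flip on when the dependency ships

def is_enabled(flag):
    return FLAGS.get(flag, False)

def legacy_price(item):
    return {"item": item, "price": 100, "source": "legacy"}

def new_pricing_service(item):
    # Stand-in for the dependent team's service, not yet available.
    raise NotImplementedError("dependent service not delivered yet")

def get_price(item):
    if is_enabled("use_new_pricing_service"):
        return new_pricing_service(item)
    return legacy_price(item)  # fallback path while the dependency is pending

print(get_price("widget"))
```

The same mechanism supports dark launching: the flag can be flipped on for internal users or a small traffic slice first, so the new path is exercised in production before customers see it.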
The “State of DevOps Report 2017” contains information on the importance of architecture and how decoupling dependencies (using APIs and microservices) can mitigate certain types of wait states. DevOps Case Studies provides more information on applying DevOps practices such as dark launching.
But again, at the heart of this is getting teams to think more about what their value stream actually looks like, how long it takes to deliver business value that can generate feedback, and what is getting in the way of going faster. This is not a technology play. This is not something that somebody can sell you or train you on. This is something the team itself needs to take on.
A good starting place for a team is to educate themselves on some of these DevOps concepts using a book club. A great place to start is “The Phoenix Project.” This novel combines aspects of both DevOps and Lean to drive home some key points in the journey to discover the types of work, manage work in progress, and ultimately make visible and gain control over the work being done for the business. Once a team understands their value stream and blockers, they can then study “The DevOps Handbook,” which provides practical guidance for improving their delivery capability and speed.
About the Author / Carmen DeArdo
Carmen DeArdo is the director of Build Technologies at Nationwide Insurance. He is responsible for helping drive accelerated delivery utilizing Lean and DevOps techniques across the end-to-end delivery value stream. One current focus area is the creation of an integrated delivery pipeline capability.