Everyone wants to optimize a team’s performance, but it’s not enough to just talk about it. To truly impact your team’s performance, you need to take a page from operationally mature DevOps teams and use metrics to gain valuable insight into your team’s work, enhance their capacity and drive cultural change.
The right data allows managers to make quick decisions with confidence and minimal risk. It also lets managers see the actual outcome of those decisions, in turn allowing them to shape their team’s direction going forward and to create a happier, healthier and more productive team dynamic. However, making the shift to data-driven decision making requires more than just increased monitoring. It requires a cultural change.
Below are four key metrics DevOps teams should be monitoring, as well as the impact each metric can have on a team’s culture and performance.
- Use Time to Response to Establish a Culture of High Achievement
Time to response (also called time to acknowledgement) depends on individual team members. Incident responders might not always have control over the root cause of a particular incident, but they are always in control of how quickly they acknowledge and respond. Hold team members accountable for their response time by setting internal targets and having high expectations.
Most operationally mature companies use IT operations management tools to enforce a response time target. These tools allow managers to set an incident response time window, making sure the next person in line is alerted if an incident isn’t responded to. Tracking escalations also gives managers valuable data about their team.
- Manage Expectations With Escalations
For an organization using an IT operations management tool, an escalation is often an exception – it’s a sign that a responder either wasn’t able to acknowledge an incident in time or that the responder didn’t have the tools or skills to address it. While escalation policies are a necessary and valuable component of incident management, teams should ideally focus on driving the number of escalations down. Managers can gauge their team’s performance by tracking the number of incidents escalated over time, and using that data to determine whether their targets need to be adjusted.
That said, there are some situations where an escalation is part of standard operating practice. For example, you might have a NOC, first-tier support team or even auto-remediation tool that triages or escalates incoming incidents based on their content. For these situations, it’s important for managers to track what types of alerts should be escalated and what normal numbers should look like for those alerts.
- Combat Alert Fatigue With Raw Incident Count
As an organization grows, incident counts often grow as well. However, as a team becomes more efficient and mature, the number of incidents per responder should fall or, at the very least, stay constant.
Implementing IT operations management tools can help teams lower their raw incident counts by helping to weed out low-quality alerts, automate common fixes and build runbooks. They can also help break down incident count by team or service, putting incidents in context with the rest of the organization. This ensures that team members spend their time on the alerts that matter and on building new features, instead of getting mired in alert fatigue.
- Gauge Operational Readiness With Mean Time to Resolution
Time to resolution, or how long it takes a team to resolve an incident, is the highest standard in operational metrics. The baseline for time to resolution can vary from organization to organization, depending on the complexity of the environment, how responsibility is organized and even the industry in which a company operates.
Tracking time to resolution allows managers to determine their normal mean time to resolution and ensure their teams are able to work through the challenges of a major incident.
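To make the four metrics above concrete, here is a minimal sketch of how they might be computed from a list of incident records. The `Incident` class, its field names and the `summarize` function are hypothetical and invented for illustration; a real IT operations management tool will export its own fields and likely offers these reports built in.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import mean


@dataclass
class Incident:
    # Hypothetical record shape, not any particular tool's export format.
    responder: str
    created: datetime
    acknowledged: datetime
    resolved: datetime
    escalated: bool = False


def summarize(incidents):
    """Compute the four metrics discussed above for a list of incidents."""
    if not incidents:
        raise ValueError("need at least one incident")

    # Minutes from incident creation to acknowledgement (time to response).
    ack_minutes = [
        (i.acknowledged - i.created).total_seconds() / 60 for i in incidents
    ]
    # Minutes from incident creation to resolution (time to resolution).
    resolve_minutes = [
        (i.resolved - i.created).total_seconds() / 60 for i in incidents
    ]
    responders = {i.responder for i in incidents}

    return {
        "mean_time_to_response_min": mean(ack_minutes),
        "escalations": sum(i.escalated for i in incidents),
        "incidents_per_responder": len(incidents) / len(responders),
        "mean_time_to_resolution_min": mean(resolve_minutes),
    }


# Two made-up incidents, purely for demonstration.
start = datetime(2024, 1, 1, 9, 0)
sample = [
    Incident("alice", start, start + timedelta(minutes=2),
             start + timedelta(minutes=30)),
    Incident("bob", start, start + timedelta(minutes=8),
             start + timedelta(minutes=90), escalated=True),
]
print(summarize(sample))
```

Running the same summary over successive weeks or months, rather than a single snapshot, is what shows whether escalations and incidents per responder are actually trending down.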
Use These Metrics to Foster Intelligent Change
It’s no secret that downtime is expensive, both in loss of revenue and customer trust. Metrics like the ones discussed are put in place to combat downtime and increase reliability, yet it can be easy for teams to become too focused on numbers and past performance.
It’s important for team members to not lose sight of the business goals and why they’re tracking those metrics in the first place. It’s also important for managers to avoid over-analyzing the past. Metrics measure what has already happened, and, while they can tell a lot about a team’s previous performance, they should be used as a tool for building a better future and not as a means of assigning blame.
Metrics are a means to an end, and having more information than you need won’t help you improve your team and refine your business. Keeping the emphasis on subsequent action is the key to using metrics to drive cultural changes.