Report: The State of DevOps Automation

In the race to accelerate digital transformation initiatives, organizations are encountering more incidents, more downtime, and longer resolution times. In fact, 90.4% of organizations saw an increase in incidents since the pandemic began, according to a recent Transposit report.

In a completely digital economy, service downtime takes a higher toll. For ITOps teams, working remotely also requires new strategies to organize a team’s response to issues. Amid rising incidents, 94.1% of organizations have increased their focus on site reliability engineering (SRE) practices.

Transposit just released its 2021 State of DevOps Automation report, which surveyed over 500 IT, DevOps, and SRE professionals across the U.S. The report sheds light on the state of remote work as it relates to incident response processes, reveals increased demand for SREs, and highlights the automation they’re using to narrow the resolution time gap. Below, we’ll explore the report’s key takeaways and discover what sort of DevOps automation is trending.

Remote Work Slows Incident Response

Try as they might, videoconferencing and group chat solutions like Zoom and Slack just can’t compete with in-person collaboration. According to the survey, 93.6% of ITOps and software teams reported it takes longer to resolve incidents while working remotely.

Since the pandemic began, 97% of teams have adopted remote work, and 90.4% have experienced increased service incidents. It doesn’t bode well that incidents are increasing as incident response time lags.

The report findings frame remote work as a top hurdle to resolving issues: 46.1% reported that not being physically in the same room to collaborate on fixing issues was a top concern. But remote work is only one element; others cite changing processes and a lack of visibility into who’s responsible for maintaining code as further challenges.

Incident Response Hangups

To address these concerns, organizations are investing in incident response strategies. The report found that 56.4% of companies are adopting communication and collaboration tools, 52.3% are adopting automation tools, and 43.9% are adopting integration tools.

Though the report showed greater investment in those tools, it also identified a persistent communication gap. When conducting postmortems, it can be challenging to correlate human action to incident responses. More than half (57.4%) of respondents felt it’s hard to piece together human actions and team communication during incidents. This context is essential to make sense of mean-time-to-response (MTTR) metrics.

Interestingly, the report found 96.4% of respondents believe mining insights from human data, like Slack channels or group emails, could help improve incident responses. It appears that having a clearer way to harvest and query incident-related data throughout the entire response process is a universal concern.

SRE Goes Mainstream

As I previously reported, the SRE role continues to evolve and gain favor. The report similarly hypothesizes that, based on the survey results, the SRE title is going mainstream. Out of 295 respondents surveyed, 98% said they increased SRE practices in the last 12 months, and 62.4% said they would be expanding SRE efforts in 2021.

In 2021, SREs will likely remain a hot commodity in hiring. However, these individuals will face mounting complexity in addressing real-time incidents for complex distributed systems. They will also likely be tasked with automating manual processes, as 51.7% of respondents reported that a lack of automation is the top cause of slow incident resolution.

Automation Goals and Barriers

Forty percent of companies surveyed now have at least one dedicated full-time staff member working to create in-house tools or bots to automate incident responses. So, what sort of automation is common in the SRE field? The report shows SREs constructing custom scripts to automate things like:

  • Customer communication
  • Recording MTTR metrics
  • Automating infrastructure
  • Corporate communication
  • Auto-generating runbooks
  • Creating or updating tickets
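
To make one of these items concrete, here is a minimal sketch of what a custom script for recording MTTR might look like. It is not taken from the report: the `Incident` record, its field names, and the sample data are hypothetical, and MTTR is assumed here to mean the average time from an incident being opened to it being resolved.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass
class Incident:
    # Hypothetical record shape; real ticketing tools expose different fields.
    incident_id: str
    opened_at: datetime
    resolved_at: Optional[datetime] = None  # None while the incident is still open


def mean_time_to_resolution(incidents: List[Incident]) -> Optional[timedelta]:
    """Average (resolved_at - opened_at) over resolved incidents only."""
    durations = [
        i.resolved_at - i.opened_at
        for i in incidents
        if i.resolved_at is not None
    ]
    if not durations:
        return None  # nothing resolved yet, so there is no meaningful average
    return sum(durations, timedelta()) / len(durations)


# Made-up sample data for illustration only.
incidents = [
    Incident("INC-1", datetime(2021, 1, 4, 9, 0), datetime(2021, 1, 4, 10, 30)),
    Incident("INC-2", datetime(2021, 1, 5, 14, 0), datetime(2021, 1, 5, 14, 30)),
    Incident("INC-3", datetime(2021, 1, 6, 8, 0)),  # still open, excluded
]

print(mean_time_to_resolution(incidents))
```

Open incidents are skipped rather than counted as zero, so the average is not pulled down by work that is still in progress. In practice, a script like this would read timestamps from whatever ticketing or alerting system a team already uses.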

But automating these processes is not easy. DevOps professionals collectively cite a lack of documentation as a top barrier to automation. It seems that by replacing tribal knowledge with up-to-date, shared documentation, teams could quickly unlock some gains. So, make documentation accessible, and provide copy-and-paste abilities for sample scripts.

SRE and Automation to the Rescue

The study demonstrates how the remote economy has increased both the number of incidents and the mean response times to address them. To fill this gap, organizations increasingly turn to site reliability engineering practices to automate incident responses. By leveling up DevOps with SRE, teams could reduce errors and improve overall availability as development teams progressively deliver new features.

Bill Doerrfeld

Bill Doerrfeld is a tech journalist and analyst. His beat is cloud technologies, specifically the web API economy. He began researching APIs as an Associate Editor at ProgrammableWeb, and since 2015 has been the Editor at Nordic APIs, a high impact blog on API strategy for providers. He loves discovering new trends, researching new technology, and writing on topics like DevOps, REST design, GraphQL, SaaS marketing, IoT, AI, and more. He also gets out into the world to speak occasionally.
