Survey Surfaces Challenges Ahead on National DevOps Day

A survey published today for National DevOps Day found that nearly two-thirds (63%) of respondents have seen an increase in the frequency of service incidents affecting their customers over the last 14 months.

The survey polled 1,046 engineering, IT operations, DevOps and site reliability engineering (SRE) professionals at organizations with more than 300 employees. Conducted by Transposit, a provider of a platform for connecting DevOps workflows, the research found that well over half (58%) of respondents estimated the average cost of downtime for their organization at up to $499,999 per hour, and 40% said the cost of downtime has increased in the last year.

Respondents said the top contributing factors to the increase in service incidents were digital transformation (61%), new product rollouts or product updates (55%) and collaboration methods and tools that do not adequately support their team working remotely (49%). A full 90% of respondents reported an increased focus on digital transformation during the last year, with 73% having expanded the technology stack they support. More than half (55%) have increased the size of their DevOps teams.

Overall, 46% of organizations reported experiencing between six and 19 major incidents over the last 12 months. More than half (53%) also reported an increase in the time it takes to resolve incidents over the last year, even though three-quarters of respondents (75%) said their organization increased its adoption of SRE practices in the last 12 months. Additionally, 65.1% of those organizations planned to hire site reliability engineers in the next 12 months. However, just over a third of those respondents said their organization planned to expand its SRE efforts in 2022.

A full 83% said up to 50% of their engineering operations processes are automated, but over half of the respondents that use SRE practices noted they still manually enter data into an IT service management (ITSM) application or other system of record to track the actions humans took while resolving an incident. Nearly two-thirds said their approach to automation was incremental: they begin by codifying processes and work up to more advanced, fully automated scenarios.

The top three tasks respondents would like to automate were service requests (53%), change requests (43%) and user provisioning (40%). The top barriers to implementing automation were cited as inadequate documentation of institutional knowledge and existing processes (56%), lack of clarity about what to automate (55%) and an inability to share knowledge (52%).

Respondents' planned investments to improve their incident management processes over the next 12 months are focused on automation tools or applications (48%), communications/collaboration tools or applications (42%) and integration tools or applications (41%). Only a quarter (25%) said their tools are integrated through a single tool or platform.

Lack of unified communication with teammates (45%), processes that have changed or are harder to follow while working remotely (42%) and lack of visibility into dependencies and into which teams or people are responsible for the code or infrastructure (38%) were all cited as reasons why service requests are now more complex.

Transposit CEO Divanny Lamas said that, despite the challenges facing DevOps teams, advances in low-code automation are fortunately making it simpler to implement DevOps best practices more widely. The challenge is finding a way to implement those best practices faster at a time when IT environments are becoming more complex.