Monitoring tools provider Catchpoint, in collaboration with the DevOps Institute, has published a survey that finds a significant gap between the tasks and duties that define what a site reliability engineer (SRE) should be and what is actually occurring within IT organizations that have hired SREs.
Based on a survey of 600 IT professionals who either hold SRE titles or identify themselves as performing SRE-like tasks, the “Catchpoint SRE 2020 Report” finds the bulk of respondents are still squarely focused on managing IT infrastructure.
Google, which is widely credited with defining the SRE role, maintains SREs should be spending as much time managing applications as they do infrastructure. Survey respondents on average are spending 75% of their time on operations. Only 14% of respondents said they are spending 50% or more of their time on application development processes. More than half of respondents (53%) also noted they are becoming involved too late in the application life cycle.
The Catchpoint survey finds the top three tasks survey respondents associated with their roles were monitoring and alerting (93%), dashboarding (73%) and managing infrastructure as code (71%).
Catchpoint CEO Mehdi Daoudi said one of the most telling aspects of the survey is that only 53% of respondents identify observability as a task associated with their role, even though that’s a core tenet of DevOps best practices. Further down the list of DevOps tasks were testing (41%), telemetry (38%) and chaos engineering (26%).
Daoudi said it’s probable many IT organizations are adding SREs in anticipation of furthering their DevOps ambitions. Not every SRE has all the skills required, but as more of them gain hands-on experience it’s likely many will grow into those roles. In fact, 41% of respondents said half or more of their work revolves around mostly manual, repetitive tasks that could be automated. More than half of respondents (52%) said they spend too much time on debugging-related tasks.
Part of the challenge many IT organizations face derives from not putting enough focus on automating the management of legacy IT infrastructure, said Daoudi. Every time a new platform is adopted there’s a rush to operationalize it. However, IT teams frequently overlook the need to automate rote management tasks, which would free up more time to learn how to manage new platforms such as Kubernetes.
Going forward, the survey finds nearly 50% of respondents believe they will be working remotely after the COVID-19 pandemic. More than half of respondents also said their personal challenges included staying focused and maintaining a good work/life balance while working from home.
The degree to which SRE titles are being adopted to enable IT professionals to justify a raise or simply pad their resumes is unknown. However, the focus on the SRE role is likely to increase in the wake of the economic downturn brought on by the COVID-19 pandemic. In theory, a single SRE should be able to manage all the tasks that previously might have required as many as 10 IT administrators. At a time when many organizations are looking to reduce the total cost of IT, the number of full-time employees required to manage IT environments has clearly become a major concern.