Charlene O’Hanlon talks with Leo Vasiliou, director of product marketing at Catchpoint, about the results of a study the company fielded with VMware Tanzu and DevOps Institute of nearly 300 site reliability engineers (SREs). This year’s report underscores the challenges of multi-cloud, calls out the underutilization of AIOps and shows a systemic shift in core baselining data. The video is below, followed by a transcript of the conversation.
Announcer: This is Digital Anarchist.
Charlene O’Hanlon: Hey, everybody. Welcome back to TechStrong TV. I’m Charlene O’Hanlon, and I’m here now with Leo Vasiliou who is the Director of Product Marketing over at Catchpoint. Leo, thanks so much for joining me today. Really appreciate it.
Leo Vasiliou: Oh, of course, Charlene. Thank you very much for allowing us to have the conversation today.
O’Hanlon: Absolutely, absolutely. I want to talk to you about a recent survey study that you guys put out in conjunction with VMware Tanzu and DevOps Institute. What were you guys looking at?
Vasiliou: Well, from an existential perspective, we hope that the survey that we do leads into the annual report and we’d like for it to be the most data-backed set of insights in its community. There are sort of three underlying constructional essences of the report. One is to provide core baselining data for the key SRE tenets: dev versus ops amount of time, on call, and reported levels of toil. The other essence is to look at what’s trending. And then the third one will generally be around the IT-to-business relationship. And then with each one of those constructional essences, the context and line items within each section will change from year to year.
O’Hanlon: Gotcha. Alright. So this one focuses on site reliability engineering and what they’re seeing and what they’re dealing with on kind of a daily basis. Is that right?
Vasiliou: Yes, yes.
O’Hanlon: Okay. Alright, great. So tell me a little bit about some of the findings in the study. I know you guys do these, as you just said, reports yearly. So I’m interested in knowing what’s changed from year to year and what really surprised you in this year’s report.
Vasiliou: Well, I like to say SREs are double struck with challenges because the SRE is essentially two jobs in one, right? They have to increase the efficiency of their operational activities while mitigating the risk of their transformational activities. So one of the key findings this year, which by the way this may be the most provocative dataset we’ve published just because some of the findings conflict with some popular theories in the market, but take the first one just as an example of that, is as part of the baselining set of data there was a self-reported drop in toil, which is the word they use to define repeated, mundane, busywork that doesn’t have any inherent value, of 15 percent. And one of the reasons this is such an insightful nugget to lead us into our various conspiracy theories is it was across the entire distribution of data, across all of their percentiles, as well as across all of the geographies that we looked at.
O’Hanlon: Wow.
Vasiliou: And so for me, when I had the blessed opportunity to do the research and sit down and co-write the report, having been stuck in the house for, let’s just call it a year and a half at this point, there was just something different about working on a piece like this and it just felt a bit more meaningful. And the conspiracy theory there or the hypothesis is that if people return to work, or return to the office I should say or some type of hybrid environment, will those reported levels of toil increase again next year as people adjust to the hustle and bustle of commutes just adding to frustration in how they think about things?
O’Hanlon: That is a really, really fascinating thought process there. We could do an entire conversation about just that and whether what happens this year is going to be indicative of what happens in the future or if this year is going to be like a reset year. So from an SRE perspective though, the fact that the level of toil decreased 15 percent, that seems pretty significant. If you look at the entire spectrum of what they have to deal with on a daily basis, 15 percent, good lord, that’s huge in the SRE space.
Vasiliou: Yeah. It is huge. I agree. It’s also based on the responses of hundreds of survey participants year over year. And what’s interesting, Charlene, is one of the unique tactics we took in this year’s report is that before we published the report, post-survey and pre-report, we actually showed the results to various industry, let’s just call them, leaders. One of them we shared this particular section with had an alternative theory on why levels of toil dropped, and it had to do essentially with the fact that if people are not in the office, maybe the very act of them not being observed compels them to do less busywork.
So there was an alternative theory. But bringing us back to and leading into one of the other essences of the report, which is kind of the trends piece, it had to do with the amount of monitoring data sources, like the volume, variety, velocity of data, and the number of multi same-service platforms, e.g., multi-CDN, multi-DNS, et cetera, in use. And so will the return to work and the promise of AIOps help them keep toil levels low? Which leads us into the next sort of surprising finding, which is that SREs and their implementation or shift to AIOps is quite slow compared to what we might be reading about in the industry, which was probably the second most provocative finding in the report.
O’Hanlon: Yeah. So why do you think it is so slow then? Were there any trends that kind of came out of that? Any reasons why organizations are kind of slow to adopt AIOps?
Vasiliou: Well, there is obviously a deluge of information and rhetoric in the market on the internet today. We need to seriously consider this idea of an information diet, which is a concept I first heard about a couple years ago. When we think about the idea of the hype cycle versus the challenges SREs face on a day-to-day basis, sometimes the market and the promise of what these types of solutions will offer might outpace actual implementation capabilities from, hey, we’re an SRE, let’s go implement this. I think part of the challenge here is probably that the concept of AIOps is too broad and too large for consumption. It’s the idea of just saying, “Hey, let’s go solve world hunger or, hey, let’s go solve or implement AIOps,” right?
O’Hanlon: Right, right.
Vasiliou: You’ve got to break it down into smaller components and incrementally develop capabilities from there. Otherwise, how the heck are you going to baseline your progress for when you’re done? I think it’s too big, too broad, and some of the vendor promises, maybe of self-healing or auto-remediation, maybe they were a little more on the hype side, where some of the other AIOps components like root cause analysis or even correlation, maybe those are truthfully more on the hope and promise side.
O’Hanlon: Okay. Do you think that AIOps – to your point, it is broad – do you think it just means different things to different people and so somebody actually may be implementing some type of AIOps and actually not realizing it?
Vasiliou: The short answer is yes. So when we asked the question what monitoring tools or practices are used today, they were presented with seven options, ranging from application, infrastructure, experience monitoring, AIOps, and it was the fifth lowest, meaning that almost 40 percent of respondents said they’d never used it in the first place. The second question we asked is very simply what is the received value from AIOps, where one was the low value, nine was the high, and the range evenly spanned the value scale that was presented in the survey. What was interesting is there was a subset of people who said they never used AIOps but that they still received value, which was an additional dimension we didn’t think about when writing the survey.
But it kind of brings me back to the essence of your question, which is are you a user versus a consumer? And then go back to the previous comments about how big and broad this quote/unquote AIOps idea is, and there’s absolutely a propensity that people hear and think completely different things, unfortunately.
O’Hanlon: Interesting. Interesting. So one of the things that we’ve been talking about in the SRE space over the last couple years is really how the role is kind of morphing and changing along with the market in general and to address a lot of organizational needs as well. Are there any findings from this year’s study that kind of bear out how an SRE role is changing and even if it is changing?
Vasiliou: That will go back to what we were saying just a moment ago about mitigating the risk of their transformational activities. When you were asking the question, I’ll just cite the stat from our very first SRE report. We said that 85 percent of respondents said that in order to be successful in this role you have to commute into an office. So that got topsy-turvy turned upside down on its head.
O’Hanlon: Right. Yeah. [Laughs]
Vasiliou: But to get back to the essence of the question here, when we looked at the quantity of data sources and the prolific rise of the use of multiplatform providers, we sort of looked at this correlation between how SRE orgs are structured. Are you decentralized, are you centralized, or is there some type of hybrid in between? And then the decentralized piece was broken down even further by stack component, by platform component, or by business product or service. The vast majority of SREs said they are currently decentralized by business product or service, but I think the transformational piece here, which is something we’ll look at next year when we write the report, is because of the prolific rise of multiple third party and multiple same-service third party providers, there is a huge, gigantic spotlight to shift the primary dimension of org structure from product and service to a platform basis.
And we talk about it like this. If you’ve got 10 different teams assigned to 10 different products or services and one member from each of those SRE teams developing capabilities that happen to be the exact same capabilities in development by those other nine teams, that’s the scalability and efficiency _____ that we start to lead in and talk about in this year’s report. So I think the transformational piece will be the assignment of SRE development engineering resources from product and service as a primary dimension to a platform – for example, cloud, CDN, DNS, API – as the primary dimension where those capabilities will be normalized and developed. Even though underlying platform structures will be different, those capabilities will then be offered as part of, let’s call it, an internal marketplace for use by multiple teams to kind of get away from that concept of silos and being segmented by product or service. So a bit of a long answer but all contextual to the key part there, which is the trend and the transformational piece that we wrote about this year.
O’Hanlon: Right. Yeah, well it really is food for thought there if you consider how the SRE role was established and how it’s kind of – if you go down that path, how it seems to be changing and evolving as more technology is adopted and more things are moved out to the edge and the need for SREs at basically all points on the network if you think about it. So great stuff there. Well, Leo, I know we could talk about all of the findings and have a much larger conversation, but unfortunately that’s all the time that we have for this one. But I do want to thank you very much for going through some of the survey findings. If folks are interested in taking a look at the study itself, is it readily available?
Vasiliou: It is, it is. So it’s available at Catchpoint.com. Just head over to our learning and research section and enjoy the read. Charlene, one of the sentiments we wanted to leave readers with this year was this concept of baselining against Google’s definition of SRE, but we hereby declare by the power vested in us by absolutely nobody at all that you are hereby free to be an SRE.
O’Hanlon: Oh, wow. Just one more thing to add to my plate. [Laughs] Well Leo, thanks again. I really do appreciate your time, and it’s a pleasure talking to you. I’m sure that we’ll talk again in the future about future studies that you guys have going on. So, thanks again. Really appreciate it.
Vasiliou: You are quite welcome. Thank you for your time, Charlene.
O’Hanlon: Alright, everybody. Please stick around. We’ve got lots more TechStrong TV coming up, so stay tuned.
[End of Audio]