Service level objectives (SLOs) are being discussed more widely because defining clear reliability goals gives teams a clear path to delivering exceptional results. Kit Merker, COO of Nobl9, provides updates on the community, use cases and best practices. The video and a transcript of the conversation are below.
Recording: This is Digital Anarchist.
Alan Shimel: Hey everyone. Welcome to another Techstrong TV segment. I’m really happy to have my friend Kit Merker, all done up in his Seattle Kraken hat and his winter beard for the Pacific Northwest up there, joining us today. Kit of course is the COO of Nobl9, a recent winner of the DevOps Dozen awards as best new company in the DevOps space. Hey Kit. Welcome to Techstrong TV.
Kit Merker: Hey Alan. Great to see you.
Shimel: It’s good to see you again my friend. I’m thinking Kit we may actually see each other in person soon. I’m hoping.
Merker: Wouldn’t that be great? Hopefully somewhere sunny and I can get rid of the beard.
Shimel: Well I’ll tell you. I was just looking at airfare for Valencia, Spain for KubeCon in May. I don’t know if you guys are planning on being there for that one, if that’s like a CNCF thing you go to or what or – actually we have a road show coming up in Seattle at some point. I’m not sure when. We’ll talk about that.
Merker: Well in May I’m not sure about KubeCon but we will be having SLOconf and we’re doing it online again this year in May so that’s what I’m looking forward to.
Shimel: Yes. I think I saw that. What are the dates?
Merker: It’s May 9th through 12th. And that will be completely online and asynchronous and attend while you work just like last year. So we’ll have tons of speakers and all the fun from last year that we had and hopefully even bigger. The CFP is open right now so if you go to sloconf.com you can submit a talk. And it’s going to be exciting. And you guys were a media sponsor last year so hopefully we’ll have you back.
Shimel: Yes, we were.
Merker: Not to put you on the spot.
Shimel: I remember. We’ll talk about it.
Merker: All right.
Shimel: So that works out well because I think KubeCon if I’m not mistaken is like May 16th.
Merker: Yeah. We intentionally didn’t put it the same dates so.
Shimel: Yeah. So that will work. Cool man. Let’s talk a little Nobl9 though. I guess we really shouldn’t assume everyone knows who Nobl9 is. I forget you’ve still only been around what, two, three years at most. And not everyone may be familiar. Why don’t we back up and start there?
Merker: Sure.
Shimel: Tell people about Nobl9.
Merker: Of course. Yeah. So at Nobl9 we make a service level objective platform, SLOs. And the whole idea of SLOs is about measuring your observability system and bringing new context, shared context so that you can make better decisions and better automation within your organization. So if you think about how you’re pulled in all these different directions, right? Some people on your team say we’ve got to ship features. Some people say if we don’t make it reliable and perfect then we’re going to hurt our reputation. If we don’t fix our tech debt we’re going to burn out all of our employees and we’ll have the great resignation to deal with. They’re all getting pulled in all these different directions.
And what’s interesting about this is that we have actually more data about our software than we’ve ever had before. We have source control. We’ve got our AWS Cloud. We’ve got our Kubernetes cluster. We’ve got all these observability tools. We have so much data and yet we come to these different conclusions about what we should work on. And this is a big problem for so many organizations. So what Nobl9 is doing is we’re bringing some additional context. This is what’s really missing is people need to understand in an organization what are the risks to the system. What are the expectations that customers have and what are the tradeoffs we have to make in order to deliver the right kind of service to our customers?
So what we’re doing with service level objectives in Nobl9 is we collect all this observability data, we add this context and we drive automation and decisions so that whether you’re on call, you’re planning your future roadmap or trying to justify your investments you can see exactly what the impact is on the business and everybody can start pulling in the same direction. And that’s really what we’re doing. And like you said we’ve been around for a couple years. We now have a product in market. We have both kind of an entry-level thing called Hydrogen and an enterprise-class product that’s being used by some of the world’s largest enterprises and it’s very exciting and we’re having a great time.
Shimel: Very cool. Kit when we look over the evolution of the last couple years, right, ‘cause I remember with Nobl9, right, we first thought of it around well this is an SRE kind of thing. Right?
Merker: Yeah.
Shimel: And SREs early on inherited a world of SLAs. Right? Where it was perform or a quick bullet to the back of the head will work.
Merker: Metaphorically speaking.
Shimel: But that wasn’t a very forgiving model or a model that would really kind of foster cooperation and team building and a lot of the principles behind devops, right, about how we get stuff done. It’s very hard to have a blameless postmortem when if you blew an SLA someone’s job is on the line. Right?
Merker: Well it’s really interesting because I think definitely in the sort of earlier I would say IT service model, right, where you have SLAs, especially if you have maybe outsource contracts. Right? SLAs became really a form of punishment. Right? One of the interesting things that I learned from sort of the Google SRE model is that there’s an assumption of reliability at Google, an assumption of infrastructure investment that’s not actually there in most enterprises. The SLOs that are prescribed by the SREs in the Google model are really trying to hold back overinvestment in infrastructure and this sort of gold-plated infrastructure problem. Well most companies in the real world.
Shimel: It’s the other way.
Merker: It’s the other way around. Their issue is they can’t keep up with the reliability needs and the feature pressure is really the issue that they have. And so with this SLO model and being able to define them in fine grained ways across multiple data sources, being able to use it with automation. What changes is that you can start to put guardrails around these different conditions that people care about. You start to model the business. And this is really what we’ve got to get to. It’s about building the technology for digital business because all the businesses are digital now. Right?
Shimel: Right.
Merker: So your point about kind of like how you adopt sort of the DevOps method and bring this to the table, I mean really DevOps is trying to get better at building software, become more kind of diligent and efficient and make the right tradeoffs. It’s about ownership, right, a sense of responsibility but not punishment. And I think once people realize – actually I just had this customer from Flexera. We just did this testimonial video together. And he said it’s like once people realized that we weren’t building SLOs to point blame at them or punish them, that we were trying to do it to help them so that the people in the organization could see when they had problems, right, when they needed help.
Then people really got open to this idea. This to me this is the game changer. If we can help organizations become more transparent, to be able to do more through APIs instead of meetings if you know what I mean. Right? Like using the data and the systems to help make these better decisions. This is what changes culture. A lot of people want to talk about reliability culture and devops culture and agile culture but that’s such a scary idea. The way I like to think about this and what we’ve talked about with our customers and community members is it’s about how you make decisions together. Right?
And if you can make lots of decisions in a way that is supportive of the strategy, supportive of the business and also the realities of the pressures that we’re under as humans. And let’s not forget that everybody is going through a hard time and maybe not totally happy being on call and all these other things. Taking all that into account and making a decision together, this is what becomes culture, not the other way around. Right? You don’t change culture first to get the decisions. You start making good decisions and then the culture emerges from that. And that’s I think a really important idea.
Shimel: I agree with you 100 percent. No doubt about it. And it is. But when we focus on devops being about delivering software and clearly that’s the goal. Right? Delivering software better, faster, higher quality. In order to do that you need just that whole culture thing. And some people say oh culture is BS. It’s not. Having a team that works together, having goals that are clearly enunciated, right, we’re all trying to do – you understand what the guy or gal next to you is doing. Right? And why they’re doing it and how you fit into it. That’s all part of that. All of those things help you get software better, faster, higher quality. And that’s where it is.
What’s interesting about Nobl9 Kit from where I sit, right, because I sit – I mean beyond being on different sides of the continent I sit in a different chair than you. I get to look at the whole market without a horse in the race so to speak. You guys have done a really great job of defining and owning SLOs. Right? I think for a lot of people in the marketplace, right, SLOs aren’t necessarily synonymous with Nobl9. But you’ve probably done more to promote SLOs and their use and why they’re beneficial than anybody else in market.
Merker: Well first of all I appreciate that a lot. Personally I think it’s a reflection of the community. I think it’s a reflection of a nerve that we hit at some level. Because to your point, right, goals that are well enunciated and it’s funny to me because I think oftentimes we think of goals as things that are kind of squishy. Right? We set stretch goals – oh sorry. I thought I muted everything.
Shimel: It’s never all muted.
Merker: We’re all muted. We want to set goals. Sometimes they’re a little squishy is what I was saying. So it’s like maybe you have OKRs. Maybe you have stretch goals. Maybe you have some different perception of what the goal is for your service. And what SLOs are doing is turning it into pretty much the most precise type of goal you could imagine which is code, right? Like actually describing it in code and something that’s checked in where it’s the truth. There’s no arguing about what it is. Now we can change it, right? We can always change our minds about what the goal is. But at any given point in time there is a goal that’s described and there’s a set of actions associated with that goal. And to me this is exactly what is so attractive to people in the market about SLOs because they’re no longer guessing.
They’re no longer sending each other Slacks and emails and writing documents and putting it in dashboards. They have a very precise way of describing what they want and expect from their service that takes into account the risks and tradeoffs and customer expectations that they have for the service and then allowing the team to engineer to that goal, allowing the business to understand the tradeoffs. Right? Hey. If you want the service to be faster we’re going to have to spend more to buy more computing power. Right? Hey. If you want these features we’re going to have to put these other reliability concerns on hold and vice versa. This is something that brings visibility to all the little engineering decisions that are made on a daily basis and then aggregates them up to the size of an entire enterprise ‘cause it’s a bird’s-eye view of how the enterprise functions and doesn’t limit them to a prescriptive way. ‘Cause I think this is one of the other big challenges.
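Editor's note: to make the idea of "describing the goal in code" concrete, here is a minimal sketch in Python. It is an illustration only and not Nobl9's format or API. The class name, field names, the service name and the event counts are all assumptions made up for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SLO:
    """A hypothetical, checked-in description of a reliability goal."""
    name: str
    target: float      # required fraction of good events, e.g. 0.999
    window_days: int   # rolling window the target applies to

    def error_budget(self, total_events: int) -> float:
        """Number of bad events the target tolerates for this many events."""
        return (1.0 - self.target) * total_events

    def budget_remaining(self, good_events: int, total_events: int) -> float:
        """Fraction of the error budget still unspent (negative if overspent)."""
        budget = self.error_budget(total_events)
        if budget <= 0:
            raise ValueError("a target of 100% leaves no error budget")
        bad_events = total_events - good_events
        return 1.0 - bad_events / budget

    def is_met(self, good_events: int, total_events: int) -> bool:
        """True when the observed good-event ratio reaches the target."""
        return good_events / total_events >= self.target


# Hypothetical service and numbers, for illustration only.
checkout = SLO(name="checkout-availability", target=0.999, window_days=28)
print(checkout.is_met(good_events=999_750, total_events=1_000_000))
print(round(checkout.budget_remaining(999_750, 1_000_000), 3))
```

Because the target lives in one checked-in definition, changing the goal is an explicit, reviewable change, and the remaining budget is the kind of number a team could weigh when trading feature work against reliability work.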
When I look at the observability space, or anybody in the infrastructure space, they’re trying to bring prescriptive guidance. I think today’s world is about the right tool for the job. Right? You’ve got to fit inside of an ecosystem whether that’s what language you use, what CI/CD system you use, what package manager you use, what source control you use. All these things are choices that engineering teams have to make for themselves. And you can’t necessarily dictate that Team A and Team B are going to agree on that stack if you will. Right? You give them some flexibility. But what you can get them to agree on is the concept of the service level objective. Like what should my service provide to customers in terms of availability, latency, performance, throughput, whatever the characteristics are that deliver that end to end service.
This is what becomes almost the contract because if I know that I’m using your service and you’re going to give me x number of latency over y number of nines then I can build my system to account for it. I can decide what my retry strategy is, what my caching strategy is. Right? I can design and engineer a system that takes into account what your service says it’s going to do. Well that level of clarity allows engineering teams to operate in a completely fundamentally different way because now they aren’t guessing and they’re not hoping and they’re not wishing. Right? They’re working toward a clearly articulated goal. And I think this is what’s so important. So now the whole idea of us promoting this concept. I mean we’re just like we’re in love with SLOs. I mean I don’t know how else to explain that. Like we’re just like losing our mind about how amazing it is.
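Editor's note: the point about designing a retry strategy around a dependency's published "nines" can be sketched with a little arithmetic. The Python below is an illustrative sketch, not anything described in the interview. The function names and the numbers are assumptions, and the retry calculation assumes that failures of separate attempts are independent, which real outages often violate.

```python
def allowed_downtime_minutes(availability: float, window_days: int = 30) -> float:
    """Minutes of downtime an availability target tolerates over the window."""
    if not 0.0 < availability <= 1.0:
        raise ValueError("availability must be in (0, 1]")
    return (1.0 - availability) * window_days * 24 * 60


def attempts_needed(dependency_availability: float, target: float) -> int:
    """Smallest number of attempts n such that 1 - (1 - p)**n >= target.

    Assumes each attempt fails independently with probability 1 - p.
    """
    if not 0.0 < dependency_availability <= 1.0:
        raise ValueError("dependency_availability must be in (0, 1]")
    if not 0.0 < target < 1.0:
        raise ValueError("target must be in (0, 1)")
    failure = 1.0 - dependency_availability
    attempts = 1
    while 1.0 - failure ** attempts < target:
        attempts += 1
    return attempts


# Hypothetical numbers: a dependency promising "two nines" per request,
# and a caller that wants "five nines" after retries.
print(allowed_downtime_minutes(0.999))
print(attempts_needed(0.99, 0.99999))
```

Under these assumptions, a caller that knows what the dependency promises can size its retries (or decide to add a cache instead) rather than guessing.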
But it is really about the community. We now have literally thousands of people that are involved in the SLOconf community. We have the monthly meetups. The conference as I mentioned will be in May. And we’re expecting hopefully about 5,000 people to show up this year. Last year we had 2,200 registrations. So we’re pushing hard for this, but it’s based on what we’ve seen in the market and the companies that are literally deciding if they’re going to build or buy an SLO platform right now.
There is a huge groundswell of interest in this concept to take their observability systems, their operation systems, reliability systems to the next level. And all the practices and tools and techniques and tricks everybody wants to share it and learn together which is I think the most exciting thing. And it fits in with so many other areas, incident response and chaos engineering and how we do monitoring and logs. All this stuff kind of fits into this whole story about SLOs. Even finops, right, even like cost management, all this stuff fits into this same story. So yeah. It’s kind of overwhelming to be honest how much the market seems to have appeared and materialized over the last couple of years with SLOs.
Shimel: Cool man. Hey Kit, we’re up on the clock here. So you mentioned SLOconf, and it’s SLOconf with an F, dot com. Right?
Merker: That’s right.
Shimel: Right.
Merker: Yep.
Shimel: For people who want to maybe submit a talk or just find out more about attending this year’s virtual conference in early May. And then for people wanting to get more information on Nobl9 is it Nobl9 dot com or IO? I forget.
Merker: Nobl9.com. Yeah. N-O-B-L number 9, dot com.
Shimel: That’s it. Hey Kit, man. It’s great seeing you. I hope to see you in person sooner rather than later here. But keep doing what you’re doing man and congratulations on the DevOps Dozen award as well.
Merker: Thank you so much Alan and hopefully we’ll catch a hockey game together.
Shimel: Come on down. Actually I don’t have Seattle Kraken tickets when they’re playing the Panthers but we’ll look into it. I’ll talk to you later man.
Merker: All right.
Shimel: Great seeing you.
Merker: All right. Take care Alan.
Shimel: All right. Bye bye.
Merker: Bye.
[End of Audio]