EP 10: Observability & Continuous Improvement

Observability at enterprise scale brings with it additional application life cycle management requirements. Success requires knowing which changes lead to improvements and which don’t, and getting that information back to the DevOps team through feedback loops. In this episode of DevOps Unbound, Alan Shimel and Mitch Ashley talk with Kurt Chase, senior director, global release management at Tricentis; Charity Majors, co-founder at Honeycomb; and Ravyn Manuel, senior application manager at the Smithsonian, about observability and continuous improvement. The video is below, followed by a transcript of the conversation.

Alan Shimel: Hey, everyone. I’m Alan Shimel and welcome to another DevOps Unbound. We have a great panel in store for you today here on DevOps Unbound. Our topic is observability at a broad level. We’re going to jump into some of this. Let me first of all introduce you, though.

First I want to introduce you, of course, to my cohost of DevOps Unbound. He’s the CEO of Accelerated Strategies Group, CTO here at MediaOps, my friend Mitchell Ashley. Hey, Mitch. Welcome.

Mitch Ashley: Alan, thanks. Great to be here.

Shimel: All righty. Let me introduce you to our panel for today. We have a great panel. I was so happy that this came together. First, coming at us from the mountains of Virginia by way of the Bronx, Ravyn Manuel. Ravyn, why don’t you introduce yourself and where you work?

Ravyn Manuel: Absolutely. My name is Ravyn Manuel and I work for the National Museum of African American History and Culture, which is part of the Smithsonian. I am the Senior Applications Developer and DevOps Engineer. And thanks for not putting me in West Virginia. I appreciate that.

[Laughter]

Shimel: Okay. I don’t blame you. Next up we have the one and only Charity Majors, CTO and Co-founder at Honeycomb. Charity, a little bit about yourself, maybe introduce to the audience?

Charity Majors: I am an ops nerd. I have been on call since I was 17 racking hardware. I no longer rack hardware anymore. Hardware doesn’t exist, as far as I’m concerned. Instead, I’m trying to save the world through better observability.

Shimel: Excellent. And then, last but not least, we have Kurt Chase of Tricentis, Tricentis Releasing. Kurt, why don’t you introduce yourself?

Kurt Chase: Hi, everyone. My name is Kurt Chase. I’m currently Head of Global Release Management at Tricentis, based in Northern California. Nice to join you all today.

Shimel: Nice to have you here, Kurt. All right. So, team, panel, our topic this week is observability. And given that is the topic, I feel compelled to start with Charity because, Charity, in many ways you and the Honeycomb team are responsible for all this buzz around observability. Tell us about it.

Majors: Yeah, a little bit. When we started the company a little over five years ago it was not a term that was really in use – used in tech. Now – which is not to say that we invented it. Far from it. But it has this long and impressive pedigree in mechanical engineering, specifically control theory. And in control theory observability is the mathematical dual of controllability.

So, we started building this startup, Honeycomb, and we knew that what we were doing wasn’t monitoring. We knew that we weren’t reactively monitoring. We knew that we were – it was about instrumenting the code, getting at it from the inside. But for the first five, six months we couldn’t figure out what to call it. It was really hard to figure out how to talk about what we were doing. And then it was mid-July, midway through the first year when I remember Googling “What is the definition of observability?” And when I read it it’s all about “Can you understand what’s happening on the inside of the system just by looking at it from the outside with no prior knowledge?”

And a lightbulb just started going off. It was like “Ding, ding, ding. That’s what we’re trying to build. That’s exactly –” if you translate that into software terms, what it means is can you explain any system state that – any state that your system can get into without shipping new code to handle that state because shipping that code to handle that state implies that you had prior knowledge. So, it’s all about the unknown unknowns instead of the known unknowns. So, yeah, that’s my story.

Shimel: And you’re sticking with it. So –

Majors: Yeah. I mean, it really encapsulates a lot of the problems that people are increasingly having, that there’s a lot that proceeds from that definition of “unknown unknowns,” like you need to have high cardinality, high dimensionality, blah, blah, blah, all this stuff. But those are the things that speak to the problems of microservices and multiple storage systems, et cetera.

Shimel: I get it. So, Ravyn, this is your life. How does this observability and what Charity has described, the fact pattern or the problem pattern that she came to call observability, how does that in your life, day to day of what you’re doing, how does that kind of manifest itself?

Manuel: Thanks, Charity, for actually providing a definition because I was like “Okay, are we supposed to define our own meaning of observability?” So, you’ve given a baseline. And I like that you said it’s not monitoring because I can actually feel the nuance. There’s a nuance between monitoring and observability. And it’s – what’s really interesting is that that’s a problem that I’m trying to solve today at work because we’ve got – and – because we’ve got the black boxes. So, you’ve got observability but when you have software there’s a lot of black boxes. You’re not supposed to be asking what’s going on inside the code because you’re not supposed to know what’s going on inside the code. But when you get a – you put a signal in – I was in the military, so I did electronic repair – so, you put the signal in and you don’t get the right signal out but you’ve got a black box. How do you know what went wrong there if you can’t actually observe what’s going on inside of there?

It’s one thing to know the state of the software when you are the developer. So, as a developer, yes, I know. But when it comes to being the product owner or the project manager or a stakeholder and you’re trying to figure out what – “Why is this doing this?” or “This is not what I wanted,” actually is how that would come from that end, is “That’s not what I wanted.” So, how to get the stakeholders to get what they want when you can’t actually tell – I can’t tell what the code from my coworker is doing? So, in a way observability is a little bit – it is abstracted from monitoring because monitoring is telling you what it’s doing but it isn’t telling you the state. And so, it’s a problem that I’m trying to solve actually today.

Chase: I was going to jump in here and say from my experience although monitoring, logging, reporting, alerting, they’re not observability, I think they are fundamental foundational pieces that you have to have in place. You can start observability without any of them, but I’ve found that the more you can get your monitoring and reporting and even get some alerting going, it allows you to then start observing the system. And for me, I know at Splunk and now here again at Tricentis I’m really trying to connect all the pieces because it’s not just about the applications, but I find that some of the DevOps engineers, they really help with observability, especially the ones that are super inquisitive, and they don’t just – when a result appears or something happens they don’t just take it at face value. They’re inquisitive. They dig in. They try and find the root cause.

And I think that’s key with observability, is taking all the monitoring, reporting, alerting you have, taking the feedback from your customers, all of that comes together, I think, to really help you start to define those unknown unknowns that Charity brought up. I think it becomes very complicated. And then I think I’d love to talk about how AI and machine learning now can start to learn some of these things, because I think we get really good at repeatable incidents, observing those and then accounting for those in our systems, but it’s the unknown unknowns, how do we deal with those? That’s the real interesting part to me nowadays for sure.

Majors: It is, I think, the interesting part. And I also think that AI ops is mostly bullshit and it will be for the foreseeable future.

Chase: For a while. I agree, yeah.

Majors: But I wanted to circle back to what you said about logging and everything, which is – so, there’s a technical definition for observability which has to do with high cardinality, all this stuff, and then there is just understanding your systems. And I absolutely agree: You need some alerting, you need some monitoring. I think there’s a question of – is it log files? Is it metrics? That’s kind of irrelevant. It’s just data. What’s important is can you understand what’s happening in your system when you need to understand it?

And I do think it’s important to sort of differentiate the technical definition of observability from just the tendency that we have to say “observing things,” because there’s a very rich and mature and well-established set of best practices for monitoring, which are in many cases completely mirror opposite from the best practices for observability.

For example, with monitoring it’s like for years we’ve been saying, “You should not have to stare at graphs all day. The system should inform you when you need to – when there’s something wrong.” Super canonical best practice. With observability it’s exactly the opposite. It’s – we’re saying that most problems that happen are so subtle that you need to be looking for them. You don’t – you can’t page everyone at the threshold – you would drive them nuts if you were like “Oh, every little thing is going wrong. Bing, bing, bing.” You’re just – you would drive people nuts and they would quit. But when you’re an engineer, when you’re working on this endpoint or when you’re trying to solve this problem you should absolutely be looking through, asking questions, iterating on them, and looking for the trails in your observability tool at that very fine grain level.

Chase: I agree.

Shimel: So, I hear what all three of you said and it’s funny how everyone brings their own baggage to this issue.

Majors: Oh, sorry.

Shimel: No, no, it’s not – don’t be sorry. I love it. Here’s, I think, fundamentally – for those people out here who are watching this whole – I’ve got a lot of people out here who are DevOps people, who are ops folks, SREs, cyber folks, and their heads are spinning because I think there’s a little bit of what Kurt said to them, which is they don’t do – they do a shitty job monitoring, quite frankly. We don’t do a great job of it. People started throwing this AI ops word around, and there are a lot of companies raising a –

Majors: Yeah, there are.

Shimel: – crap ton of money around their AI ops stuff. And I never really understood what was real about it and many times it just sounded like another word for monitoring to me. But whatever. There is no AI in it.

But – and now, look, over the last year, year and a half, two years we’ve seen observability come up here and they don’t get the difference. Ravyn, I think you do because quite frankly your black box analogy with electronics in the military, that is – you’ve got to figure out what the unknowns are from the unknowns. It’s not just getting all these signals that are known and then recognizing patterns and putting some automation behind it. That’s not observability. I’m sorry. Not in my mind, anyway.

So, I think number one is we need to clearly define it. And Charity, you guys, you are a major kind of evangelist for this: “What is observability versus AI ops versus ALM,” which is before AI ops, and so forth. And there are SREs who are living this and DevOps teams who are living this every single day out here who haven’t got their heads around it. Ravyn, I always look to you as someone who gets it. Where do you fit on the – where do you kind of see it on the – let’s say there’s a scale of ALM, AI ops, blah, blah, blah, observability. What do you think?

Manuel: I feel like I would be very – most close to observability because, again, all – anything with ops in it, when you start doing DevSecOps and DevOps, all of those ops just become marketing terms after a while. But observability is – because it’s not quite tangible it’s actually more real, because it’s real – because it speaks to exactly what’s coming, what’s going on when you’re trying to get continuous improvement, because you need metrics to show and prove that you’re doing continuous improvement. But how do you get there? What is it – what are the steps to get you there? How are you alerted that you are not improving or that you are going backwards?

And so, when – I feel for me as far as observability I’m at the cusp, because I didn’t actually know that the word existed until I found out about the talk. I didn’t know that that had been defined. So, it’s really cool to know that I can now call it observability. It’s because it’s that thing that I can’t – there isn’t a tool to actually help me – actually help me know what it is that I’m looking at all the time. Not all the time. There are some tools that help me some of the time but not all the time. And it’s the times that I don’t know that’s when we are stuck, because it could be impacting the way – the system performance or the team’s performance.

Shimel: Charity, I think you might have a customer there. It sounds like they need a tool.

Majors: I feel like one of the key differentiators here between observability and AI ops and stuff is what it tries to center. With observability we are not – AI ops are like “Eh, we’ll tell you what to look at. You don’t need to understand your system. We’ll understand your system for you. We’ll tell you what to look at. We’ll tell you what it means. We’ll tell you what’s important.” With observability it’s the opposite. I think that we need to look to solutions that let computers do what computers are good at and let humans do what humans are good at. Computers are good at crunching lots of numbers but they can’t attach meaning to something. Only human beings can attach meaning to something. If I see a big old spike of red that could be good, that could be bad. You don’t know. There is no meaning until a human has attached them to it.

And with AI ops, AI ops only really works if the corpus of data that you’re training it on doesn’t change. So, if you’re actually shipping software every day you’re fucking up your corpus. This – it’s kind of incompatible with modern software development. I have many thoughts and feelings on AI ops. God bless them, but mostly I think that what they’re trying to do is take the humans out of the driver’s seat. And I understand why, because the C-level, CTOs, CIOs, whatever – there’s something that kind of blew my mind when I realized it a couple years ago: They trust vendors over their own people. Their employees come and go; vendors are forever. So, whenever a vendor comes along it’s like you give them tens of millions of dollars; your people don’t have to hold all of the wisdom in their heads. You can replace them. They’re fungible. The tool will tell you; the tool will be the source of truth.

And I just fundamentally reject – I’ve never seen that work well. I think it’s a shitty way to treat your people. And I would prefer to take the –

Chase: And I would not agree with that statement, quite honestly. I’ve had leads that are –

[Crosstalk]

Shimel: All right. “Mr. Vice President, I’m talking.”

[Laughter]

Sorry.

[Crosstalk]

Majors: – and then using our tools to make them better, to impress them, to give them a better sense of where they sit in time and space in the history and what’s happened before and what they’re – they should be wearing grooves in the system as they wear – as they use it in the [inaudible].

So, I think that it should – if you’re centering a person it’s observability and if you’re centering the – it’s AI ops. Sorry. Go ahead, Kurt.

Shimel: It’s okay. Kurt, your turn.

Chase: No, I was just – in my experience the leads, I’ve had – I’m a senior level director now; it was the same at Splunk – I did not encounter that where my leads trusted vendors more than myself or the people on my team. I was surprised to hear that, quite honestly, Charity. I have not experienced that.

[Crosstalk]

Majors: It’s not universal. But it’s – there – it exists quite a lot.

Chase: Yeah, that’s too bad to hear that. That is seriously wrong and that’s going to go against everything we need to accomplish internally.

Majors: Yeah.

Chase: Yeah, if –

Ashley: Have fun attracting great people with that mental model.

Chase: Yeah, yeah, yeah. And the other thing – I’d love to hear your opinion on this, Charity – we have very complicated systems we’re running. The source code management, the build systems, the agents. I’ve found that it always is very helpful to distill it down into discrete parts, try not to boil the ocean at once. Let’s start with the source code management system. How are we observing that? What are we doing here? And would you suggest that’s a great way for people to get going? I mean, when you try and look at everything – you have to look at everything for sure, but to get started it seems like find some of the most important pieces. Source code management is a great one. The crown jewels. What’s your suggestion there for getting started and –?

Majors: Source code management is important but it’s never – it’s almost – it shouldn’t almost ever be in the critical production path. And when people are rolling something out I always advise them to find something that’s – don’t waste your time instrumenting _____ observability, lights, and backwater to the system. Find the thing that you’re trying to understand and fix now. Find the thing that’s broken. Find the thing that’s causing you pain. Because it should be an order of magnitude better at answering your question, observability should, even than monitoring. I don’t think you can look someone in the eye and ask them to switch up what they’re doing and to use something else unless you can tell them it’s going to be an order of magnitude better than what they’ve got now because the costs of transition are so high.

But with observability you should be able to go from instead of just the aggregate where it’s like “Well, I see a spike but I can’t tell you what it means or how they’re different from –” with observability you should be able to go exactly those rows, in which ways are those events different from all the baseline events, and very quickly go “Oh, I have a much richer view of my system now.” And if – once you’ve got that you should be able to explain way more about what’s going – so, we see this all the time when people are running out – when they’re rolling out Honeycomb. They start rolling – they roll it out to the service and they’re like “Oh, God, there are problems there. Did you know that this is broken? Oh, there –” and we’re like “Yeah, yeah, yeah, but you’re just picking up the rug. It’s been there – it’s been that way for as long as you know.” And they’ll pick up another service and the problems – I’m like “Yeah, but these problems have existed – you just had no idea because we were dealing with [inaudible].”
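Editor's illustration (not part of the conversation): the idea of asking in which ways a set of outlier events differs from the baseline can be sketched in a few lines. This is a minimal sketch under stated assumptions, not a description of how Honeycomb or any other product implements it. The function name `what_is_different`, the field names (`build_id`, `region`, `duration_ms`), and the sample data are all hypothetical.

```python
from collections import Counter

def what_is_different(events, is_outlier, ignore=(), top=3):
    """Rank field=value pairs that are over-represented among outlier events.

    For each field value seen in the outliers, compare its share of the
    outlier events with its share of the baseline events.
    """
    selected = [e for e in events if is_outlier(e)]
    baseline = [e for e in events if not is_outlier(e)]
    if not selected:
        return []
    fields = {k for e in events for k in e} - set(ignore)
    diffs = []
    for field in fields:
        sel = Counter(str(e.get(field)) for e in selected)
        base = Counter(str(e.get(field)) for e in baseline)
        for value, count in sel.items():
            sel_share = count / len(selected)
            base_share = base[value] / len(baseline) if baseline else 0.0
            diffs.append((sel_share - base_share, field, value))
    diffs.sort(reverse=True)  # largest gap between outliers and baseline first
    return [(field, value, round(gap, 2)) for gap, field, value in diffs[:top]]

# Hypothetical sample: eight fast requests on build v41, two slow ones on v42.
SAMPLE = [
    {"duration_ms": 40, "build_id": "v41", "region": "us" if i % 2 else "eu"}
    for i in range(8)
] + [
    {"duration_ms": 2100, "build_id": "v42", "region": "us"},
    {"duration_ms": 2300, "build_id": "v42", "region": "eu"},
]

def slow(event):
    """Treat anything over one second as an outlier (an arbitrary cutoff)."""
    return event["duration_ms"] > 1000

if __name__ == "__main__":
    print(what_is_different(SAMPLE, slow, ignore={"duration_ms"}))
```

In this made-up sample the slow requests are spread evenly across regions but all share one build, so `build_id` is the field that ranks first. An aggregate graph of latency alone would show the spike without showing that.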

So, I feel like the way to pick up momentum, the way to incentivize people to get excited about this is to look for the biggest pain point in your system and –

Chase: That’s awesome.

Majors: – then patch that with observability.

Chase: That’s great advice. I agree.

Manuel: Charity, a question. Does – can observability actually take into account process? Because you’re talking about if there’s something wrong in the system itself, but sometimes the thing that’s wrong with the system is the process.

[Crosstalk]

Majors: Well, observability is not about process. And you’re totally right. That’s often – that is not something that you should ever ignore. A lot of the biggest problems are process problems. But observability is not the tool for that.

Shimel: And that’s good to know too because quite frankly – and Mitchell knows this too – we live in a silver bullet world where people think the latest shiny trinket is the cure-all.

[Crosstalk]

Majors: – everything, you should –

[Crosstalk]

Shimel: No, observability, yeah, it’s not going to cure Covid either.

Majors: No.

Shimel: But –

Chase: And that was the point I was trying to make initially. I think I said it wrong. It’s not that monitoring and reporting, logging aren’t foundational; they’re complementary. You kind of need them all. If you’re running modern stacks – I said that incorrectly at the start – it’s not foundational. There’s – you need them alongside each other.

Shimel: They’re adjacent. They’re not –

Chase: Yeah, absolutely. Absolutely.

Shimel: They’re not – it’s not a pyramid per se.

Chase: Right. Exactly.

Shimel: I think it’s part of your stack horizontally.

Chase: Yes. Yes. And I’ve found –

Shimel: Can I – we haven’t – go ahead, Kurt, I’m sorry.

Chase: I was going to say I’ve found with especially new individuals and new people to the DevOps team and the DevOps world, getting them involved in the monitoring and saying, “Hey, this is how we start monitoring,” that’s a great place to start. It’s almost entry level for observability, if you – I’d love your opinion on that, Charity, too.

Majors: Well, I really think that everyone who writes and ships software should be on call for their code, no matter how –

[Laughter]

Chase: Yeah, me too.

[Crosstalk]

Majors: Software ownership. And so, yes, that’s very empowering, I think, to show them how to find the trail, the alert that’s paging them, trace it all the way to the system and fix it. That’s something that as engineers, we feed on that. That’s exciting to get to know how to run your ship better. Yes, absolutely.

Shimel: Well, but – so, you know what? Mitchell and I started a company – or, were two cofounders of a company called StillSecure 20 years ago. And that was one of the things. Mitchell ran the engineering team and that was one of the things he instituted, I remember, was that every developer had to take a turn on the help support desk, level one, answer the phone. “Hello, Mr. Customer.”

Majors: It builds empathy.

Shimel: And try to explain it because they had to answer the alert. And even when they weren’t on that level one our escalation procedure was if it went to level two or three it was the developer who did the level two and three support.

Majors: I will say that, yes, and I do think that every engineer should be on call for their work, but you have to couple that with it’s management’s job to make sure that that doesn’t suck, that it isn’t _____ for them, but it doesn’t just kill you with [inaudible]. It is management’s absolute fucking responsibility to make sure that they’re given the time to – away from project work to fix these problems so that they don’t recur, so that they get enough sleep. I think it’s reasonable to ask any engineer to be woken up two or three times a year for their code. That’s it. Because more than that, you’re burning people out hardcore.

Manuel: Been there, done that.

Shimel: You know, Charity, Ravyn –

[Crosstalk]

Ashley: I have a question. I have a question I’d love to ask all of you. I’m thinking about observability has been elevated because of, in part, the world that we’re living in. Software architecture is changing. We’re dealing with microservices, all these much smaller things that are interconnected to make software. And at the same time as we’re changing the pace at which we release software, not only is it unknown unknowns, it’s like a river – Riven –

Shimel: Ravyn.

Ashley: – we’re in the river of the changing software. So, you’re trying to understand all of these things at the same time. It seems like it’s the right time for observability to be the thing to be working on because ops has to be elevated to the same level of the same kind of architecture and processes that we’re using across the – in the life cycle of software. Do you agree with that or not? It seems to me we’re in the right place, right time.

Majors: Absolutely. I mean, I feel like the first wave of DevOps transformation was all about “Ops people, learn to write code.” And we’re all like “Eh, okay.” But I feel like the pendulum has swung. I think now it’s about “Okay, software engineers, your turn.”

Ashley: You’ve got to do ops.

Majors: You have to learn to write operable services and to learn to maintain them and get called for them and – because what microservices has done, all of that complexity and logic used to be bundled up in the app. The app.

Manuel: Exactly.

Majors: And now it’s like we’ve been thrown to the four directions of the map. And what that means is that so much of understanding and operating your application has been pushed into the realm of operations and systems. It used to be that if all else failed you could attach a debugger to your app and just step through, see what was happening. Well, you can’t do that anymore now because the process died and jumped to another service, and another service. And part – the way that we instrument for observability is all about packing up the context of the application and shipping it along with it from hop to hop, one event per request per service, so that you can slice and dice and reason about your systems in a way that you used to be able to.
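Editor's illustration (not part of the conversation): a minimal sketch of carrying request context from one service to the next and emitting one wide event per request per service. It runs in a single process with plain dictionaries to keep the idea visible. In a real system the context would travel in request headers and the events would go to an observability backend. All names here (`new_context`, `handle`, `demo`, the field names) are hypothetical and do not come from any vendor's SDK.

```python
import json
import time
import uuid

def new_context(user_id, endpoint):
    """Create the request context at the edge; trace_id ties every hop together."""
    return {"trace_id": uuid.uuid4().hex, "user_id": user_id, "endpoint": endpoint}

def handle(service, ctx, work, sink):
    """Run one service's work and append exactly one wide event to the sink."""
    event = dict(ctx, service=service)  # start from the inherited context
    start = time.perf_counter()
    try:
        event.update(work(ctx) or {})   # add this service's own fields
        event["error"] = None
    except Exception as exc:            # record the failure on the event itself
        event["error"] = repr(exc)
    event["duration_ms"] = (time.perf_counter() - start) * 1000
    sink.append(event)
    return ctx                          # the same context goes to the next hop

def demo():
    """Simulate one request crossing three services, the last of which fails."""
    events = []
    ctx = new_context(user_id="u-123", endpoint="/checkout")
    handle("api-gateway", ctx, lambda c: {"http_status": 200}, events)
    handle("cart", ctx, lambda c: {"item_count": 3}, events)
    handle("payments", ctx, lambda c: 1 / 0, events)
    return events

if __name__ == "__main__":
    for event in demo():
        print(json.dumps(event))
```

Because every event carries the same `trace_id` plus whatever fields each service chose to add, the events can later be grouped by request, by user, or by any other field, which is roughly the kind of reasoning a debugger used to provide inside a single process.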

Ashley: Doing that really changes the design. I’m thinking about Alan’s story of what I did back at StillSecure. What prompted that was when a customer called in and said, “I’m sitting in – we’re using your project and this dialogue box popped up and it says ‘An unknown error occurred’ and that’s all it says. What do I do?” Well, nobody knows what to do with that. So, we all just said, “Look, everybody does support in that case. Everybody does ops.” And that changed: “Okay, we have to figure this out.” And suddenly that was the problem of “How do we make this easier to support and easier to resolve issues with?” So, it does affect behavior and design or product.

Majors: Yeah, absolutely.

Manuel: I think microservices is – will actually do good for observability because if you do microservices correctly then each service is atomic. It does one thing and one thing only. It does that thing good. So, the unknown is taken out of it because you know exactly what this thing is supposed to do –

[Crosstalk]

Majors: Oh, no, honey, the unknown is never taken out of it. The unknown is never –

[Crosstalk]

Manuel: In my ideal world it is.

[Laughter]

There are no more unknowns –

Shimel: In Ravyn’s world, yeah.

Manuel: – in Ravyn’s world. Exactly.

Shimel: Yeah. But I’m going to tell you something. The flipside to this, though, too with microservices, and I think we’re starting to see this now, is can you have too many microservices running? Does it – is the complexity load in there such that observability almost becomes meaningless or impossible?

[Crosstalk]

Majors: I don’t know that complexity can ever really be created – no, it typically is not meaningless. It’s your only hope to understand this shit if that’s what you’ve done to yourself. But complexity can only be moved around. It’s just a question of tradeoffs, like which set of tradeoffs – some folks have gone whole hog on microservices and then they’ve found the edges to those design _____. But –

[Crosstalk]

Chase: What I find – I was going to say what I find challenging is both at Splunk – Autodesk was interesting. So, to your point, Alan, we used to make engineers buy bagels. You break things; you have to buy bagels. And so, we had a lot of bagels brought in. But both at Splunk and here at Tricentis we have a world where we’re supporting large enterprise applications and delivery of enterprise apps. We’re also – obviously all the microservices – and trying to apply these techniques across the board is very interesting, very challenging.

[Crosstalk]

Majors: Oh, yeah. There’s no one-size-fits-all.

Chase: There’s not. There’s not. And almost to the point you made, Ravyn, observability almost has to be customized for what you’re observing. It’s – I haven’t had a chance to run Honeycomb yet, but for me my experience has been there’s always those little idiosyncrasies about how we run our ops, how we –

[Crosstalk]

Majors: There’s no such thing as a perfect _____. No one else can do it for you. It can be – it can get you started. It can give you a boost. But at the end of the day somebody’s got to fucking understand your software and roll that into your instrumentation. Nobody else can do that for you, let alone _____.

Chase: And that’s where with AI ops, I won’t say it’s complete shit. I think there is some value there. Some of it is just about automating common occurrences. And maybe that’s not valuable but –

[Crosstalk]

Majors: – it’s an AI but they’re just using that to make – to raise lots of _____.

Chase: Oh, sure. Okay. Yeah.

Manuel: Didn’t work.

Majors: Like, “Oh, I have an AI too.”

Shimel: No, but we – and you’re right, AI is a money term. But ML ops is something that’s caught on a bit, more than, I think, than AI ops. And I think that’s really at the heart of it, which was to recognize repeatable patterns or recognize patterns that we can respond to.

[Crosstalk]

Majors: Although, I would still challenge – I think in far too many cases people are reaching for the complicated solution instead of fixing it at the start. They’re like “Ooh, I’ve got 50 bazillion e-mails so I – and all of that disk space so I need AI to solve it” instead of turning off the fucking e-mails and setting up a different kind of alert or some automatic _____. Just – is this really a problem we need to solve? Or could we solve it better by just not having it?

Shimel: There is that.

Majors: Sorry. That’s my last bitch.

Shimel: Do we need to solve every – and I think Ravyn – or, not Ravyn – Charity, you were getting at it before. Do we need to solve every problem?

Majors: Is it the most important problem? Is this going to be the biggest bang for our buck?

Shimel: Yep. I mean, we used to have a –

[Crosstalk]

Majors: – or the _____. Like, senior engineers write software but very senior engineers don’t need software.

Shimel: Yeah, no, I mean, look, when Mitchell and I were doing the security company we had a product, “Vulnerability Assessment of Management.” VAMP. Internally we called it the Bad News Generator because that’s what it was.

[Laughter]

It was the Bad News Generator. And the problem was we would go to an organization like Ravyn’s and it would pop up a few thousand vulnerabilities, and the security or the ops people would be like “What am I going to do?” And “Don’t worry about 95 percent of them. Let’s just worry about the top five percent.” And it was like – it was employment insurance, was what it was, because there would always be more vulnerabilities for us to find next year.

But that being said, it really was the case where you’re not going to fix everything. We don’t have the time to fix everything even if we wanted to.

[Crosstalk]

Majors: And it’s not really doing a great job of giving you differentiation, letting you know what’s really important and what’s just noise. So much of these tools, they just give you so much noise and it’s almost worse than nothing at all.

Shimel: Well, that’s why with the Bad News Generator people used to – so, Mitchell got the lucky part of designing it. I had to go sell it. That was much harder.

Ashley: I remember the –

Shimel: Much harder.

Ashley: – guy from the federal government called and said, “This thing is spamming everybody in the Defense Department.”

Shimel: Yes, “We have all these vulnerabilities.”

Ashley: “How do I turn it off?”

Shimel: Because we had this automated workflow before we had terms like AI and ML. But it is hard.

Hey, guys, we’re coming down the home stretch here. We probably only have five minutes or so left. I wanted to – let’s look forward now. I’m big into looking forward these days. I’m hoping we’re going to do conferences in person again; we’re going to – I don’t know if we’ll go back to our offices but…

What’s next for observability? How does this continue to evolve?

Majors: What’s next is I think really wrapping traces in as a first, best tool so that – right now too many people are like “I have one tool for metrics, one tool for logs, one tool for servability, another tool for –” they’re spending just ____ data again, every time. And it’s worse than that because you can’t actually seamlessly break down a chain of events, so you’re just copy/pasting this idea into that tool or hoping despite – because it seems to line up timewise, it’s actually bees in the logs – you should be paying to store this data once and you should be able to flip back and forth between loads. Are you slicing and dicing? Are you doing aggregations at read time? Are you zooming in? Are you zooming out? Or are you slipping back so you can view it as a log file by time?

I think that observability tools like us and Lightstep – which are the only two observability tools out there by my technical definition – although, I will say I do think that this might be the year that that begins to change. I know that so many of the big players have been working as fast as they can for years on the back end. They’re trying to catch up to where we sit technically faster than we can catch up to where they sit in terms of the business. So…
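
[Editor's note: The idea of storing the data once and deriving different views at read time can be illustrated with a minimal Python sketch. It is not how Honeycomb, Lightstep, or any other product is implemented. The event fields (`ts`, `trace_id`, `endpoint`, `duration_ms`, `status`) and the helper names are hypothetical, chosen only for illustration.]

```python
from collections import defaultdict
from statistics import mean

# Hypothetical events: one wide, structured record per unit of work, stored once.
EVENTS = [
    {"ts": 3, "trace_id": "t1", "service": "api", "endpoint": "/cart", "duration_ms": 120, "status": 200},
    {"ts": 1, "trace_id": "t2", "service": "api", "endpoint": "/cart", "duration_ms": 480, "status": 500},
    {"ts": 2, "trace_id": "t3", "service": "api", "endpoint": "/home", "duration_ms": 40, "status": 200},
]


def aggregate(events, group_by, field):
    """Metric-style view: mean of `field` per value of `group_by`, computed at read time."""
    groups = defaultdict(list)
    for event in events:
        groups[event[group_by]].append(event[field])
    return {key: mean(values) for key, values in groups.items()}


def slice_by(events, **filters):
    """Zoom in: keep only the events that match every key=value filter."""
    return [e for e in events if all(e.get(k) == v for k, v in filters.items())]


def as_log(events):
    """Log-style view: the same events rendered as lines ordered by timestamp."""
    ordered = sorted(events, key=lambda e: e["ts"])
    return [
        f"{e['ts']} {e['service']} {e['endpoint']} {e['status']} {e['duration_ms']}ms"
        for e in ordered
    ]


if __name__ == "__main__":
    print(aggregate(EVENTS, "endpoint", "duration_ms"))  # zoomed-out view
    print(slice_by(EVENTS, status=500))                   # zoomed-in view
    print("\n".join(as_log(EVENTS)))                      # time-ordered view
```

[All three views read the same stored events; nothing is pre-aggregated or copied into a second tool.]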

Chase: And I –

[Crosstalk]

Manuel: I like that you said that. Sorry, Kurt. That –

Chase: No, no, go ahead, Ravyn.

Manuel: – Charity – I just wanted to say to Charity –

Chase: No, no, Ravyn, go ahead.

Manuel: – about tracing, because that’s actually the solution that I am actually looking for, that I’m looking for –

[Crosstalk]

Majors: It’s got a lot of rough edges.

Manuel: – for organizations. Yes, to be able to follow –

[Crosstalk]

Majors: But there are lots of problems we can’t solve any other way.

Manuel: Exactly. So, I feel good that I’m doing it inherently, like I know what I’m supposed to be doing.

[Crosstalk]

Majors: You’re on the right path.

Manuel: I think I’ve actually found a way to do that. But yeah, tracing I also think is the way to go. That is the only way you’re going to be able to know, because it’s taking you from point A to point B to point C to point D and you’ll be able to see that all. And there will be no unknowns. [Laughs]

Majors: Oh, honey. You keep saying that.

[Crosstalk]

Manuel: It’s my dream. That’s – let me live my little dream.

[Laughter]
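
[Editor's note: Following a request "from point A to point B to point C to point D" is what a trace's parent/child span structure makes possible. The Python sketch below is a simplified illustration, not any vendor's data model. The span fields (`span_id`, `parent_id`, `name`, `duration_ms`) and the `render_trace` helper are hypothetical.]

```python
from collections import defaultdict

# Hypothetical spans for a single request; each span points at its parent.
SPANS = [
    {"span_id": "a", "parent_id": None, "name": "A: web request", "duration_ms": 300},
    {"span_id": "b", "parent_id": "a", "name": "B: auth service", "duration_ms": 40},
    {"span_id": "c", "parent_id": "a", "name": "C: cart service", "duration_ms": 240},
    {"span_id": "d", "parent_id": "c", "name": "D: database query", "duration_ms": 210},
]


def render_trace(spans):
    """Rebuild the call tree from parent links and return it as indented lines."""
    children = defaultdict(list)
    for span in spans:
        children[span["parent_id"]].append(span)

    lines = []

    def walk(parent_id, depth):
        # Visit each child of `parent_id` in recorded order, then its own children.
        for span in children.get(parent_id, []):
            lines.append(f"{'  ' * depth}{span['name']} ({span['duration_ms']}ms)")
            walk(span["span_id"], depth + 1)

    walk(None, 0)  # the root span has no parent
    return lines


if __name__ == "__main__":
    print("\n".join(render_trace(SPANS)))
```

[In this made-up trace, the indented output shows that most of the 300 ms request is spent in the database query under the cart service. A trace only shows the hops that were instrumented, so it narrows the unknowns but does not eliminate them.]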

Shimel: Kurt, what do you got?

Chase: Yeah, well, where mind – excuse me – where my mind goes to in the future is what are we doing in the education system to really disseminate this information, how monitoring software is developed, how we monitor it. And then also “observability” is such a loaded term because I know at Splunk they would consider observability something much different, for running a SOC and things like that.

[Crosstalk]

Majors: Oh, no.

Chase: So, I think it – I mean, at its core observability is what it is. But how it’s implemented for different disciplines and how they use it, I think companies would definitely take exceptions with some of the things we’ve said here –

Majors: For sure.

Chase: – and what observability means to them. And I would just say keep that in mind, that it is a huge field and I think we’re going to see further improvements. I love the tracing. And being able to have that in your tool set would be fantastic.

Shimel: Charity, final thoughts before we wrap?

Majors: Oh, I am just enjoying the fact that some people are so interested in this that they’ll show up and argue about it all the time. So…

[Laughter]

Shimel: It’s all good. Hey, you know what? It’s a living. Mitchell, as usual, I’m going to give you the last word.

Ashley: Well, I’ve learned two things today. Don’t call my tool an observability tool or an AI tool unless it really meets the definition, or I’m going to get crucified.

[Laughter]

No. No, what – I mean, I love this conversation. I learned a ton. I always do when we have these – what’s so cool to me is that we’re talking about operations and we’re talking about this part of our world and we’re not talking about better ticketing systems. We’re talking about really meaty, interesting things that help us do our jobs better, do a better job for the company, for the customer. That’s exciting to me. It’s just really cool. I enjoy it very much.

Shimel: Join us for our next show where we’ll be talking about better ticketing systems.

[Laughter]

No.

Chase: And what argument did I miss? I was –

[Crosstalk]

Shimel: No, no, no, no. It’s all good. You know what? I enjoy having really bright, smart people who are willing to express their opinions. And when I knew we had this panel, it was the lineup, I knew there was going to be no shortage of opinions, nor any backing down. So –

Ashley: We’re going to call the next one “The observability mix-up.”

Shimel: Right.

Majors: We’re going to have to call it “Observability AI.”

Chase: I still want to see that sticker –

Shimel: “AI Observability.” All right. Hey –

Chase: We need to see the sticker Charity was holding up. We need to see that.

Majors: Oh.

Shimel: Oh, we never got to that. Yes.

Chase: We have to see that.

Majors: I’ve got a few of those. I’ve got “Release != deploy.”

Shimel: You’ve got a lot of them here.

Majors: I’ve got “Ship dangerously.”

Chase: Yeah, I love that one.

Majors: “Deploy on Fridays.”

Shimel: You know what? You used to be able to get these things at conferences. Now we’re reduced to showing them on Zoom.

Majors: Yeah, you have to.

Chase: Yeah. Exactly.

[Crosstalk]

Majors: “Perhaps this is a good day to deploy.”

Chase: Maybe.

Shimel: Okay, yeah. Perhaps.

Chase: It’s Friday. It’s not a good day.

Manuel: No, Friday is not a good day.

Chase: That’s the one, right there.

Shimel: And there it is: “I deploy from Cron.”

Ashley: That’s the one to end on right there.

Shimel: That’s it. We’re going to wrap it. Hey, you’ve watched DevOps.com. It’s sponsored by Tricentis. Many thanks to them for sponsoring. Just wanted to quickly mention Kurt is from Tricentis. Charity is from Honeycomb, and it’s Honeycomb.io. Go check that out.

Majors: We have a very generous ____ here.

Shimel: A very free _____. Ravyn, you can start playing with it now and see if it’s something that gives you tracing and so forth. Mitchell, thanks for joining us.

Ashley: You bet.

Shimel: We are back – actually, we are back with a roundtable next, which will be open to the public. Looking forward to having you on for that. But until then, this is Alan Shimel for DevOps Unbound. Have a great day, everyone. We’ll see you soon.

[End of Audio]