What Does Observability Mean to You and How Does it Scale? – Observability at Scale TV EP 1

Observability is evolving quickly as a concept and practice, but what does observability really mean to you and your organization? For many DevOps teams it’s all about speeding MTTR across complex infrastructure and applications. For others, the notion of observability-driven development is being embraced as a strategic business initiative tied to customer success. Where does your team stand in this process, and what are the keys to successfully scaling observability to meet your specific requirements? Join us for this thought leadership discussion. What does observability mean to you, and how does it scale? Alan Shimel is joined by Mitch Ashley, Tomer Levy, co-founder and CEO of Logz.io, Helen Beal of DevOps Institute and Brian Dawson of Linux Foundation. The video and a transcript of the conversation are below.

Alan Shimel: Hey, everyone, I’m Alan Shimel, CEO of MediaOps, DevOps.com, Security Boulevard, Container Journal, and TechStrong TV. Welcome to a new series that we’re launching here on TechStrong called “Observability at Scale.” I’m really excited to launch this particular series, because to me, it deals with one of the, you know, one of the biggest issues, I think, facing dev ops, SREs, development teams, IT teams, in general, today. And that is what we call observability. Now, yes, ten different people, you’re gonna get ten different definitions of what observability is, and that’s kind of a double-edged sword, it’s a good thing and a bad thing, much like, there was no official definition of dev ops. However, it’s too big, no pun intended, too big an issue, too big a topic for us not to really be able to put our finger on.

And especially as it continues to scale, because scale is probably the biggest issue around observability. But before we jump into all of this, we’ve assembled what I think is an amazing panel for this first show, and I’m gonna introduce them to you here in just a moment. Before we do, I wanted to give a big shout-out to our friends at Logz.io who are sponsoring this series and making it happen. You know, we need to pay the bills, here, and we can’t do it without sponsors, so many thanks to our friends at Logz.io. Let me now introduce you to our panel. I’m gonna start off with my friend Brian Dawson. Brian’s a frequent contributor to TechStrong TV, has been for years, but I’m gonna let him kind of introduce himself. Brian, say hello.

Brian Dawson: Hello, Alan. Thanks for having me on. I’m Brian Dawson. I currently am the vice-president of developer relations and ecosystem development at the Linux Foundation. I came here by way of, ah, god, I hate to say _____, a multidecade career in software, as a software professional. Starting with console game development, where we launched the PlayStation, but I spent about the past decade with a focus on optimizing software development and delivery via CI/CD and dev ops practices.

Alan Shimel: Fantastic. Thanks, Brian. Next up is our friend, the chief ambassador of the DevOps Institute [laughs], Helen Beal. Helen, welcome.

Helen Beal: It’s lovely to be here. Thank you, as always for the invitation. As you just said, I am chief ambassador of DevOps Institute. I’m also chair of the Value Stream Management Consortium. I came to all of this via many, many years of being in dev ops in ways of working culture, mainly, in Europe, as you can probably tell from the accent. I’m also, bizarrely, quite a lot in the Middle East, as well. And, yeah, delighted to be here amongst lots of people _____ _____ _____.

Alan Shimel: Very good, pleasure to have you, Helen. I didn’t recognize any accent. [Laughs] Our third panel member is Tomer Levy – if I mispronounced it, I know it’s Levy, Tomer? Or how do you pronounce Levy? Levy?

Tomer Levy: Levy, Levy, anything goes, Alan, for you, whatever you want.

Alan Shimel: Well, no, in New York, we say Levy, but I know in other parts of the world, it’s Levy, so. Anyway, Tomer is CEO, right, of Logz.io, but why don’t you give’em your background, Tomer.

Tomer Levy: No, absolutely, Alan, thank you, this is an exciting show. I’m Tomer Levy – that’s how I say it. You have to guess where the accent is from, but I’ll leave it to you guys to guess. If you say which accent [crosstalk].

Alan Shimel: Mm-hmm, what accent?

Tomer Levy: Exactly, what accent, exactly. Good, so I’m Tomer, I’m the CEO, one of the founders of Logz.io. We’re a cloud observability company, you know, hence the discussion and hence why we’re so excited about this topic. For me personally, I live in Israel, so that’s the accent. I am a second-time founder; I built a company before Logz.io. This company was around containers and, you know, a little bit before _____, we built a really cool container technology, and observability was a huge challenge. And when I moved to my next role, I said, you know, “How am I going to solve this?” and one day I saw this amazing opensource and I said, “I am going to make sure every developer in the world can use this opensource the same way I wanted to use it but could never do it.” I had to use other tools _____ _____ which I didn’t like so much. So this is where I’m coming from, this is why I’m passionate about it, and I’m looking forward to this discussion.

Alan Shimel: Fantastic and welcome, and thank you for the sponsorship. Last but not least is my cohost of the show along with me, and it’s my friend Mitchell Ashley. Mitch, why don’t you introduce yourself.

Mitchell Ashley: Always good to be here and always good to be amongst friends. Thank you, everyone, for joining. My name is Mitch Ashley. I’m CEO of Accelerated Strategies Group, which is _____ _____ firm that’s focused on dev ops, cloud-native, cybersecurity, digital transformation. And I’m also CTO of MediaOps, where I work very closely with Alan and team, and appear on TechStrong TV on a number of areas. I’m both a product creator as well as a practitioner, CIO, kinda lived with it, so, implemented dev ops on IT teams and on product teams. So, I know the observability challenges well. [Laughs]

Alan Shimel: Absolutely. Thanks, and thanks for joining us, Mitchell. So, guys, as I mentioned, this is our first show in this series, and I think though we wanna get to the issue of scale, I think before we get to the issue of scale, we need to first define observability. We were talking off-camera, and Helen mentioned that they were chopping down a tree in front of her house out there in the UK, and was it interfering, did anyone hear it, was it a lot of noise. None of us heard it. So of course the issue raised, if no one hears the tree being chopped down at Helen’s house, has it really been chopped down. And, you know, I guess it’s a question of what we observe and what does observability show us.

But that’s not a great definition of observability. I’d like to – who wants to take a first crack at kind of saying, “Well, what exactly do we mean when we say ‘observability’?” Now, Tomer, I don’t wanna put any pressure on you, but you’re the CEO of the observability company, here, so, I’m gonna ask you to kick us off on that, and then Brian and Helen and Mitchell and I can jump in.

Tomer Levy: No, for sure, I mean, you know, it’s a good question, how do you define, because people define it as signals, they tell you what the software is doing, and it’s all true. For me, it’s Mitch without glasses is when you don’t have observability, right? You take your glasses off and you’re half-blind, right? So, I know you can see well, but for me, it’s these new glasses that you look into software, right, software is there to run a business. It’s e-commerce, it’s gaming, whatever it is, if you have glasses which are really, really good, I imagine, like, you put it on and you see these, you know, especially with _____ architecture, like, hundreds of different cogs moving around. And just to find this one place where the cog is stuck or something in your business is failing, these are the glasses.

And people need these glasses today more than ever, right, with microservices, with modern architecture, with cloud, there are so many moving cogs, and observability, for me, is just a way to understand how is your business performing on a technical level. And because if some tree fell down inside your, you know, operation, no one cares, but if it impacts the business, then it’s important. That would be my definition.

Alan Shimel: I think that’s a good one. Helen, Brian, interested in [crosstalk]?

Helen Beal: [Crosstalk], for me, it’s a characteristic of a system that needs to be instrumented into a system, so I kind of go back to the mechanical engineering roots of it and the control theory. Also, with my dev ops background _____ _____ _____ it’s telemetry everywhere, so, this idea that everything is emitting lots of data, but with observability, I think it’s a more conscious effort in terms of what we’re instrumenting, what data we’re looking to emit. And then, the monitoring part is what we do when we observe our observable system. So, slightly different view, but I think we’re all trying to achieve the goals with this. Brian?

Brian Dawson: Yeah, well, and I think a key part of the sort of traditional control theory around observability is this concept that I can derive or identify the state of a system based on observing outputs. And I was thinking about it earlier, and seeing it as akin to these Ring cameras that everybody has, right. And, you know, initially, it’s just put up a video camera, see what’s going on around my home. That itself does not provide me, necessarily, with observability. I can observe an aspect, but now imagine, as I start to get 360 cameras around my house, and water sensors, and a smart thermostat, right, I’m establishing a number of outputs that allow me to observe the state of my home: is it empty, is somebody approaching it, what is the temperature. Then I think another layer is when we talk about integrating and extending it to provide for action, right? So when we look at another layer of observability that we get with modern smart home systems is I can not only observe that it’s too hot, but I can respond and quickly resolve that issue to bring my home, the system, into a state that I’d like it to be in.
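For readers who want the formal version of the control-theory idea Helen and Brian refer to, a minimal sketch follows. It assumes a linear time-invariant system; the symbols $x$, $u$, $y$, $A$, $B$, $C$, and $n$ are illustrative and do not come from the conversation.

```latex
\begin{aligned}
\dot{x}(t) &= A\,x(t) + B\,u(t), \qquad y(t) = C\,x(t), \qquad x(t) \in \mathbb{R}^{n},\\
\mathcal{O} &= \begin{bmatrix} C \\ CA \\ \vdots \\ CA^{\,n-1} \end{bmatrix},
\qquad \text{the system is observable} \iff \operatorname{rank}\mathcal{O} = n.
\end{aligned}
```

Read informally, the internal state $x$ can be reconstructed from the outputs $y$ alone exactly when the outputs carry enough independent information, which matches the intuition in the smart-home analogy of adding sensors until the state of the house can be inferred.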

Alan Shimel: Fair, good stuff. So, here’s my issue, right, I consider myself a businessperson, right? I know a lot about technology, but I’m not a developer, I’m not an SRE. Truth be told, I’m a half-assed security person, even. [Laughs] But I know enough about this stuff to be dangerous, right? And if I can’t understand what it is that something’s supposed to do and put my finger on it, it’s very hard for me. For our audience out here, guys, right, where does the rubber meet the road on observability?

Helen Beal: [Crosstalk] start with a crack on that one. 

Alan Shimel: Go ahead, go ahead. [Laughter]

Helen Beal: So, we are in a digitally disrupted world, so most organizations are trying to transform to take advantage of the opportunities that the Internet provides us, and that means that they need to leverage a couple of things. They need to leverage dev ops and they need to leverage cloud. Once they start leveraging cloud, they create distributed systems, so we’re moving _____ _____ to microservices to a much more complex environment. So this is one of the issues of scale. I think we’re gonna talk about a different issue of scale, as well. But once we’ve got those microservices environments, we can then apply dev ops principles, because we’re able to build, test, and deploy in very small pieces, so we can go much faster. But there’s always a tradeoff, right, and the tradeoff here is that we’ve now got thousands, maybe even more than thousands, tens of thousands, millions, perhaps, of little bits of things, microservices that we need to manage.

And therein lies the problem, that now all our traditional monitoring systems don’t really work, because we’ve got all this other stuff to look at. And then, from the business perspective, okay, it’s complicated, but why is it a real problem? Well, it’s a problem when something breaks, ’cause if something breaks, it’s unplanned work, and if we’re doing unplanned work, that means we’re not doing the planned work we want to do, so we’re not innovating. So if we can fix the problems around unplanned work, we are able to become a higher-performing organization, because we are able to deliver more value to our customers. We can, you know, spend more time on technical _____, we can spend more time learning, and become a dynamic learning organization and take that way.

There’s a whole other bunch of use cases, by the way, around observability, that aren’t about debugging and incident management, which we’ll probably come to, as well. But that’s my view on why this is important and people should be trying to invest in building an observability culture and figuring out what tools they need to implement to support that goal.

Mitchell Ashley: And I’m gonna really build on that. Helen, I very much agree that it’s, our software architecture is cloud-native, that initially sort of kicked off the observability, how do I manage thousands, maybe millions of bits _____ objects containers with code in it. I think it’s, observability, to me, is sort of an onion, and, yes, it’s managing the complexity of our software architectures like cloud-native, but it’s a lot more than that, ’cause we have infrastructure as code, we have multi-cloud, we have distributed applications, we have dynamic environments where software is ephemeral. Those clusters may be running containers _____ _____ now gone, and another five seconds later, how do we know what happened. _____ _____ _____ to get back to Brian’s point about what was the state of something that might have caused an error, so we can get to root cause analysis very quickly. I think, ultimately, to Helen’s point, also, is it’s our ability to operate at speed, just like we can develop software at speed. ’Cause if we can’t operate it at speed, you know, all we’re doing is building really fast cars and smashing into brick walls. That’s not fun for anybody. [Laughs] So I think that’s what really, you know, if I had to kind of say, I think the most important thing that observability has to do is match and support the pace of the business, and at the _____ _____ rate, as well as the rate we create software.

Tomer Levy: Yeah, Mitch, I completely agree. If I can piggyback on that, you know, companies today cannot compete if you’re not delivering code fast to production, if you’re not delivering changes, you’re just gonna lose. So there is this tension of, “I have to deploy fast, I need to make my engineering team independent, not fearful of making changes,” whether I’m a Pizza Hut or I’m a Walmart, I have to go [snaps] really fast. And on the other side of it, more and more portions of your business rely, you know, on online to work well. So if you deploy really fast [snaps], you might break the entire business. So there’s enormous risk and enormous opportunity, and I think, in a way, observability is that bridge that connects the risk and the opportunity, right? Because if you have an amazing level of visibility with these glasses, that see the microservices, that see all this complexity, and tell you, “Hey, you just broke something, but you can fix it in 60 seconds before anyone knows.”

And then, it’s amazing, and this is where it works amazingly well and it’s really hard to get there, it’s an ongoing challenge, because the complexity explodes, Helen, as you said, microservices, serverless, this enormous amount of data and telemetry is just, it becomes a huge data problem rather than – you know, and someone really smart comes and tries to search the metrics _____ _____ _____ it’s really, it’s more of a machine problem than a human problem, and this is where I think the world is going to.

Brian Dawson: Well, you know, and I’d love to add, I really actually see observability, and speaking as somebody that has asked people, oftentime, “What’s the difference between observability and monitoring?” you know, that said, as my understanding progressed, which I’d love for somebody to comment on, I’ve really started to get bought into the fact that observability is a key manifestation of the contract between dev and ops that a dev ops culture looks to drive or speak to, right. We have tooling technology, at another level process and practice, and also maybe we can talk about kind of the culture around observability. But effectively, building a real communication stream to manifest the construct between developer and operations, that enables us to move fast, right, and all have sort of a shared set of data.

Helen Beal: I think that’s a really important point that you’re making, as well, that the observability isn’t just for IT ops. It’s for everybody in the value stream, so we have ODD for developers, we have the concept we’re testing, that testability is equivalent _____ _____ to observability. We have people like product owners that will understand about the value, which is a slightly different use case that I kind of alluded to earlier, and that observability allows us to understand whether our system’s working, but it also allows us to understand customer behavior, which is a very kind of interesting value realization use case. So, yeah, I completely agree, observability is a culture for everybody to be involved in.

Alan Shimel: Agreed. So, Brian, you opened a can of worms for me [crosstalk].

Brian Dawson: Good, I like to do that, you’re welcome.

Alan Shimel: Yeah, again, you know, we look at it from a business point of view, when I first started DevOps.com, right, one of the important things was APM, right? AppDynamics was a big dev ops company, and they were all about APM. And then, good friend of mine, Dave Link, the CEO at ScienceLogic said, “Nah, APM – ” he told me, one day, “Alan, APM’s done. It’s AI ops. AI ops is what it’s about, right, and that’s replacing APM.” And, you know, and we’ve had this evolution of names and, you know, three-letter acronyms to describe, as Brian says, that intersection of dev and ops, to describe what to me is to observe, right? To observe and then to act on the data that, you know, the data stream that I’m getting from my applications, from the software I’m developing.

So to me, at a very elementary level, that’s observability. I’m not sure where it fits in, though, to monitoring and performance management, and AI ops, and all of these things. Is observability the replacement term and the replacement for all of these things [crosstalk] something different? Yeah, is it an umbrella or is it –

Brian Dawson: I think that’s a good discussion, you know, what is it, is it a component of it? Is it now an umbrella term, for all of the types of monitoring kind of activities, data [crosstalk]?

Helen Beal: [Crosstalk] I was just gonna say, so one of our other work family, if you like, Eveline Oehrlich, who’s actually the research officer at DevOps Institute, she also does research at Research in Action. She recently delivered a new vendor matrix to the market, which was virtually an AI ops matrix, and it became an AIPA, so, a predictive analytics. So she’s looking to rename this whole market. And if you take a close look at the report, you’ll see she’s got an evolution curve. She goes from APM through AI ops and up into observability, so, she’s definitely heading in that direction. And some of the reasons that she wanted to move from AI ops to AIPA is that she sees these other use cases. So she’s seeing that there isn’t just incident management; _____ is customer experience and lots of other things, that aren’t just to do whether a system is performing properly.

I think some of the movement from APM is that, when we think about observability, it isn’t just about the application; it is about the infrastructure and the network, as well. So it’s kind of like everything in the observability bucket, if you like. Just one more note on the predictive analytics that might be worth chatting about, as well. I do quite a lot of work in the AIOps space, and that was, of course, named by Will Cappelli, then at Gartner, back in 2015. And that kind of really put it into the incident management slot. And when we talk about that, we then kind of end up talking about higher levels of capability around things like self-healing and predictability. And we always have this conversation about predictability, about whether we should be able to predict, because if we can predict, then we should’ve known about it and fixed it already.

And I think this is gonna lead us further into the AI conversation, because part of what we’re doing with observability is trying to get to our unknown unknowns. So, in traditional monitoring, we had a pretty good idea about what was going on in our monolith, but in our distributed environment, there are lots of things happening that we just haven’t got a clue about. And this is where observability, particularly, the AI part of observability really helps. I don’t particularly like the term “AI ops,” ’cause I think it really narrows us into the ops side of the house, which for me isn’t very inclusive of everybody in the technology and business team. I’ll stop there.

Alan Shimel: Tomer?

Tomer Levy: No, for me it’s, you know, terminology is secondary, and personally, I’m not actually such an expert in that. For me, you mentioned the contract between dev and ops and – and the contract is not just, you know, what measures are you going to use, but this should be a part of this, this should work. What you’re building is gonna work, it’s gonna perform, it’s gonna be, of course, traceable, and it’s gonna generate the right metrics, it’s gonna generate the logs. So, someone on the other side, or you, preferably, can understand how your software works. For me, observability is about accountability, it’s about independence, it’s about unleashing engineering team to be – just to run forward and do things and without kind of, “Okay, I’m gonna take this piece of software, I’m gonna give it to someone else, and I’m gonna wish him all the best and have a great night supporting this software.”

No, no, no, you own it, you build it, you run it, and you generate everything you need to run. I can give an example: our engineering team, they cannot push anything to production without metrics and logs and _____ without _____ _____ _____ before they write the code. As they write, before they deploy, it has to be ready, right? Then you push it and then you’re done. You know, it’s gonna be reported, you know how it’s gonna perform, you know, the business impact of it. It’s pretty amazing the impact of that.

Alan Shimel: [Crosstalk] I’m sorry, go ahead, Brian.

Brian Dawson: Well, no, I was just gonna say, you know, an opportunity to disagree, but I think it’s an important topic. I know we often, you know, talk about terminology and how we name things doesn’t matter. I’ve actually found, if you look at our industry and what we invest in it, oftentimes, where we go in and we play to own given terms in the market and we create word soup, there’s a real dollar cost behind that. As people, I’d say, sort of chase red herrings, right? “Well, AI ops is what I need to do to have successful development and stable systems. I’m gonna go figure out what that is and tackle it.” Tomer, I agree, it’s not the end all, be all. At the end of the day, it’s the core principle in practice, but I do think discussions like these that we’re having, “What is APM versus observability? Where does it fit?” enables us to establish a common language, so we can start to share and propagate these practices, you know, across different companies, industries, departments, and teams within our orgs. Anyway, sorry, Mitch, I stepped on you.

Alan Shimel: No, Brian, that was excellent. I’m sorry, go ahead, Mitch.

Mitchell Ashley: I think you were spot on, ’cause you were playing into what I was gonna say, and kind of speaking as an analyst. Analysts love to create categories and have people _____ _____ the defined categories, so, you know, there’s the person who created whatever category. And so, to early entrants of companies in those markets, I think there’s a bit of the conqueror writes history, right, as the dominant players tend to help establish what those terms are. But what I would say is I think the fluidity of these terms represents both the pace of innovation and just how much innovation is happening, as we get more and more cloud-native applications into production, as we do more and more cloud, multi-cloud, hybrid cloud, all of that, software, infrastructure as code, I think that’s what’s driving these questions of how do we manage all of this, how do we operate all of this, how do we use this information to inform the business about what’s happening. So we can know that we’re achieving our objectives for the customer, for revenue, for whatever it might be.

But I think this sort of word soup is gonna be with us for a while, while we still kind of sort out getting more cloud-native types of applications into the cloud.

Brian Dawson: By the way, I’m trying to stake a miner’s claim, what is that bad analogy, to DevNL. That’ll be the opposite side of, the counter to AIO.

Mitchell Ashley: Oh, that means you registered the domain, is that what that – [Laughs]

Brian Dawson: Yeah, I’m squatting, baby, I’m squatting [laughter] [crosstalk].

Alan Shimel: Right, you know, and it’s all good [crosstalk]. Interesting stuff, folks. We’re way more than halfway past, you know, our time limit, here. And there was another, on the abstract, for those watching, you maybe read the abstract for this show and said, “Oh, I wanted to talk about scale, I wanted to hear about scale.” Let us talk a little bit about scale, but before we do, folks watching this, look, this is gonna be a series, we’re gonna do this I think it’s every-other-week or so. And, I mean, I want to explore the topics of how opensource really has been driving, the driving force behind observability. How some of the opensource foundations, you know, Brian’s with Linux Foundation, how they’ve fostered this growth and helped create this market and opportunity.

And also, finally, you know, what it means for you out here, right? Observability can be your best friend or your worst enemy, in some cases, and, you know, we want to help you make sense of that, as well. But let’s talk about scale, now. So, guys, and gals, in my mind, you know, what’s the single-biggest inhibitor to successful observability scale, right? The datasets we’re talking about, just the sheer mass of bits and bytes that we need to wrap our heads and hands and programs around, in order to at least, you know, to have that data feed of which then we have to, you know, do analysis and hopefully come up with something actionable. I think that’s what’s held, you know, up to this point, it’s been a real issue. Tomer, I have to assume that it’s one of the reasons you said, “Hey, we need a solution, here.”

Tomer Levy: Absolutely. I think scale has so many different factors. There’s obviously the scale of the amount of data, right? If you take a monolithic application, you know, line by line, and it’s creating, you generate some logs, and maybe pick up some metrics. Or you take a monolithic application and move it into 50 microservices, now they all need to report where they are, what do they do. It’s almost 50 times more data, so the amount of data is enormous. Now, we used to be fine with just logs. Then, you know, AppD came to the world and said, “Hey, let’s do some tracing.” Now, you multiply this traceability by 50, so you have _____ data, and you have traces, and you have metrics to report kind of meaningful events, and now you have to collect all this telemetry into one place, and the sheer volumes are just ridiculous.

I see customers and they’re telling us, “Hey, I’m collecting, you know, 300 terabytes, a day, of telemetry data,” and you know it’s 99.9 percent useless. But there is, like the tree that makes noise, there is a bit of signal there that is important, and they really need it. How do you get to it? So there’s the scale of data, there’s the scale of the types of data, used to be only logs, and then it goes to different types of data that need to be merged together. I think there is a scale of human capacity, right, how can humans actually understand machine-generated data. _____ so much data, so much volatility, you know, so many different messages and types, just, they’re so smart, these engineers, but it’s really impossible to process trillions of messages.

So, this is where you’re trying to solve that with, you know, AI, machine learning, and _____ _____ _____ technologies, and we do the same. We solve the scale problem, we solve the different types of data problem, and try to solve the intelligence with predictive technologies. And I think there is one that people don’t talk a lot about, which is the scale of teams and people within the organization. Now, you go to an organization and it’s not 1 company; it’s like 50 different organizations within the same company. They do app number one, app number two, they do, you know, PI, they do whatever. Certainly, every one of them need observability, and someone needs a whole enterprise observability.

So, this is a huge _____ that we have actually _____ focused a lot beyond the normal, “How do we level an enterprise?” to really take all these pieces and _____ continue to work together, independently, unleashing productivity, and then the entire organization working collaboratively. So these are the kind of scale that I’m thinking about.
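To make the volume figures Tomer quotes concrete, here is a back-of-the-envelope sketch in Python. It uses only the numbers from the conversation (300 terabytes a day, 99.9 percent noise, one monolith split into 50 microservices); the decimal terabyte-to-gigabyte conversion and the function names are assumptions made for illustration, and the "50 times" multiplier is the rough upper bound he describes rather than a measured value.

```python
# Figures quoted in the conversation; everything else is an illustrative assumption.
DAILY_TELEMETRY_TB = 300      # "300 terabytes, a day, of telemetry data"
NOISE_PER_THOUSAND = 999      # "99.9 percent useless"
GB_PER_TB = 1000              # assumed decimal units


def useful_signal_gb(daily_tb: int, noise_per_thousand: int) -> int:
    """Return the gigabytes per day that are not noise (integer arithmetic)."""
    daily_gb = daily_tb * GB_PER_TB
    return daily_gb * (1000 - noise_per_thousand) // 1000


def scaled_volume(monolith_volume: int, service_count: int) -> int:
    """Rough upper bound: each service reports about as much as the monolith did."""
    return monolith_volume * service_count


if __name__ == "__main__":
    signal = useful_signal_gb(DAILY_TELEMETRY_TB, NOISE_PER_THOUSAND)
    print(f"Useful signal per day: about {signal} GB")
    print(f"Relative volume after the split: {scaled_volume(1, 50)}x")
```

Under these assumptions, the useful fraction is still hundreds of gigabytes a day, which is one way to see why finding the signal is treated here as a machine problem rather than a human one.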

Brian Dawson: And if I may quickly, Tomer, just, you know, when we talk about the scale of teams in organizations, right, now you look within that as we look at the scale at which we’re delivering cloud-native applications across multiple containers, across multiple sort of agile pizza-size teams, more and more sort of supporting the micro-architecture approach, right. We have multiple teams delivering at speed to their component or their area of an overall system, which oftentimes a customer, the end user experiencing it views it as one system, right? And that sort of underscores the need, as you said, Logz.io is providing the ability for those multiple teams, those multiple areas of an organization to have a shared view, so they can quickly observe, learn, and respond, to provide a unified customer experience, right? So just, as we scale across teams, it gets harder and harder to provide a holistic experience to those end users. Observability helps enable that.

Mitchell Ashley: Yeah, I think it’s also, another way to look at this is, who is working on this problem? And to be able to scale. And I think one of the healthy things about this part of our industry is there’s a number of opensource projects that are part of consortiums like the Linux Foundation and elsewhere. But I know, you know, that Logz.io is involved with Prometheus, as a number of companies are, and there’s a lot of ecosystem of other opensource projects that have sprung up kind of alongside Prometheus. There’s also, of course, Jaeger and OpenTelemetry, so, different folks are working on this problem in different ways. And what I like about – one of the main things I like about opensource is it brings people together, and that’s what Linux Foundation obviously does, and CNCF, and folks like that.

This is a problem that no one person, no one company alone can solve. You could take a leadership role in solving it, you know, like Tomer and Logz.io _____, but that, to me, that’s a positive indicator of where we’re headed. And which one of those or multiple of those or win or lose, it’s not necessarily the questions; it’s, we will figure it out and it will come to fruition. And it’ll be able to be something that can be shared across the industry, at least in the opensource form.

Brian Dawson: Yeah, and I’m sorry to jump back in, Mitch, but I love what you called out, there, in regards to opensource and Linux Foundation. And what we talk about a lot is, you know, we talked about this word soup, which is oftentimes not only analysts looking to define markets, but vendors. There’s revenue attached to defining markets and owning terms, and I know while at the end of the day we’re all going into this with good intention, kind of the natural economics of business risks someone trying to stake a claim or ownership to a shared and important concept in their space. So, what I do think is important, whether it’s Prometheus, Linux Foundation, or other foundations, is that we ensure kind of some unbiased, an unbiased impartial playing field, right? For people to align around and establish sort of the concepts and frameworks, which we can all then go build on as end users, practitioners, vendors, et cetera. So I love the fact that I get to be in an organization that is playing a hand in building those impartial playing fields around spaces like observability.

Tomer Levy: If I can say one more word, here, which I agree, solving such a huge problem as one company, it’s almost impossible. You know, there are some giants who are able to do it, but we’re true believers in opensource, we’re, you know, contributing to Jaeger, we’re working a lot with Prometheus, it’s part of our offering, we’re contributing to OpenTelemetry, to OpenSearch, and other projects. And this is what we – we literally embraced opensource to say, “Hey, this is the way the community’s going to solve it. We’re, you know, as a commercial organization going to contribute and make it better and more accessible.” And I believe forward, just the sheer power of the community will help bring it to becoming much more common technology that hopefully _____ will be able to adopt more easily.

Alan Shimel: I hope so. Helen, we haven’t heard from you in a little bit. Thoughts on this?

Helen Beal: _____ just making me remember a presentation I did _____ about a year ago now, about community looking at things like the [crosstalk] _____ _____ talk about industrial revolutions and where we are. So, we’re on the precipice of this golden age and moving into this very sustainable way of work, and I think there is a mind shift _____ _____ it’s like a global human mind shift, that we’re moving away from thinking about shareholder value and really understanding that we’re working together for the good of the planet. And maybe that’s a little bit too highfaluting, but I do think, you know, I’m doing work as a consultant _____ _____ _____ _____ Foundation. I think there’s, increasingly, this understanding that we need to cooperate in order to do the best work.

Alan Shimel: I agree, I mean, I think that’s been one of the catalysts or gatekeepers of this, if you wanna call it, golden age of software development and digital adoption has been the coopetition, right? The coopetition aspect of industry joining hands with potential competitors, for the common good, of the whole world, and in the industry itself. And I think you’re dead on, Helen, we could look at value stream management, for instance, right, where we are measuring the flow, we’re examining and analyzing the flow. And based upon the flow of that data, you know, based upon the data that we get from this flow, we make decisions on how to do better. You know, contrast that or compare it to observability, right? Where does one leave off and the other pick up?

Helen Beal: Yeah, I mean, they’re very much together. Actually, in the value stream management foundation _____ consortium is gonna be releasing _____ _____ partnership _____ _____ _____ _____ towards the end of the month. There is a whole article from I think it’s _____ Nelson, which is about observability being the VSM superpower. ‘Cause it is about _____ to look into our invisible pipeline, and this is – it feels like we’re in some kind of like _____, ’cause value stream management’s been around for a really long time, and, you know, it comes from the 1950s _____ _____. And yet, here we are again, but there’s a big difference, because we’re now talking about digital value streams. And they’ve always been particularly challenging, because we make them out of code, which is pretty much invisible, and we can put some _____ on top and things, and dashboards, but there’s a lot of stuff we can’t see in software.

So, observability really helps us, not just _____ _____ when something’s broken, to fix it, but we can apply it to our whole value stream, look at our entire dev ops tool chain, and understand how _____ _____ and how well that’s flowing. I keep on talking about the use case, which is about customer experience, which, to me, there’s a duality in value stream management. It is about flow, but it’s also about value realization. So we wanna know how fast we’re getting that value to our customer, but we also want to know how they’re experiencing it. Is it good? Is it bad? Do they like it? Do they want more of it? And that’s another thing that observability can really help us with.

_____ _____ as we talked about, there’s so much data out there that we have a limited _____ _____ ’cause _____ _____ it’s human, so we do need to use our machines against our machines or with our machines, so turning them in on themselves, so using our AI machines to understand our data, and observability machines. And, yeah, I’ve made it sound very simple, so, why isn’t everyone doing it? [Crosstalk]

Mitch Ashley: [Crosstalk] [Laughter]

Alan Shimel: Guys, we’re coming up on the end of our time. You know, I feel like for every one topic we are broaching, it’s revealing two, three, four topics that we need to dive deeper in on our future episodes in this series. So we have a lot, a lot to cover in this observability area, but we’re gonna need to wrap up today. I wanted to give each of you a chance, you know, maybe your final thoughts on today’s episode, you know, on topic. And, you know, give someone, people watching, out here, a reason why they wanna come back and get more of this. Tomer, you went last, I believe, so I’m gonna ask you to go first. You went last in the beginning of the show; we’re gonna ask you to go first.

Tomer Levy: Sure, yes, you’re so generous, thank you. So, maybe I’ll start with a quick – because I didn’t really say what Logz.io is doing, so I have to do it, otherwise, my boss is gonna be upset with me. 

Alan Shimel: Okay. [Laughs]

Tomer Levy: We’re a cloud observability company. We help companies monitor cloud. We offer metrics, tracing, and log management in one solution, includes _____ and includes everything for observability. The way we _____ differentiate _____ _____ opensource, actually, as I mentioned, our solutions are based on Grafana, on ELK, on Jaeger, so it’s a fully opensource-compatible solution, delivered from the cloud. One of the biggest advantages is, around AI and machine learning, we’ve built some really cool technologies, crowdsourcing, machine learning, and bringing them together. This is why we’re excited about observability. We have more than 1,000 paying customers, some of the largest companies in the world. And I’m excited for the rest of this series, so I think we covered a lot. I think scale is just a big problem, but there are other aspects of observability I’m excited to talk about. So thank you for having me, Alan, and it was nice to spend time [crosstalk].

Alan Shimel: My pleasure, it’s my pleasure to have you on, Tomer. Thank you. It’s good to even see you. Helen, you’re the lady in the middle, it seems, so, why don’t you go next.

Helen Beal: [Crosstalk] the rose between thorns or something. Yeah, so I’m gonna perform the sin of talking about word soup. I do think, my entire career, that the IT industry has its own jargon, but I think most industries do. If you went and looked in healthcare, I think you’d find a lot of TLAs and things, there, as well. But as Brian said, without language, we can’t communicate, so we do need to name things. But one of the things we didn’t really talk about today, which I’ve been thinking about a lot, lately, is TDD, BDD, HDD, IDD, ODD. So we’ve got all these DDs, so I started calling them XDD. So just to try and make sense of what I’ve just said, we’ve got test-driven development, behavior-driven development, hypothesis-driven development, impact-driven development emerging, and of course observability-driven development emerging, as well.

And I don’t think they’re mutually exclusive, but I’d love to have a chat, in the future, with all of you in this type of forum or separate, about how to connect all these XDDs and make sense of them, to help that contract between development and operations that we were talking about, earlier. And thank you [crosstalk], as always.

Alan Shimel: Thank you, and there’s always an invite here for you, Helen, you know that. Brian, do you have anything to say?

Brian Dawson: Yeah, being facetious. [Laughter] [Crosstalk] it’s early, went completely over my head, hm. No, you know, I think an interesting topic to explore, as we kind of hit on here, is gonna be the common lineage between some of these things that become part of word soup: agile, continuous integration, continuous delivery, dev ops, observability. Which, you know, at the core, they really all are about control loops and feedback systems, right? In order to manage state and achieve a certain level of determinism for the systems that we’re delivering. I think it would be really interesting to talk about where those overlap and where we could uncover common best practices, as we sort of strive to implement these things. And then lastly, you know, a bit of a plug is, look, I think as we embark, as we identify these new spaces, these new practices, and we embark on growing and adopting them, it is important that we find a place to meet on common ground, and if we – what did you call it, competitors?

[Crosstalk] Coopetition, excuse me, it’s still early. Right? So, do take a chance to look into the opensource solutions that Tomer spoke about, do take a chance to engage with the community around those spaces, so you can not only learn about what these things are, but so you can contribute. And Alan, hopefully, we’ll do some of that here, on this channel, in the coming weeks.

Alan Shimel: Absolutely, Brian, and the same as I said to Helen, always an open invite to come join us and contribute and help out on this. Mitchell, you wanna bring it home?

Mitch Ashley: Yeah, happy to, Alan. You know, it’s such an emerging and fast-moving field, what it makes me think of is, I think what we can do on this show, on this series, is help bring some clarity, as well as kind of tease things apart. I kind of go back to the goal, constraint-based management, you know, what’s the constraint? Where do we need to focus on to help move this forward? What’s the signal? _____ I’m thinking Gary Gruver, he’s always talking about the signal, right? Getting clarity about the things that we’re talking about. And so, I think with the brain power here, the experience, and I know the other guests that we’ll have on over time, I think this is something that we can really strive for is let’s, in addition to exploring, let’s see if we can elevate a few areas maybe with some clarity. So it’s a good goal to shoot for.

Alan Shimel: Absolutely. Well, panel, thank you so much for kicking off our series on observability. You guys have been great. Thanks so much to Logz.io for sponsoring and enabling this. This is needed in our community. We need to explore this. We need to shine a light, here, on maybe some of the dark corners around observability and how it fits into the whole process. So thank you all. Thank you, out here, for watching this. As I said, this will be a recurring series, every episode delving into another aspect of observability, so, stay tuned for it. You could always catch it on TechStrong TV, techstrong.tv, or at digitalanarchist.com, as well as on probably devops.com and Container Journal, et cetera. For now, though, this is it.

This is Alan Shimel. I hope you’ve enjoyed our first episode of “Observability at Scale.” We’ll see you on the next one, but we’re out. Bye-bye.

[End of Audio]

Alan Shimel

As founder, CEO, and editor-in-chief at Techstrong Group, Alan manages a broad array of businesses and brands including Techstrong Media (DevOps.com, Security Boulevard, Cloud Native Now, Digital CxO, Techstrong.ai, Techstrong ITSM and Techstrong TV), Techstrong Research and Techstrong Learning. To do so and succeed, Alan has to be attuned to the world of technology, particularly DevOps, cybersecurity, cloud-native and digital transformation. With almost 30 years of entrepreneurial experience, Alan has been instrumental in the success of several organizations. Shimel is an often-cited personality in the security and technology community and is a sought-after speaker at conferences and events. In addition to his writing, his DevOps Chat podcast and Techstrong TV audio and video appearances are widely followed. Alan attributes his success to the combination of a strong business background and a deep knowledge of technology. His legal background, long experience in the field and New York street smarts combine to form a unique personality. Mr. Shimel is a graduate of St. John's University with a Bachelor of Arts in Government and Politics, and holds a JD degree from NY Law School.
