It’s like clockwork. Every few months, a new buzzword descends on the tech industry, and soon it’s on everyone’s lips. Take your pick from the graveyard of buzzwords past: “Big data,” “internet of everything,” “hacktivism,” “gamification.” It’s not that these terms are meaningless—just that they go from relatively obscure to ubiquitous seemingly overnight. We can add a new one to the list in the digital monitoring space: “Observability.”
Unlike some trendy buzzwords, observability captures something that’s actually quite important about how we monitor digital systems and how current approaches differ from those of the past. So, it’s a good and useful concept. Where I take issue is with the way many analysts, technologists and vendors seem to use it. Whether or not they mean to, they often paint far too narrow a picture of what observability is or ought to be, and that linguistic oversight can have real-world consequences. If enterprises invest in observability without understanding exactly what they’re observing, and what they’re not, they risk leaving themselves vulnerable to some serious blind spots.
What Is Observability and Why Does It Matter?
Technically, observability is an approach with the goal of understanding how a system is doing. What sets it apart from traditional monitoring is what counts as the system and how it is observed. The monitoring tools of yesterday typically focused on one or more domains to identify typical and known problems. Usually, that entailed collecting metrics directly from those domains and measuring them against predefined thresholds. For example, storage levels shouldn’t exceed a certain threshold; if they do, send me an alert so I can investigate what’s happening.
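To make the threshold model concrete, here is a minimal sketch in Python. The metric names, the 85 percent limit and the helper names are all hypothetical, chosen only to illustrate the storage example above; real monitoring tools differ in the details.

```python
from dataclasses import dataclass


@dataclass
class ThresholdRule:
    """A predefined limit for one known metric (hypothetical structure)."""
    metric: str
    limit: float


def check_thresholds(readings: dict[str, float], rules: list[ThresholdRule]) -> list[str]:
    """Return one alert message for every reading that exceeds its rule's limit."""
    alerts = []
    for rule in rules:
        value = readings.get(rule.metric)
        # Metrics without a rule are never inspected; only known problems are caught.
        if value is not None and value > rule.limit:
            alerts.append(f"{rule.metric}={value} exceeds {rule.limit}")
    return alerts


# Illustrative values only: storage is over its limit, CPU has no rule at all.
rules = [ThresholdRule("storage_used_pct", 85.0)]
alerts = check_thresholds({"storage_used_pct": 91.5, "cpu_pct": 40.0}, rules)
print(alerts)
```

Note that the CPU reading is ignored entirely because no rule was written for it. The check can only find the problems someone anticipated.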
On the other hand, observability takes a more holistic approach, examining multiple outputs of the system to infer system health. For example, suppose you were trying to gauge a car’s condition. In that case, traditional monitoring might track engine temperature or idle speed RPMs, while observability would examine things like power output, engine leaks, transmission noise, emissions and fuel efficiency to infer the car’s overall health.
This output- or outcome-focused approach to managing performance maps well to today’s complex and distributed digital systems. Even if you wanted to, you couldn’t pull real-time metrics from every relevant component of a system when those components consist of thousands of microservices running on any number of physical servers across multiple clouds. However, the bigger problem with more antiquated monitoring models is that they only capture the specific metrics they’re designed to track. If you’re encountering a problem you’ve never seen before, your monitoring dashboard might look perfectly healthy while actual system performance is terrible.
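The gap between a green dashboard and a poor outcome can be sketched in a few lines of Python. Everything here is an assumption made for illustration: the metric names, the limits, the sample page-load times and the three-second objective are invented, and the percentile uses a simple nearest-rank method.

```python
import math


def percentile(samples: list[float], pct: float) -> float:
    """Nearest-rank percentile of a non-empty list of samples."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


# Component metrics, each comfortably inside its predefined limit (hypothetical).
component_metrics = {"cpu_pct": 42.0, "storage_used_pct": 61.0}
component_limits = {"cpu_pct": 90.0, "storage_used_pct": 85.0}
dashboard_green = all(
    component_metrics[name] <= limit for name, limit in component_limits.items()
)

# An outcome measured where users feel it: page-load times in seconds (hypothetical).
page_load_seconds = [0.8, 0.9, 1.1, 0.7, 6.5, 7.2, 0.9, 8.1, 1.0, 6.9]
p95 = percentile(page_load_seconds, 95)
outcome_ok = p95 <= 3.0  # assumed objective: 95th percentile under three seconds

print(f"dashboard green: {dashboard_green}, p95 load time: {p95}s, outcome ok: {outcome_ok}")
```

In this made-up data, every tracked component passes its check while the slowest page loads take several seconds, which is the situation the threshold model cannot see.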
Observability and APM
So then, observability should give you everything you need to track the health of a modern digital business, right? Well, not so fast. Because today, people seem to use “system observability” interchangeably with “application performance monitoring,” or APM. And they are not at all the same.
Yes, you want an observability-based approach to tracking your application environment, and modern APM tools provide a great way to do it. But knowing the health of your application environment is not the same as knowing the health of your overall digital systems as experienced by users. And at the end of the day, if you’re not gauging performance from the perspective of your users, whether they are customers, employees or other devices relying on your systems, then what’s the point? Unfortunately, it’s easy to forget that the most important key performance indicator (KPI) of any digital system is the quality of its intended output or outcome. In the case of any application, the most important KPI is the user experience.
The outputs that APM tools track do contribute to the user experience—and you want to be monitoring them. But so do many, many other things. There’s the user access itself (browsers, virtual or physical user devices loading the application). There are local, regional and global networks connecting those users to your application environment. And then, of course, there are the dozens, sometimes hundreds of third-party digital services running between and among users and applications. Problems in any link of that digital chain can seriously impact the user experience—even if your application environment itself looks healthy.
Observing End-to-End, from User to App and Everything in Between
To get the full picture of the health and performance of your digital assets, you should be tracking outputs from across the end-to-end digital service chain. That includes telemetry (logs, infra metrics, traces) from your application hosting environment. It should include actively and passively observing at the point of consumption—that is, outputs from customer browsers and worker devices. It must include active and passive monitoring of network connectivity. And it should include continuous active monitoring of all the disparate services—APIs, content delivery networks (CDNs), DNS services, cloud-based security services and every other cloud and web service that contributes to what the user is experiencing. (And by the way, you should look for independent, objective telemetry whenever possible rather than taking what your various service providers report at face value.)
When you approach observability from this outcome-based, holistic perspective, you can capture a real and truthful picture of the performance of your full digital business, not just pieces of it. Just as important, when you do identify a problem, you’re in a much better position to do something about it. You now have outputs to measure from all the many systems and services that contribute to the user’s application experience instead of just a subset of them, which means it’s much easier to zero in on which piece of the puzzle isn’t behaving the way it should.
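As a rough illustration of zeroing in on the misbehaving piece, the Python sketch below compares an observed measurement for each link in the chain against its usual baseline. The segment names, latency figures and the 1.5x tolerance are hypothetical, and a real deployment would draw these readings from its own active and passive monitoring.

```python
from dataclasses import dataclass


@dataclass
class SegmentReading:
    """One link in the end-to-end service chain (hypothetical structure)."""
    segment: str
    baseline_ms: float
    observed_ms: float


def suspect_segments(readings: list[SegmentReading], tolerance: float = 1.5) -> list[SegmentReading]:
    """Return segments slower than tolerance x baseline, worst deviation first."""
    suspects = [r for r in readings if r.observed_ms > r.baseline_ms * tolerance]
    return sorted(suspects, key=lambda r: r.observed_ms / r.baseline_ms, reverse=True)


# Illustrative readings only: the application itself looks healthy, DNS does not.
chain = [
    SegmentReading("user browser", 120, 130),
    SegmentReading("last-mile network", 40, 45),
    SegmentReading("DNS", 20, 180),
    SegmentReading("CDN", 60, 66),
    SegmentReading("third-party API", 150, 160),
    SegmentReading("application environment", 200, 210),
]
suspects = [r.segment for r in suspect_segments(chain)]
print(suspects)
```

With only the application environment instrumented, this slowdown would be invisible. With a reading for every link, the comparison points straight at the one segment that has drifted from its baseline.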
A Smarter Approach to Observability
So, should you take all the hype about observability seriously? Yes. As tech buzzwords go, you could do a lot worse. This one actually means something important. But make sure you’re keeping the big picture in mind as you do.
Observability that begins and ends with the application environment ignores the outcome and is nowhere near sufficient. We need to expand our sense of observability to begin with the user experience and encompass every link in the chain of the end-to-end digital business. When we do, we create a much more holistic, useful and actionable picture of the health of our digital systems. And we can make sure we’re continually monitoring digital performance in the way that matters most: as experienced by users.