Four Steps to Implement an Observability Strategy for Microservices

You’ve finally decided. You’re making the move to microservices and re-architecting your entire infrastructure. Some services will still be on-premises (those databases and their legacy apps are never moving), but a cloud-based microservices architecture will allow your teams to build and bring new products to market faster.

But how do you handle the complexity that comes with cloud and microservices? You know you need a strategy for monitoring and managing the complex dependencies and ephemeral nature of your new architecture, but you don’t know where to start. What’s more, you don’t want to spend time and money now only to realize a few months down the line that you’re on the wrong road. Your investment needs to be future-proof.

You need to implement a strategy for observability. Observability goes beyond monitoring to give you greater operational insight, leading to shorter incidents, fewer quality problems, a better product and happier customers. Your company has unique observability needs, which means how you approach your observability strategy will also be unique. As is usually the case, it’s important to start with a roadmap and goals. Based on my experience, the following four things are key to making a considered, comprehensive observability strategy that’s built for the long haul.

Choose an Observability Platform that Fits Your Future

Perhaps counter-intuitively, the right platform isn’t necessarily the one that’s the right fit for you today. If you’re investing in new technologies such as microservices, containers and Kubernetes, you’re going to see a significant increase in the scale and ephemerality of your environment. A monitoring platform that suits your current infrastructure likely won’t be able to keep up with your future needs. Instead, consider how your infrastructure will look three, four or even five years down the road, and then select the platform that best fits that long-term strategy. It doesn’t matter whether you choose a commercial tool or an open source one. What matters is that you give considered thought to your future infrastructure and choose a platform that can address the future needs of your environment and protect your investment at the same time.

Monitor the Metrics that Really Matter

If you’re still in the early days of your shift from monitoring to observability, you’re likely taking a service-based approach, monitoring specific pieces of your ecosystem with a focus on individual incidents. The real value of observability is its ability to provide visibility into big-picture issues, such as user experience, performance against SLOs and other key metrics important to the business. In modern environments, a small uptick in latency or errors in a single microservice can cascade into increased latency for an entire product, a specific feature, or even a specific customer. An observability platform that measures and collects data across multiple dimensions (for example, customer- or business-specific, application- or service-specific, and infrastructure-, host- or container-specific) will allow you to understand the telemetry in many different ways. The platform should also aggregate the interactions between your services, unlock a wealth of data for engineers to investigate when issues do arise, and provide visibility into the metrics the business cares about most.
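To make the idea of slicing telemetry by dimension concrete, here is a minimal sketch in Python. It is an illustration only, not how any particular platform works: the RequestSample record, the slo_compliance helper, the sample data and the 300 ms latency target are all assumptions made up for this example. It computes the share of requests that meet a latency target, grouped by whichever dimension you ask for.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class RequestSample:
    """One request, tagged with the dimensions we may want to slice by (hypothetical schema)."""
    customer: str
    service: str
    host: str
    latency_ms: float


def slo_compliance(samples, dimension, threshold_ms):
    """Return, per value of `dimension`, the fraction of requests at or under threshold_ms."""
    totals = defaultdict(int)
    good = defaultdict(int)
    for sample in samples:
        key = getattr(sample, dimension)
        totals[key] += 1
        if sample.latency_ms <= threshold_ms:
            good[key] += 1
    return {key: good[key] / totals[key] for key in totals}


# Made-up data: one slow request served by host-2 for customer "acme".
SAMPLES = [
    RequestSample("acme", "checkout", "host-1", 120.0),
    RequestSample("acme", "checkout", "host-2", 480.0),
    RequestSample("globex", "checkout", "host-1", 110.0),
    RequestSample("globex", "search", "host-3", 90.0),
]

# The same telemetry, viewed along three different dimensions.
by_customer = slo_compliance(SAMPLES, "customer", threshold_ms=300.0)
by_service = slo_compliance(SAMPLES, "service", threshold_ms=300.0)
by_host = slo_compliance(SAMPLES, "host", threshold_ms=300.0)

print(by_customer)
print(by_service)
print(by_host)
```

In this toy data set, the per-service view shows checkout missing its target on some requests, the per-host view points at host-2, and the per-customer view shows that only one customer is affected. That is the kind of cross-cutting question a multi-dimensional platform lets you ask without re-instrumenting anything.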

Embed Observability into Your Incident Management Process

Incident management is a hot topic right now. I speak with CTOs across organizations and industries who have fantastic tools that provide detailed telemetry, but everything breaks down when an incident occurs. They need to better embed observability as a part of their incident management process. For example, ask your incident management team how observability helped in a given incident and record those results. This keeps track of the ongoing benefit. Next, ask how observability could be improved to get the team closer to the crucial clues they need during an incident. And finally, make sure to fold these items into your remediation plans so the team has a clear playbook when incidents do occur.

Establish a Culture of Observability Across Your Organization

Simply deploying a new technology of any type is not enough. You also need to build the appropriate processes and norms around the new platform for it to succeed, and this is especially true with observability. Modern observability platforms can be incredibly powerful in providing insight into the performance of your ecosystem, but without an investment in your organization’s ability to use the platform, you’ll barely scratch the surface of its true value.

Observability requires ongoing investment and improvement. Start by building a core group of champions, then fold training into onboarding, continuing education or brown-bag sessions. Using subject matter experts to create the content is a good way to get everyone off on the right foot. Use the output from observability tools in company meetings, team syncs or support settings to keep everyone on the same page. This sort of standardization raises the collective skill of the entire organization and demonstrates the value of observability to those who are not already on board.

Following these four steps will set you off in the right direction on your observability journey. If your organization is like most, you keep an eye on what you think could go wrong. While that’s unquestionably important, a mature observability strategy will give you insight into previously unknown issues and help you more quickly understand why incidents occur. And as you continue on your observability journey and your understanding of what breaks and why improves, you’ll be able to implement increasingly automated and effective performance improvements that impact your company’s bottom line.

To learn more about containerized infrastructure and cloud native technologies, consider coming to KubeCon + CloudNativeCon NA, November 18-21 in San Diego.

— Arijit Mukherji