Nobl9 earlier this year launched the Nobl9 Reliability Center, a platform that provides a single source of truth about the state of reliability in an application environment. Included in that platform is SLOgpt.ai, a generative artificial intelligence (AI) tool for programmatically creating service level objectives (SLOs) and launching natural language queries to determine how well they are being adhered to across a distributed computing environment. The generative AI tool developed by Nobl9 is based on a foundational large language model (LLM) made available through Vertex AI, Google's machine learning platform.
Kit Merker, chief growth officer for Nobl9, said the generative AI tool also makes managing SLOs more accessible to a wider range of stakeholders.
The expanded alliance with Google is the latest in a series of moves that will make it simpler to manage SLOs across multiple ecosystems, including, most recently, partnerships established with Microsoft and Cisco, noted Merker.
The Nobl9 Reliability Center is already integrated with more than 50 DevOps, observability and incident management tools. It collects metrics, events, logs, traces and alerts to track incidents, releases, rollbacks, runbooks and other documentation. Those integrations will make it simpler for IT teams to create SLOs using historical data that would otherwise be challenging to define from scratch.
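Seeding an objective from historical telemetry, as described above, can be illustrated with a minimal sketch. The function name, the margin, and the weekly success ratios below are all hypothetical; the idea is simply to anchor a target just under the worst observed baseline so the SLO is achievable from day one.

```python
# Hedged sketch: suggest an SLO target from historical weekly success
# ratios by taking the worst observed week minus a small safety margin.
# All names and numbers here are illustrative, not Nobl9's method.

def suggest_slo_target(weekly_success_ratios, margin=0.0005):
    """Suggest a target slightly below the worst observed week."""
    baseline = min(weekly_success_ratios)
    return round(baseline - margin, 4)

# Four weeks of (hypothetical) observed success ratios:
history = [0.9991, 0.9987, 0.9994, 0.9989]
print(suggest_slo_target(history))  # 0.9982
```

A real tool would weigh far more signal than a single minimum, but the principle is the same: the historical data supplies the baseline that would otherwise have to be guessed.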
In time, there is no doubt that tools for managing SLOs will become more widely available, but given the dependencies that exist between applications and systems, DevOps teams will need an approach that can be employed across hybrid IT environments, said Merker.
In general, as more microservices-based distributed applications are built and deployed, it's becoming more challenging to maintain SLOs across them because they have many more dependencies than legacy monolithic applications. Each feature and application programming interface (API) added over time can adversely affect performance and reliability. The challenge is not so much that any given service will outright fail as it is determining where the degradations are occurring that result in SLOs not being met.
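The distinction between outright failure and degradation can be made concrete with a latency-based service level indicator (SLI): every request below may "succeed", yet the SLI still surfaces a problem. The threshold and samples are illustrative assumptions, not a real service's data.

```python
# A minimal sketch of a latency SLI: the fraction of requests served
# under a threshold. Threshold and sample latencies are hypothetical.

def latency_sli(latencies_ms, threshold_ms=300.0):
    """Return the fraction of requests at or under the latency threshold."""
    if not latencies_ms:
        return 1.0  # no traffic, no violations
    good = sum(1 for latency in latencies_ms if latency <= threshold_ms)
    return good / len(latencies_ms)

# Every call here returned successfully, yet half were slow:
samples = [120, 140, 150, 450, 500, 610, 130, 700]
print(f"SLI: {latency_sli(samples):.2%}")  # SLI: 50.00%
```

Nothing failed outright in this example, but an SLO demanding, say, 99% of requests under 300 ms would clearly be missed, which is exactly the kind of degradation the article describes.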
Each IT team needs to provide some sort of objective benchmark that assesses its overall effectiveness at delivering application services by making it simpler to track whether SLOs are achieved and maintained. Not every service necessarily needs to meet a stringent requirement, but IT teams do need confidence that, on average, certain levels of service are being consistently maintained.
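The usual objective benchmark for "levels of service maintained on average" is the error budget: the amount of unreliability an SLO target permits over a window. A short sketch of the arithmetic, with an illustrative 99.9% target:

```python
# Hedged sketch of error-budget arithmetic: how many minutes of "bad"
# service a given SLO target allows over a rolling window.

def error_budget_minutes(slo_target, window_days=30):
    """Minutes of allowed unreliability for an SLO target over a window."""
    total_minutes = window_days * 24 * 60
    return total_minutes * (1 - slo_target)

# A 99.9% target over 30 days allows roughly 43 minutes of bad time:
print(f"{error_budget_minutes(0.999):.1f} minutes")
```

Framing reliability as a budget is what lets teams avoid demanding a stringent requirement everywhere: a less critical service can simply be given a smaller target and a larger budget.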
When it comes to SLOs, the challenge is first determining which dependencies might impact the level of reliability that can be maintained in the event of an outage and, second, making sure that, whenever necessary, workarounds such as rerouting API calls to other services can be easily invoked.
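The rerouting workaround mentioned above amounts to a fallback pattern: try the primary service and, on failure, invoke a backup. The sketch below is a deliberately simplified illustration with hypothetical service callables; production code would add timeouts, retries, and circuit breaking.

```python
# Illustrative fallback pattern: call the primary service, and reroute
# to backups in order if it fails. Service functions are hypothetical.

def call_with_fallback(primary, fallbacks):
    """Invoke primary(); on any exception, try each fallback in order."""
    last_error = None
    for service in [primary, *fallbacks]:
        try:
            return service()
        except Exception as exc:  # real code would catch narrower errors
            last_error = exc
    raise RuntimeError("all services failed") from last_error

def flaky_primary():
    raise ConnectionError("primary unavailable")

def healthy_backup():
    return "ok from backup"

print(call_with_fallback(flaky_primary, [healthy_backup]))  # ok from backup
```

Having such a workaround rehearsed and easily invoked is what keeps a dependency outage from consuming the entire error budget.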
SLOs as a concept have, of course, been around for decades, but with the rise of digital services that span multiple organizations, the level of reliability DevOps teams are expected to provide is now being written into business contracts. As a result, the stress related to managing SLOs is only going to increase in an era when businesses have never been more dependent on software.