Chronosphere Adds AI Remediation Guidance to Observability Platform

Chronosphere this week previewed artificial intelligence (AI) capabilities embedded in its observability platform that, in addition to helping identify the root cause of an issue, also provide remediation suggestions.

Additionally, the company made available a Model Context Protocol (MCP) Server through which AI coding tools and agents will be able to query observability data.

Chronosphere CEO Martin Mao said the AI-guided troubleshooting capabilities, scheduled to be made generally available in 2026, will make it easier for DevOps teams to use natural language to triage issues by evaluating remediation suggestions surfaced by a large language model (LLM).

Those suggestions are made more reliable by employing a Temporal Knowledge Graph embedded into the Chronosphere platform to ensure that only relevant telemetry data is exposed to the LLM, said Mao. That knowledge graph ensures that the LLM doesn’t process any extraneous data, which enables it to generate suggestions that are more deterministic than probabilistic, he added.

DevOps teams will still need to determine if those suggestions are appropriate, but mean time to remediation should improve significantly because much of the initial investigation of an incident is now automated, noted Mao.

DevOps teams can also review how those suggestions were arrived at via a set of notebook workspaces. Suggestions that are approved are fed back into the Temporal Knowledge Graph to make subsequent suggestions even more accurate, said Mao.

The overall goal is to build trust over time in the output being generated via the Temporal Knowledge Graph, said Mao. The key to building that trust is to provide a level of explainability that shows AI models are not just randomly generating recommendations, he added.

Each DevOps team will need to determine for itself how much to rely on AI, but as the Chronosphere platform becomes more familiar with an IT environment, the level of trust those teams can have in the output generated will continue to grow, noted Mao. In some cases, DevOps teams might want to enable the output generated by the AI tools to automatically initiate a runbook to resolve, for example, a lower-level issue that is not likely to disrupt an application environment, he added.

Historically, the challenge with observability platforms has been the expertise required to use them effectively. Many IT organizations continue to rely primarily on monitoring tools to track a set of pre-defined metrics simply because they lack the knowledge required to launch queries to identify potential issues or the root cause of an incident. Natural language AI tools, however, should make observability platforms more accessible.

In fact, as AI continues to evolve, observability will become more democratized in a way that reduces the level of expertise needed to derive value. As that evolution occurs, the number of incidents that might lead to an actual outage should, hopefully, decline dramatically even as the underlying IT environment itself continues to become more complex to understand and manage.