Autoptic Unfurls AI Change Resilience Agents to Analyze Telemetry Data

Autoptic today made available a set of change resilience agents that apply artificial intelligence (AI) to telemetry data pulled from multiple DevOps tools to identify the root cause of an incident.

Company CTO Peco Karayanev said the Autoptic Change Resilience agents invoke a domain-specific language (DSL), dubbed Performance Query Language, that Autoptic developed to analyze data that has been normalized into the OpenTelemetry format. That data is collected via plug-in connectors that Autoptic has developed for 10 DevOps tools and platforms, including CloudTrail, CloudWatch, Datadog, GitHub, Grafana LGTM, Jira, and OpenSearch. Rather than simply relying on data collected via, for example, a Model Context Protocol (MCP) server, those connectors make it possible to normalize and validate the telemetry data collected, noted Karayanev.
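To make the normalize-and-validate step concrete, here is a minimal Python sketch of mapping records from two different tools onto one shared record shape. It does not depict Autoptic's connectors or Performance Query Language, neither of which is described in detail here. The input shapes, the function names, and the `source.tool` and `change.id` attribute keys are all hypothetical. The output fields are only loosely modeled on the OpenTelemetry log data model.

```python
from datetime import datetime, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
# Hypothetical minimal schema, loosely modeled on OpenTelemetry log record fields.
REQUIRED_FIELDS = ("time_unix_nano", "severity_text", "body", "resource", "attributes")


def iso_to_unix_nano(iso_ts):
    """Convert an ISO 8601 timestamp with a UTC offset to integer nanoseconds."""
    dt = datetime.fromisoformat(iso_ts.replace("Z", "+00:00"))
    if dt.tzinfo is None:
        raise ValueError(f"timestamp has no UTC offset: {iso_ts!r}")
    delta = dt - EPOCH
    # Integer arithmetic avoids float rounding at nanosecond scale.
    return (delta.days * 86_400 + delta.seconds) * 1_000_000_000 + delta.microseconds * 1_000


def normalize_log_event(event, source):
    """Assumed input: {"timestamp": ms since epoch, "message": str, "level": str, "service": str}."""
    return {
        "time_unix_nano": int(event["timestamp"]) * 1_000_000,
        "severity_text": event.get("level", "INFO").upper(),
        "body": event["message"],
        "resource": {"service.name": event.get("service", "unknown")},
        "attributes": {"source.tool": source},
    }


def normalize_commit(commit, source):
    """Assumed input: {"sha": str, "committed_at": ISO 8601 str, "message": str, "repo": str}."""
    return {
        "time_unix_nano": iso_to_unix_nano(commit["committed_at"]),
        "severity_text": "INFO",
        "body": commit["message"],
        "resource": {"service.name": commit["repo"]},
        "attributes": {"source.tool": source, "change.id": commit["sha"]},
    }


def validate(record):
    """Reject records that are missing fields or carry an unusable timestamp."""
    missing = [f for f in REQUIRED_FIELDS if f not in record]
    if missing:
        raise ValueError(f"record is missing fields: {missing}")
    ts = record["time_unix_nano"]
    if not isinstance(ts, int) or ts <= 0:
        raise ValueError(f"invalid timestamp: {ts!r}")
    return record
```

Once every source is reduced to the same fields, a single query can sort and compare a log line and a code commit on the same timeline, which is the practical benefit of normalizing before analysis.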

The overall goal is to make it possible for the Autoptic change resilience agents, accessed via a natural language interface, to detect and diagnose production problems before they become major incidents, said Karayanev. The AI agents developed by Autoptic correlate massive amounts of telemetry data in a way that increases the confidence DevOps teams can have any time they are about to make a change to the IT environment, he added.

Designed to be deployed on multiple cloud computing platforms, the Autoptic AI agents are also built to run cost-efficiently on those platforms to ensure a return on investment for DevOps teams, he added. The agents are currently in use at six organizations, and pricing is based on a flat, predictable fee rather than usage.

While most incidents can be traced back to a recent update to an IT environment, the root cause is often a series of changes made over an extended period of time. Identifying the real root cause of an issue requires analyzing massive amounts of telemetry data that today reside in multiple DevOps tools and platforms.
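One simple way to picture the search for changes that accumulated over time is a lookback window over change events. The Python sketch below is illustrative only and does not represent how the Autoptic agents work. The `candidate_changes` function, the event fields, and the 30-day default are all assumptions.

```python
from datetime import datetime, timedelta, timezone


def candidate_changes(changes, incident_time, service, lookback=timedelta(days=30)):
    """Return changes to `service` inside the lookback window, newest first.

    Each change is assumed to be a dict with "id", "service" and a
    timezone-aware "time" (hypothetical shape for illustration).
    """
    window_start = incident_time - lookback
    hits = [
        c for c in changes
        if c["service"] == service and window_start <= c["time"] <= incident_time
    ]
    return sorted(hits, key=lambda c: c["time"], reverse=True)


if __name__ == "__main__":
    incident = datetime(2024, 3, 20, 12, 0, tzinfo=timezone.utc)
    history = [
        {"id": "deploy-41", "service": "checkout", "time": datetime(2024, 3, 20, 9, 0, tzinfo=timezone.utc)},
        {"id": "config-17", "service": "checkout", "time": datetime(2024, 3, 2, 15, 0, tzinfo=timezone.utc)},
        {"id": "deploy-12", "service": "search", "time": datetime(2024, 3, 19, 8, 0, tzinfo=timezone.utc)},
    ]
    for change in candidate_changes(history, incident, "checkout"):
        print(change["id"], change["time"].isoformat())
```

A wider window surfaces older changes such as the configuration edit above, at the cost of more candidates to sift through, which is where automated correlation across tools becomes useful.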

The amount of telemetry data being generated in modern IT environments is already overwhelming, an issue that is being further exacerbated by the deployment of AI applications that generate even more telemetry data than previous generations of software. The only way to effectively analyze that telemetry data is to rely more on AI tools and platforms that have been specifically designed for that purpose.

Each DevOps team will need to determine the degree to which it will rely on AI agents to perform tasks such as analyzing telemetry data, but at this point, it's not so much a question of whether they will be deployed as it is how many of them there will be. More than likely, each DevOps engineer will have multiple AI agents that will be deployed alongside AI agents that are designed to perform specific tasks on behalf of the entire team. The issue then becomes determining how best to orchestrate all the tasks assigned to those AI agents alongside the remaining tasks being performed by the engineers managing the overall DevOps workflow.