RCaaS Slashes Mean-Time-to-Resolve by 90 Percent and Has a Validated Accuracy Rate of Over 95 Percent
Santa Clara, CA – June 15, 2022 – Zebrium, the leader in the use of machine learning to automatically find the root cause of software problems, today announced that it has launched Zebrium Root Cause as a Service (RCaaS), a new solution that adds the capability for monitoring and observability tools such as Datadog, New Relic, Elastic, Dynatrace, Grafana, AppDynamics, ScienceLogic, and others to automatically find the root cause of software and infrastructure incidents.
When an incident with production software occurs, Zebrium RCaaS automatically finds the root cause and presents a summary of the problem directly on existing monitoring dashboards, alongside other charts showing metrics, traces and APM data. This allows Site Reliability Engineers (SREs), DevOps personnel and developers to reduce the Mean-Time-to-Resolve (MTTR) for software or infrastructure problems by 90 percent.
Today, when technical teams encounter a new service outage or problem, they typically rely on observability tools to facilitate the troubleshooting process. Without Zebrium, this involves looking at metrics to determine “when” the problem started, drilling down on traces or APM data to narrow down the source of the problem (the “where”), and finally combing through large volumes of logs from the application and infrastructure stack to determine the root cause (the “why”). This process can take many hours and requires extensive team resources while critical services remain impacted. Now with Zebrium RCaaS, the painstaking process of digging through logs is automated. The end result is that RCaaS quickly uncovers the root cause indicators that technical teams would have eventually found by manually combing through logs.
RCaaS has been validated to find the correct root cause in over 95 percent of incidents. “The Cisco Technical Assistance Center (TAC) spends thousands of hours each month analyzing software logs to find the root cause of customer incidents,” said Koree Mires, Director, Global TAC Innovation, Automation and Disruption at Cisco Systems. “We had been investigating ways to help automate this process for many years. When we came across Zebrium, we were immediately impressed. In order to validate its effectiveness, we tested RCaaS with four product lines and 192 actual customer incidents. We were astonished to find that RCaaS correctly found the root cause automatically over 95 percent of the time. We are now leveraging the technology to speed up customer incident resolution and will continue rolling it out to more product lines throughout the year.”
Zebrium RCaaS is designed to make the details of root cause available in the same tools and workflows that SREs, DevOps engineers and developers are already using. RCaaS has complete “out-of-the-box” integrations with popular observability and monitoring tools, including Datadog, New Relic, Elastic, Dynatrace, AppDynamics, Grafana, ScienceLogic and others. It also natively integrates with incident management and response platforms including PagerDuty, Opsgenie, VictorOps, Slack, Teams and email systems. Additional third-party tools can also easily be integrated through a set of open APIs.
“The cost of downtime keeps rising, and throwing engineers at the problem is not a scalable solution,” said Ajay Singh, CEO, Zebrium. “Since speed and accuracy are essential when software teams need to resolve application incidents, the only way forward is an automated approach to Root Cause Analysis (RCA). Zebrium RCaaS is a proven way to do this. Since our platform does not require any manual training or rules, customers can get started in just a few minutes, and leverage RCaaS with almost any kind of observability tool already in place.”
Zebrium RCaaS is now available for purchase directly from Zebrium or in the Datadog and AWS marketplaces. For more information or to sign up for a free trial, visit https://www.zebrium.com.
Supporting quotes from Zebrium partners:
Alex Vetras, Senior Product Manager at Datadog:
“We are thrilled to have Zebrium as a Marketplace partner. Zebrium’s native Datadog app helps organizations identify the root cause behind outages by analyzing ingested logs to detect anomalies. Together with Datadog’s Watchdog capabilities, DevOps and SRE teams can now be notified of what went wrong and why, automatically.”
Peter Pezaris, SVP, Strategy and User Experience at New Relic:
“Application issues are frequently discovered in production and require all hands on deck to triage and resolve the problem. When an outage occurs, engineers also need to be able to get to the root cause of why, when, and where the service failed in order to prevent future incidents. With the Zebrium quickstart for New Relic, we’re providing our customers with a valuable tool that helps them avoid manually searching for root cause indicators in logs, and surfaces the root cause right in their New Relic dashboard. This helps engineers reduce their MTTR and find the root cause of software incidents faster.”
Erik Rudin, Vice President of Technical Alliances and Ecosystems at ScienceLogic:
“The Zebrium integration for ScienceLogic complements our AIOps and monitoring platform by automatically finding the root cause of incidents that would otherwise require manual log analysis.”