
Zebrium Launches Root Cause as a Service Enabling Popular Observability Tools to Automatically Find the Root Cause of Software Problems and Outages

RCaaS Slashes Mean-Time-to-Resolve by 90 Percent and Has a Validated Accuracy Rate of Over 95 Percent

Santa Clara, CA – June 15, 2022 – Zebrium, the leader in the use of machine learning to automatically find the root cause of software problems, today announced that it has launched Zebrium Root Cause as a Service (RCaaS), a new solution that adds the capability for monitoring and observability tools such as Datadog, New Relic, Elastic, Dynatrace, Grafana, AppDynamics, ScienceLogic, and others to automatically find the root cause of software and infrastructure incidents. 

When an incident occurs in production software, Zebrium RCaaS automatically finds the root cause and presents a summary of the problem directly on existing monitoring dashboards, alongside other charts showing metrics, traces and APM data. This allows Site Reliability Engineers (SREs), DevOps personnel and developers to reduce the Mean-Time-to-Resolve (MTTR) for software or infrastructure problems by 90 percent.

Today, when technical teams encounter a new service outage or problem, they typically rely on observability tools to facilitate the troubleshooting process. Without Zebrium, this involves looking at metrics to determine “when” the problem started, drilling down into traces or APM data to narrow down the source of the problem (the “where”), and finally combing through large volumes of logs from the application and infrastructure stack to determine the root cause (the “why”). This process can take many hours and requires extensive team resources while critical services remain impacted. Now, with Zebrium RCaaS, the painstaking process of digging through logs is automated. The end result is that RCaaS quickly uncovers the root cause indicators that technical teams would eventually have found by manually combing through logs.

RCaaS has been validated to find the correct root cause in over 95 percent of incidents. “The Cisco Technical Assistance Center (TAC) spends thousands of hours each month analyzing software logs to find the root cause of customer incidents,” said Koree Mires, Director, Global TAC Innovation, Automation and Disruption at Cisco Systems. “We had been investigating ways to help automate this process for many years. When we came across Zebrium, we were immediately impressed. In order to validate its effectiveness, we tested RCaaS with four product lines and 192 actual customer incidents. We were astonished to find that RCaaS correctly found the root cause automatically over 95 percent of the time. We are now leveraging the technology to speed up customer incident resolution and will continue rolling it out to more product lines throughout the year.”

Zebrium RCaaS is designed to make the details of root cause available in the same tools and workflows that SREs, DevOps engineers and developers are already using. RCaaS has complete “out-of-the-box” integrations with popular observability and monitoring tools, including Datadog, New Relic, Elastic, Dynatrace, AppDynamics, Grafana, ScienceLogic and others. It also natively integrates with incident management and response platforms including PagerDuty, Opsgenie, VictorOps, Slack, Teams and email systems. Additional third-party tools can also easily be integrated through a set of open APIs.

“The cost of downtime keeps rising, and throwing engineers at the problem is not a scalable solution,” said Ajay Singh, CEO, Zebrium. “Since speed and accuracy are essential when software teams need to resolve application incidents, the only way forward is an automated approach to Root Cause Analysis (RCA). Zebrium RCaaS is a proven way to do this. Since our platform does not require any manual training or rules, customers can get started in just a few minutes, and leverage RCaaS with almost any kind of observability tool already in place.”

Zebrium RCaaS is now available for purchase directly from Zebrium or in the Datadog and AWS marketplaces. For more information or to sign up for a free trial, visit https://www.zebrium.com

Supporting quotes from Zebrium partners:

Alex Vetras, Senior Product Manager at Datadog:

“We are thrilled to have Zebrium as a Marketplace partner. Zebrium’s native Datadog app helps organizations identify the root cause behind outages by analyzing ingested logs to detect anomalies. Together with Datadog’s Watchdog capabilities, DevOps and SRE teams can now be notified of what went wrong and why, automatically.”

Peter Pezaris, SVP, Strategy and User Experience at New Relic:

“Application issues are frequently discovered in production and require all hands on deck to triage and resolve the problem. When an outage occurs, engineers also need to be able to get to the root cause of why, when, and where the service failed in order to prevent future incidents. With the Zebrium quickstart for New Relic, we’re providing our customers with a valuable tool that helps them avoid manually searching for root cause indicators in logs, and surfaces the root cause right in their New Relic dashboard. This helps engineers reduce their MTTR and find the root cause of software incidents faster.”

Erik Rudin, Vice President of Technical Alliances and Ecosystems at ScienceLogic:

“The Zebrium integration for ScienceLogic complements our AIOps and monitoring platform by automatically finding the root cause of incidents that would otherwise require manual log analysis.”

