xMatters Incorporates Stackdriver APM Metrics from Google

As a set of application monitoring services on the Google Cloud Platform (GCP) for driving DevOps processes, Stackdriver has been steadily gaining adherents, especially since Google extended its reach to on-premises systems. Today xMatters, a provider of incident management tools, announced it is integrating alerts and data generated by Stackdriver into its workflow processes.

Company CTO Abbas Haider Ali said the xMatters incident management software will now automatically relay Stackdriver data to the correct people and systems to help coordinate and resolve incidents faster. Stackdriver insights will be embedded into notifications, collaboration invitations and service management tickets, said Haider Ali.

The goal, he said, is to provide DevOps teams with more context concerning specific issues directly within the user interface employed to access the incident management platform.

Google, via Stackdriver, not only provides access to a tool for application performance management (APM), it also effectively embeds best practices for site reliability engineers (SREs) inside the service. While the best practices defined by Google are highly opinionated, they are starting to gain traction as a specific methodology for implementing a DevOps process, said Haider Ali.

Support for Stackdriver follows the recent release of an xMatters update that added an Event Flood Control capability, which suppresses similar requests arriving in close succession to reduce the noise generated by a single incident. The company also added an Unlimited Event Visibility feature that allows organizations to mine past incidents for insights that help identify the potential root cause of a problem faster.

Of course, while Google may have invented the SRE concept, other cloud service providers are clearly coming up with their own flavors, and many internal IT operations teams will likely create their own equivalents. But as the SRE concept continues to gain traction, the rate at which DevOps processes are adopted enterprisewide should increase. The challenge most enterprise IT organizations face today is that their internal teams are largely made up of administrators rather than engineers who know how to programmatically manage infrastructure as code. Many of those organizations are either hiring former developers to become SREs or investing in training to upgrade the skills of their existing IT administrators.

As more organizations embrace SRE concepts, many of them will look to create closed-loop approaches to DevOps that enable them to significantly reduce the time between when an issue or incident is discovered and when it is assigned within a continuous integration/continuous deployment (CI/CD) environment to be addressed programmatically.

In the meantime, whatever gaps may have existed between DevOps processes and incident management systems continue to narrow. Not every organization necessarily has to hire SREs to achieve that goal. But organizations that can afford to hire and retain SREs should expect those SREs to insist on incorporating incident management within the context of a larger set of sophisticated DevOps processes.

— Mike Vizard