Blameless this week announced it has integrated its namesake incident management solution with the Opsgenie alerting platform from Atlassian.
Aaron Lober, director of product marketing for Blameless, said the Blameless platform is now integrated with both Opsgenie and PagerDuty, the two most widely used platforms for managing alerts.
Blameless provides an incident management platform that normalizes data collected from a wide range of DevOps tools and platforms to determine how best to respond to different incident classes. Armed with those insights, organizations can quickly assemble the best team to respond to an event, said Lober.
That capability reduces stress because Blameless also surfaces recommendations for resolving incidents based on the data it has collected, he added.
The Blameless platform is tailored to organizations that have hired site reliability engineers (SREs) to manage IT operations using DevOps best practices, noted Lober. The overarching goal is to eliminate rote processes that, over time, conspire to burn out incident management teams, which often experience high turnover rates, he added.
Opsgenie has been connected to the Blameless platform via the service catalog and alerting functions. Blameless users responding to or managing an incident can escalate issues to any teams defined within Opsgenie. The service catalog and alerting data generated throughout the life cycle of an incident is automatically captured and stored on the Blameless Incident Timeline. That capability makes it possible to identify the appropriate service owners and the escalation protocol that was followed by the team involved. Blameless can also display the actual user name of each responder who has been notified.
When Blameless is accessed via Slack or Microsoft Teams, Opsgenie can both automate when an incident channel is created and pre-select the relevant services that should be made available. Those who prefer to trigger alerts manually from within Slack or Teams can do so using commands from the Blameless chatbot.
As IT environments become increasingly complex, more organizations are making a concerted effort to apply DevOps best practices to incident management. The goal is to reduce downtime and increase overall resiliency at a time when organizations are more dependent on software than ever. As part of that effort, IT teams should identify the repetitive processes that span multiple incidents and then move to automate as many as possible.
In the longer term, it’s not clear what impact artificial intelligence (AI) will have on incident management. At some point, though, IT teams should expect their efforts to be augmented by machine learning algorithms capable of identifying issues before they balloon into major crises. However, some human intervention will still be required to manage incidents for the foreseeable future, noted Lober.
In the meantime, most IT teams should continue to expect the unexpected, given that IT environments have never been more dynamic and now change multiple times a day.