When it comes to incident response in an integrated DevOps environment, much depends on the ability of an IT team to respond quickly and consistently to a specific set of events. Because of that requirement, many IT organizations have invested in incident management systems that everyone in IT can consult whenever an issue arises. Now PagerDuty, a provider of incident management software, is taking that concept a step further by applying artificial intelligence (AI) to automate as many incident response tasks as an organization deems appropriate.
As promising as that may sound, however, Resilient Systems, a unit of IBM that is an arch-rival to PagerDuty, contends most organizations today are nowhere near the level of incident management maturity needed to make investing in AI worth their time and effort.
Eric Sigler, head of DevOps for PagerDuty, says the latest edition of the company’s namesake incident management platform is designed to augment the capabilities of the local IT administrator by providing:
Alert Grouping: Rules-based automation and machine learning algorithms automatically group related issues together, making them simpler to identify.
Similar Incidents: IT administrators and support specialists can see previous related issues and surface information around incident severity, impact and remediation steps.
Response Automation: A Response Plays capability allows teams to design and execute an automated response pattern for recruiting responders and stakeholders with a single click.
Dynamic Notifications and Event Routing: IT teams can select notification and assignment behavior and automatically route events to different teams based on event payloads.
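To make two of those ideas concrete, the following is a minimal Python sketch of rules-based alert grouping and payload-based event routing. It is not PagerDuty's implementation or API; the field names (`service`, `host`, `severity`), the team names and the rule format are all hypothetical, chosen only for illustration.

```python
from collections import defaultdict

# Hypothetical routing rules: (payload field, expected value, team to notify).
# Rules are checked in order; the first match wins.
ROUTING_RULES = [
    ("service", "database", "dba-team"),
    ("severity", "critical", "on-call-sre"),
]
DEFAULT_TEAM = "it-helpdesk"  # assumed fallback when no rule matches


def route_event(event):
    """Return the team for an event based on fields in its payload."""
    for field, expected, team in ROUTING_RULES:
        if event.get(field) == expected:
            return team
    return DEFAULT_TEAM


def group_alerts(events):
    """Group alerts that share the same service and host (a simple rule,
    standing in for the rules-based half of alert grouping)."""
    groups = defaultdict(list)
    for event in events:
        key = (event.get("service"), event.get("host"))
        groups[key].append(event)
    return dict(groups)


if __name__ == "__main__":
    events = [
        {"service": "database", "host": "db1", "severity": "critical"},
        {"service": "database", "host": "db1", "severity": "warning"},
        {"service": "web", "host": "web3", "severity": "critical"},
    ]
    for event in events:
        print(route_event(event), event)
    print(len(group_alerts(events)), "alert groups")
```

A fixed, ordered rule list keeps routing decisions easy to audit. A machine learning approach to grouping would replace the fixed key in `group_alerts` with a learned similarity measure, which is where the need for historical incident data comes in.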
Sigler says PagerDuty has also redesigned the company’s user interface, creating an enhanced live incident details page that makes discovering events simpler.
Those capabilities, says Sigler, will collectively prove critical in an emergency because the specialist associated with a specific IT discipline is usually not immediately available when needed. First responders to any given event need as much information as possible at their fingertips to resolve issues as quickly as possible.
But Ted Julien, vice president of product management for Resilient Systems, says most organizations simply don’t have incident response processes in place that would benefit from that level of AI. Machine learning algorithms require access to massive amounts of incident response data to learn what to do, but most organizations don’t collect that data, much less learn from it, he says. That doesn’t mean AI doesn’t have potential. But he believes AI is not nearly as far along when it comes to incident response as anyone would like. In fact, he notes, machine learning algorithms and other forms of IT automation could just as easily wind up doing more harm than good.
There may come a day when incident response in an IT environment is fully automated. But the distance between where most IT organizations are today and that goal is still considerable.