This week, Splunk, the leader in machine data intelligence, announced the acquisition of IT alerting solution VictorOps for $120 million. My congratulations to the VictorOps team for building a solid product for alert management and collaboration, demonstrating a wicked sense of humor and creating some cool T-shirts.
While the acquisition makes sense on many levels, it does raise broader questions about the need for standalone IT alerting tools. Let me explain why IT alert management as a category faces strong headwinds and how these tools will need to evolve if they are going to remain relevant in a world driven by digital transformation.
What Do IT Alert Management Tools Actually Do?
When an IT outage or service disruption occurs in an enterprise, the first order of business is to restore normal operations. IT outages are expensive (an hour of downtime can cost upwards of $100,000), disrupt customer experiences and derail employee productivity. To troubleshoot an IT incident quickly, you’ll need to notify the right subject matter experts, who can collaborate to pinpoint the root cause of the outage.
Every day, enterprise network operations center (NOC) teams receive hundreds of alerts from different IT management tools for monitoring, service desk, event correlation and configuration management. A certain percentage of these alerts demand urgent attention from NOC teams who are on-call and readily available for incident support.
Enter IT alert management tools, whose job is to notify the right personnel based on predefined rules that specify who should be contacted for a given category of event or incident. Tools from PagerDuty, VictorOps, xMatters, OpsGenie and Everbridge use on-call schedules to identify the right team members and reach them through a variety of channels, including email, text, voicemail and chat. IT alerting tools help you manage the entire incident life cycle, using on-call schedules, alert routing rules and collaborative war rooms.
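To make the idea of rule-based routing concrete, here is a minimal sketch in Python. It is not how any of the vendors named above implement routing; the rule table, team names, shift times and the `route` function are all hypothetical, chosen only to show how a category-to-team rule and an on-call schedule combine to decide who gets notified and on which channels.

```python
from dataclasses import dataclass
from datetime import datetime, time


@dataclass
class Alert:
    source: str
    category: str   # e.g. "database", "network"
    severity: str   # e.g. "critical", "warning"
    message: str


@dataclass
class Shift:
    team: str
    engineer: str
    start: time
    end: time
    channels: list  # ordered by preference, e.g. ["text", "voice"]


# Hypothetical routing rules: alert category -> owning team.
ROUTING_RULES = {"database": "dba", "network": "netops"}
DEFAULT_TEAM = "noc"


def covers(shift, t):
    """Return True if time-of-day t falls inside the shift (handles overnight shifts)."""
    if shift.start <= shift.end:
        return shift.start <= t < shift.end
    return t >= shift.start or t < shift.end


def route(alert, shifts, now):
    """Return (engineer, channel) pairs to notify for this alert."""
    team = ROUTING_RULES.get(alert.category, DEFAULT_TEAM)
    for shift in shifts:
        if shift.team == team and covers(shift, now.time()):
            # Assumption: critical alerts use every channel, others only the first.
            if alert.severity == "critical":
                channels = shift.channels
            else:
                channels = shift.channels[:1]
            return [(shift.engineer, channel) for channel in channels]
    return []  # nobody on call for that team


shifts = [
    Shift("dba", "alice", time(8, 0), time(20, 0), ["text", "voice"]),
    Shift("dba", "bob", time(20, 0), time(8, 0), ["text", "email"]),
]
alert = Alert("db01", "database", "critical", "disk full")
notifications = route(alert, shifts, datetime(2018, 6, 15, 23, 30))
```

With these made-up shifts, a critical database alert at 23:30 goes to the overnight engineer on every channel listed for that shift. Notice that nothing here asks whether the alert is worth sending in the first place.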
Embedded Alert Context: The Missing Piece of the Alert Puzzle
The fundamental challenge in scaling enterprise alert management is establishing the right context for the flood of alerts you’ve just received. Instead of forwarding every single alert, can you share only actionable alerts that your NOC teams can act on immediately?
Your IT alerting tool needs the ability to correlate a bunch of alerts into a single event for faster acknowledgment and troubleshooting. Alert consolidation and correlation call for intelligence that can extract signal from the noise and compress alert volumes with the right inferences. Artificial intelligence for IT operations (AIOps) helps you do just that, by applying big data and machine learning techniques to event management data. With AIOps tools, you can handle events and minimize alert noise by delivering timely and actionable insights for incident support. This agility and insight, coupled with intelligent correlation to stay on top of major outages, helps NOC teams by focusing their attention on the incidents that truly matter.
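As a rough illustration of what collapsing many alerts into a single event means, the Python sketch below groups alerts that hit the same service within a fixed time window. This is a deliberately simple rule-based stand-in, not machine learning and not any AIOps product's algorithm; the `service` field, the 300-second window and the `correlate` function are assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Alert:
    timestamp: float  # seconds since some reference point
    service: str
    message: str


@dataclass
class Event:
    service: str
    first_seen: float
    last_seen: float
    alerts: list = field(default_factory=list)


def correlate(alerts, window=300.0):
    """Group alerts for the same service that arrive within `window` seconds
    of the previous alert for that service into one event."""
    events = []
    latest = {}  # service -> most recent Event for that service
    for alert in sorted(alerts, key=lambda a: a.timestamp):
        event = latest.get(alert.service)
        if event is not None and alert.timestamp - event.last_seen <= window:
            # Close enough in time: fold into the existing event.
            event.alerts.append(alert)
            event.last_seen = alert.timestamp
        else:
            # First alert for this service, or the gap is too long: new event.
            event = Event(alert.service, alert.timestamp, alert.timestamp, [alert])
            events.append(event)
            latest[alert.service] = event
    return events


raw = [
    Alert(0, "db", "replication lag"),
    Alert(60, "db", "connection pool exhausted"),
    Alert(90, "web", "5xx rate high"),
    Alert(1000, "db", "disk usage 90%"),
]
events = correlate(raw)
```

Here four raw alerts become three events, because the first two database alerts arrive within the window of each other. A real AIOps tool would learn which alerts belong together from historical data rather than rely on a hand-picked key and window.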
AIOps + Alert Management = Smarter Incident Management
Without the ability to apply machine learning algorithms to alert floods, alert management tools act as “dumb forwarders” of IT alerts, inducing alert fatigue and adding to the overall noise. It’s time for alert management to get smarter and more contextual with machine learning insights.
Splunk’s acquisition of VictorOps is a clear sign that alert management tools as a standalone category can’t truly scale without a solid AIOps foundation. With artificial intelligence, IT teams can stop wasting valuable time figuring out incident triage and root cause analysis and ensure faster recovery from crippling outages with the right event analytics.