Machine Intelligence-Driven Anomaly Detection
Machine learning, the science of getting computers to take thoughtful action without explicit programming, is the basis of Machine Intelligence-Driven Anomaly Detection, the core capability of Avik Partners’ Grok APM product. That is how Casey Kindiger, veteran enterprise software solutions provider, IT automation expert, and founder of Avik Partners, describes it.
“Thoughtful action means using the type of faculties that a human being would use to make a decision and to take an action,” says Kindiger, who has more than a decade of experience in designing and developing process automation solutions for companies such as JP Morgan Chase, T-Mobile, and Allstate Insurance. Grok learns from your dev and ops environments and takes thoughtful action to detect anomalies, all in the service of Application Performance Management.
Grok works by analyzing real-time performance data from apps and infrastructure, looking for behavioral bumps in an otherwise smoothly paved DevOps highway of deployed services and underlying infrastructure.
Grok’s Pre-Event Inferences
“Grok uses common statistical techniques to learn a pattern over a selected period of time in order to build a dynamic threshold of the continuously changing underlying patterns,” says Kindiger. In this way, Grok can establish a dynamic baseline from very noisy patterns, something that would typically take many person-hours of involved statistical and predictive analysis to accomplish.
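To make the idea of a dynamic threshold concrete, here is a minimal sketch in Python. It is not Grok's actual algorithm, which the article does not describe. It assumes one common statistical technique, a rolling mean plus or minus a multiple of the rolling standard deviation. The function name `dynamic_thresholds` and the parameters `window` and `k` are hypothetical.

```python
from collections import deque
from statistics import mean, stdev


def dynamic_thresholds(values, window=12, k=3.0):
    """Return one (lower, upper) band per point, built from the preceding
    `window` values. Points without a full window of history get None.

    Assumption: a rolling mean +/- k standard deviations stands in for
    whatever technique the product really uses.
    """
    history = deque(maxlen=window)  # keeps only the most recent values
    bounds = []
    for value in values:
        if len(history) == window:
            mu = mean(history)
            sigma = stdev(history)
            bounds.append((mu - k * sigma, mu + k * sigma))
        else:
            bounds.append(None)  # not enough history yet
        history.append(value)
    return bounds


if __name__ == "__main__":
    readings = [10, 11, 10, 11, 10, 11]
    for reading, band in zip(readings, dynamic_thresholds(readings, window=4)):
        print(reading, band)
```

Because the band is recomputed from recent history at every point, it moves with the data instead of staying fixed, which is the property Kindiger's quote describes.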
Grok’s machine intelligence uses this dynamic baseline to make inferences much as a human being would, reasoning its way to a logical conclusion. It detects even slight anomalies by comparing what reasonably should have happened with what actually did happen. Using this approach, Grok detects subtle but significant patterns in streaming data even though the patterns in that data are continuously in flux.
Rather than detect that something has gone wrong at a single point in time, an approach that is neither scalable nor sustainable, Grok looks for minute patterns that appear well in advance of any issue you would normally look for with traditional APM mechanisms, explains Kindiger. Instead of sending alerts on threshold-based exceptions, Grok predicts what the next data point or series of data points will be, then examines deviations from those predictions and considers whether to flag them as anomalies, according to Kindiger.
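The predict-then-compare loop described above can be sketched as follows. This is an illustration only, under stated assumptions: an exponentially weighted moving average stands in for the forecaster, and a point is flagged when its forecast error is large relative to the typical recent error. Grok's real model is not described in the article, and `flag_anomalies`, `alpha`, `k`, and `min_points` are hypothetical names.

```python
import math


def flag_anomalies(values, alpha=0.3, k=4.0, min_points=5):
    """Return indices whose value deviates sharply from a one-step-ahead forecast.

    Assumption: the forecast is an exponentially weighted moving average and
    the error scale is an exponentially weighted estimate of squared error.
    """
    forecast = values[0]
    error_var = 0.0
    flagged = []
    for i, actual in enumerate(values[1:], start=1):
        error = actual - forecast          # what happened minus what was expected
        scale = math.sqrt(error_var)       # typical size of recent errors
        if i >= min_points and scale > 0 and abs(error) > k * scale:
            flagged.append(i)              # deviation too large: flag it
        else:
            # Learn only from points treated as normal, so a single spike
            # does not distort the next prediction.
            error_var = (1 - alpha) * error_var + alpha * error ** 2
            forecast += alpha * error
    return flagged


if __name__ == "__main__":
    series = [10, 10.5, 9.5, 10, 10.5, 9.5, 10, 10.5, 9.5, 10, 30, 10]
    print(flag_anomalies(series))
```

One design choice worth noting: because flagged points are excluded from learning, a genuine and lasting shift in level would keep being flagged until the model is reset. A production system would need a policy for that case.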
Currently, Grok is particularly tuned to analyze streaming data and is, according to Kindiger, data-point agnostic. “We don’t need to teach the system what the context of a particular data point is. The system learns the context of the data over time and the subtle nuances of the pattern that [it] produces over time, and creates the model based on that,” says Kindiger.
Watch Out for That Initial Two-Week Delay
If Grok does everything that Avik Partners’ Kindiger says it will, DevOps teams will soon be able to catch and act on infrastructure issues before they grow big enough for end users to notice.
But Grok does have one gap in its APM armor: it needs time to learn a new customer’s system and get good at detection. Assume that Grok’s machine learning makes it better at detecting anomalies in a given set of infrastructure over time. It follows that with a brand-new customer there is a point, at the very beginning, when Grok has only just started to learn. At that point, Grok isn’t very good at telling what’s anomalous from what’s normal for that infrastructure, because it has no foundation for that particular system.
“Grok will take some time to learn the nuances of when to escalate an issue. We’ve learned from experimentation that it gets pretty good after ingesting two weeks’ worth of data,” says Kindiger. Because Avik Partners initially targeted AWS, it has taken two weeks of CloudWatch data and fed that to Grok to provide it with some backdrop so that when Grok goes live, it won’t be learning this environment entirely anew.
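The warm-up period and the pre-seeding with historical data can be pictured with a small gate that holds back alerts until enough samples have been ingested. This is a sketch under assumptions, not Avik Partners' implementation: the five-minute sampling interval, the `WarmupGate` class, and its methods are all hypothetical, and the two-week figure is the only number taken from the article.

```python
from datetime import timedelta

WARMUP = timedelta(weeks=2)               # learning period quoted in the article
SAMPLE_INTERVAL = timedelta(minutes=5)    # assumed sampling interval
WARMUP_SAMPLES = WARMUP // SAMPLE_INTERVAL  # samples needed before alerting


class WarmupGate:
    """Tracks how much data a detector has seen and whether it may alert."""

    def __init__(self, required=WARMUP_SAMPLES):
        self.required = required
        self.seen = 0

    def seed(self, historical_values):
        """Count previously collected samples toward the warm-up."""
        self.seen += len(historical_values)

    def observe(self, value):
        """Count one live sample (the value itself is not stored here)."""
        self.seen += 1

    @property
    def ready(self):
        """True once enough samples have been ingested to trust alerts."""
        return self.seen >= self.required


if __name__ == "__main__":
    gate = WarmupGate()
    print(WARMUP_SAMPLES, gate.ready)
    gate.seed([0.0] * WARMUP_SAMPLES)  # stand-in for two weeks of history
    print(gate.ready)
```

Seeding the gate with existing history means the detector can start alerting as soon as it goes live instead of staying silent for the first two weeks.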