The New AI Overseer Role in DevOps

Robot overlords are the stuff of science fiction fantasy. Today, with more teams implementing artificial intelligence (AI), DevOps teams need to take a hard, continuous look at reality. An AI overseer watching apps at work could make the difference between sadness and euphoria in your efforts.

We’re seeing AI misses more and more often. Facebook recently admitted its algorithms failed to stop uploads of a particularly heinous live-streamed video. Autonomous vehicles have struck cars and people. Jaywalkers in China have been misidentified and publicly shamed in error. It’s not for lack of effort in development. The adage “If it were easy, anyone could do it” applies.

A few weeks back, I was in a Twitter chat session on AI led by Kirk Borne, data scientist at Booz Allen Hamilton. The question came up of what organizations should consider when creating guidelines for use of data and bias-free AI. Here’s my response:

A4: DevOps teams should have someone whose role is AI oversight—actually watching what it’s doing and constantly testing/validating its findings #MITSMRChat
— Don Dingee (@StratisetDon) February 14, 2019

Common Problems

Most failures stem from a fire-and-forget approach, in which AI systems are trained, tested against their training data and turned loose. Even a well-designed AI system is subject to several problems:

Errors: This is where AI makes a factually incorrect decision. False positives and false negatives are in this category, along with miscategorization problems.

Gaps: There are situations in which AI understands conflicting options and is unable to choose. Then, there are situations in which the AI doesn’t recognize what it sees.

Bias: Maybe the system prefers green M&Ms or German shepherds because there was some subtle trend in the training data for “candy” or “dog.”

Ethics: This can be a higher level of bias, where preference is given based on age, gender or race. Sometimes it’s the classic no-win scenario in which both options are harmful.
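Of these four categories, errors are the most straightforward to measure. Here is a minimal Python sketch of how an overseer might tally false positives and false negatives, assuming a batch of samples where a human has reviewed each AI decision. The names AuditCounts and audit are hypothetical, invented for illustration, and the binary yes/no framing is a simplifying assumption.

```python
from dataclasses import dataclass


@dataclass
class AuditCounts:
    """Running tally of AI decisions checked against human review."""
    true_pos: int = 0
    false_pos: int = 0
    true_neg: int = 0
    false_neg: int = 0

    @property
    def false_positive_rate(self) -> float:
        # Share of actual negatives that the AI flagged as positive.
        negatives = self.false_pos + self.true_neg
        return self.false_pos / negatives if negatives else 0.0

    @property
    def false_negative_rate(self) -> float:
        # Share of actual positives that the AI missed.
        positives = self.false_neg + self.true_pos
        return self.false_neg / positives if positives else 0.0


def audit(samples):
    """Tally (predicted, actual) boolean pairs from human-reviewed samples."""
    counts = AuditCounts()
    for predicted, actual in samples:
        if predicted and actual:
            counts.true_pos += 1
        elif predicted and not actual:
            counts.false_pos += 1
        elif not predicted and actual:
            counts.false_neg += 1
        else:
            counts.true_neg += 1
    return counts
```

Tracking these two rates over time, rather than once at release, is what separates oversight from fire-and-forget. Gaps, bias and ethics problems are harder to reduce to a single number and usually need a person looking at individual cases.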

An AI overseer guards against these types of problems in near real-time, every day, for as long as it takes to weed them out. Some problems are easier than others. An example of a more difficult problem came when autonomous vehicle researchers realized that not all traffic signs are brand new and well-lit. Shade, branches, wind, scratches, dings, bullet holes, stickers, graffiti and many more types of defects exist in the real world.

Increased Complexity

Inevitably, part of the solution the AI overseer leads is better scenario planning and more training data. Retraining becomes a way of life. At some point, bigger data sets also lead to more complex implementations and another challenge: speed. After retraining to correct issues, the reloaded real-time classification engine can run like it’s stuck in molasses, producing correct results too late to be useful.
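One way to catch a slowdown before it reaches production is to check measured inference latencies against a budget after every retraining. The following Python sketch assumes the team already records per-request latencies in milliseconds; the function names, the 95th-percentile default and the nearest-rank method are illustrative choices, not a prescribed approach.

```python
import math


def percentile(values, pct):
    """Nearest-rank percentile; pct is in the range (0, 100]."""
    if not values:
        raise ValueError("no latency samples")
    ordered = sorted(values)
    rank = math.ceil(pct / 100 * len(ordered))
    return ordered[max(rank, 1) - 1]


def meets_latency_budget(latencies_ms, budget_ms, pct=95):
    """True if the chosen percentile of latencies fits within the budget."""
    return percentile(latencies_ms, pct) <= budget_ms
```

A percentile is used instead of an average because a real-time system fails on its slowest responses, which an average tends to hide.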

While the A system is running, a B version should be in development with new training and classifier boxes. For very large data sets, teams should consider designing a workload-optimized training server complex instead of relying on elastic cloud resources. This need was behind NVIDIA’s recently announced acquisition of Mellanox: using InfiniBand for more bandwidth between GPU clusters in AI server complexes.

Where training systems often scale, real-time systems usually don’t. They are optimized for size and power consumption and can only accept so many additions. When a bottleneck emerges, it can kill the entire application. New generations of CPUs, GPUs and FPGAs keep pushing the envelope for designers tackling AI challenges.

Human-in-the-Loop

In many use cases, the job of the AI system is not to make every single decision autonomously. Instead, the system should make routine decisions and lead the AI overseer to look at exception cases. Directing humans quickly to areas needing attention is often more valuable than risking bad decisions. Human-in-the-loop is something DevOps teams should architect into systems more often.
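A common way to build this in is a confidence threshold: decisions the model is sure about go through automatically, and everything else lands in a queue for a person. The Python sketch below assumes the classifier reports a confidence score between 0 and 1 with each prediction; the triage function, the tuple layout and the 0.9 default are hypothetical and would need tuning per application.

```python
def triage(results, threshold=0.9):
    """Split classifier results into automated decisions and a review queue.

    results: iterable of (item_id, prediction, confidence) tuples.
    Returns (automated, review_queue).
    """
    automated = []
    review_queue = []
    for item_id, prediction, confidence in results:
        if confidence >= threshold:
            # Routine case: accept the AI decision as-is.
            automated.append((item_id, prediction))
        else:
            # Exception case: keep the confidence so the reviewer sees
            # how unsure the system was.
            review_queue.append((item_id, prediction, confidence))
    return automated, review_queue
```

Raising the threshold sends more work to humans and fewer risky decisions to production; lowering it does the opposite. Where to set it is a judgment call the AI overseer should own.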

This may all sound obvious, but it points out a big difference between developing AI and developing code-based algorithms. If there’s a bug in code, you can debug it and find the offending lines to fix. Debugging AI is a lot more difficult. Digital twin implementations come into play, using playback of real-world inputs to look at what happens inside the classifier black box. There are usually hints of problems developing before catastrophe strikes. Running both good and bad data through the system can help spot issues.
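Playback can be as simple as feeding the same recorded inputs to the running A system and the in-development B version and listing where they disagree. This Python sketch is a loose illustration under stated assumptions: each classifier is treated as a plain function, and classifier_a, classifier_b and the brightness field are made-up stand-ins, not part of any real system.

```python
def replay(recorded_inputs, model_a, model_b):
    """Run recorded inputs through two classifiers and collect disagreements."""
    disagreements = []
    for index, sample in enumerate(recorded_inputs):
        out_a = model_a(sample)
        out_b = model_b(sample)
        if out_a != out_b:
            disagreements.append((index, sample, out_a, out_b))
    return disagreements


# Hypothetical stand-ins: B was retrained to read dimmer, shaded signs.
def classifier_a(sample):
    return "stop" if sample["brightness"] > 0.3 else "unknown"


def classifier_b(sample):
    return "stop" if sample["brightness"] > 0.2 else "unknown"


recorded = [{"brightness": 0.9}, {"brightness": 0.25}, {"brightness": 0.1}]
for index, sample, out_a, out_b in replay(recorded, classifier_a, classifier_b):
    print(f"input {index}: A={out_a} B={out_b} sample={sample}")
```

Each disagreement is a case worth a human look: either B fixed something A got wrong, or the retraining introduced a new problem.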

Ideally, the AI overseer is a hybrid of a data scientist plus a systems architect. Look for someone who understands both data sets and hardware implementations, at least at a high level. They may have to call in other resources on the team for detailed investigations.

The last thing you want after all the hard work in developing a system is to have it run amok months or years later. It’s just Murphy’s Law. AI systems will do something entirely unexpected at the worst possible time. An AI overseer can help a DevOps team get ahead of AI problems.