Every once in a while, a trend becomes broad enough that I feel the need to offer career advice. This year it will be simple: Know enough about AI/ML to be dangerous. We are past the point where going without at least a basic understanding is a good idea. Look around the data center and you will find AI in use everywhere, from the earliest stages of development through production network performance evaluation. Somewhere in your environment, you are using AI and ML. More to the point, the places you may interview going forward are using AI and ML too.
I’m not suggesting you become an AI/ML expert, though you certainly could. I have a relative who is a data scientist, and that’s all he does: design and tweak algorithms. He’s learning new things pretty much daily, so if you like that (as I do), it’s an option. But those of us doing DevOps or SRE or even regular old operations will only need to understand the basics.
Understand what tools like Jupyter are, and what the most commonly used ML algorithms in your space are. Then learn how AI and ML help with what you are doing. Just as knowing how the compiler or interpreter of your chosen language works helps you solve odd problems, knowing how your AI tools work will help you solve theirs. It’s the same reason we want the people who understand the guts of routing to be the ones solving our routing problems.
Moving forward, AIOps, NetOps, DevSecOps … all intend to increase the amount of work offloaded onto the AI in question. In a lot of cases that is great. We have enough to do; if you can paw through those logs and give me relevant summary info, go for it. But it doesn’t end there: some AI engines are making the calls and implementing changes. In both network and application security, a fair number of vendors will tell you there is no time to wait for humans to stop an attack, and they encourage their customers to turn on automated responses. That’s just the first of many such applications for AI/ML. We will see an increasing number of AI/MLOps tools being used for full-on automation.
We all do difficult jobs with a complex, sometimes bewildering, number of inputs. But we use pattern recognition and experience to make determinations and do what’s best for our organization. Our software has reached the point where it can sometimes do that for us on the easier tasks, so we can focus on the more complex ones. When the AI/ML fails (and it will, because pattern recognition and historical trends are not always indicative of normal usage), knowing what is going on under the covers can help you determine whether it did the right thing in the wrong way or simply did the wrong thing. It can also help you recover quickly. I can’t speak for every organization, but I’ve never seen one that lamented, “Sure wish that outage had lasted another couple of hours!”
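To make the point about historical trends concrete, here is a minimal sketch of the kind of logic that can sit under the covers. It assumes a simple z-score baseline, which is far cruder than what commercial AIOps products actually use; the function name, threshold, and traffic numbers are all hypothetical. A perfectly legitimate traffic spike gets flagged exactly as an attack would, because the model only knows what "normal" looked like last week.

```python
from statistics import mean, stdev

def flag_anomalies(history, current, threshold=3.0):
    """Hypothetical helper: flag each value in `current` that sits more than
    `threshold` sample standard deviations away from the mean of `history`."""
    mu = mean(history)
    sigma = stdev(history)
    return [abs(x - mu) / sigma > threshold for x in current]

# Hypothetical requests-per-minute baseline taken from a quiet week.
history = [100, 104, 98, 101, 97, 103, 99, 102]

# Today includes a legitimate, marketing-driven spike. To this model it
# looks no different from an attack, because it only knows past behavior.
today = [101, 250, 260, 99]

print(flag_anomalies(history, today))  # [False, True, True, False]
```

If an automated response were wired to this output, it would block real customers during the launch. Knowing that the detector is only comparing against a historical mean is what lets you tell "right thing, wrong way" from "wrong thing" and recover quickly.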
And keep rocking it. Someday, an AI may be able to do your job as well as you do. But that isn’t today. Even if it can handle one piece, it certainly isn’t rocking the entire breadth of your responsibilities like you are. Thank you, from all the users who don’t realize you’re keeping them live.