Recently, there has been a lot of talk about MLOps from the DevOps and software engineering perspective. The message often comes across as, “Hurry, DevOps community, let’s set some standards before MLOps reinvents the wheel,” or “The purpose of MLOps is to allow anyone who knows how to code to get ML into production.” Let me tell you, as a data scientist, this disappoints and sometimes scares me. Why? Because neither of these points of view captures the pain of data scientists or of the people who are subject to ML algorithms.
A good example of this attitude toward MLOps is Google’s recent announcement. If you have read Google’s blog, you’ll see that the pain points it identifies are real needs for data scientists: We do need continuous evaluation, continuous monitoring, metadata management and lineage, and a feature store for collaboration. But where the announcement falls down for me is in the company’s goal “to make machine learning act more like computer science.”
Here’s the reality. DevOps wasn’t created to make software development behave more like a different field. It was created as a set of principles to enable software engineers to quickly and easily apply incremental improvements based on the needs of their customers, with a focus on quality. MLOps is more than automation. Centering the needs of users and customers is what made DevOps successful as a set of products, ideas and principles.
For MLOps to learn from DevOps, we must put the needs of data scientists, and of the people who are impacted by their models, first. It isn’t enough to say that practicing MLOps means advocating for automation and monitoring at every step simply to do things faster. Without this focus, we will see an increase in the deployment of models with uninspected and unintended consequences that disproportionately impact marginalized communities.
So, as a data scientist, what is it that I need? Keeping up with the latest and greatest event streaming services, distributed systems or methods of continuous deployment and integration isn’t where my mind lights up. I would like to spend most of my time understanding the domain space of the model I’m about to build, the nuanced impact of that model and whether it’s going to meet the needs of my customers and the people they serve.
There are a few ways to notice if you’re applying MLOps merely as a Band-Aid, a way to just go faster, that will ultimately break down. When looking for a solution to automate, consider whether you’re only reducing the work required for manual processes, or whether you’re also enabling data scientists to focus on the hard problems they’re trained to tackle. Another test is to look at your MLOps practice and tools end-to-end and ask whether they let your data scientists understand the impact of their models across more dimensions than just the outcome. Let’s take a few examples.
Automation of Data Pipelines Can Slow You Down
Data engineers have long been reluctant to give data scientists access to all of the data that exists, in all of its forms. I’ve been given many reasons in my years of experience, some of which are good, such as the fact that not all of the tools data scientists use for experimentation support all of these data sources. Other reasons, however, amount to grumbling about creating additional work for data engineers. It’s a continuous challenge to get data such as event streams into a format that data scientists can explore. Each new feature I hope to create comes at a cost to other teams.
With our shiny new culture of MLOps, someone proposes a solution: Automate! But, remember, MLOps is more than just automation. This is a great place to stop and look at the data scientist’s workflow and find a way to automate a solution that fits their needs and gives them new capabilities while reducing the work on the data engineering team.
A simple automation instinct might look like this: Ask data scientists to list out all the data they’ll ever need, and create batch jobs to serve up those needs. A better solution would be to find a way to connect multiple types of data sources to a single environment, allowing data scientists to explore data as it becomes available. Then they can create features and make them available for production without having to change development environments or languages, or needing a crystal ball to know what data they’ll want in the future.
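One way to picture that better solution is to define each feature once, as a pure function, so the same code runs during exploration and in production without a translation step. Here is a minimal sketch of that idea; the feature name, registry, and data are hypothetical, not any particular feature store’s API:

```python
# Sketch: define a feature once as a pure function so exploration and
# production share the exact same logic. All names here are hypothetical.

def sessions_per_day(event_counts: dict, days_active: int) -> float:
    """Average sessions per active day for a user."""
    if days_active <= 0:
        return 0.0
    return event_counts.get("session_start", 0) / days_active

# Exploration: call the function directly on a sample pulled from
# whatever connected source the data scientist is browsing.
sample_user = {"session_start": 42, "page_view": 310}
explored_value = sessions_per_day(sample_user, days_active=14)

# Production: the same function is registered for serving, so the value
# computed online is guaranteed to match what was explored offline.
FEATURE_REGISTRY = {"sessions_per_day": sessions_per_day}
served_value = FEATURE_REGISTRY["sessions_per_day"](sample_user, 14)
```

The point of the registry is not sophistication; it’s that there is exactly one definition of the feature, so the exploration environment and the production path cannot drift apart.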
Continuously Monitoring the Wrong Things
Another common pitfall of applying MLOps without focusing on the people your ML models will ultimately impact is using continuous monitoring only to measure performance, without tracking the impact of your features across various demographics. It’s not enough to check just once that your models aren’t systematically disadvantaging marginalized groups. We need to do this continuously.
People change, and so do our biases and patterns. For me, the easiest way to conceptualize this is through our use of language over time: the words stay the same, but their impact changes. We often don’t know that the underlying data has shifted because society itself has evolved. But we can commit to measuring impact across dimensions so that we can react quickly, ultimately reducing harm.
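In practice, “measuring the impact across dimensions” can start as simply as slicing an existing monitoring metric by demographic group and alerting on the gap, not just the aggregate. The sketch below assumes labeled predictions tagged with a group attribute; the groups, threshold, and data are hypothetical:

```python
# Sketch: monitor a metric per demographic slice instead of in aggregate,
# and flag when the gap between groups exceeds a threshold. The groups,
# threshold, and sample data below are hypothetical.

def accuracy_by_group(records):
    """records: iterable of (group, predicted, actual) tuples."""
    correct, total = {}, {}
    for group, predicted, actual in records:
        total[group] = total.get(group, 0) + 1
        correct[group] = correct.get(group, 0) + (predicted == actual)
    return {g: correct[g] / total[g] for g in total}

def flag_disparity(per_group, max_gap=0.1):
    """True if any two groups' scores differ by more than max_gap."""
    scores = list(per_group.values())
    return (max(scores) - min(scores)) > max_gap

# One monitoring cycle's worth of (group, predicted, actual) records.
predictions = [
    ("group_a", 1, 1), ("group_a", 0, 0), ("group_a", 1, 1), ("group_a", 1, 0),
    ("group_b", 1, 1), ("group_b", 0, 1), ("group_b", 0, 1), ("group_b", 1, 0),
]
per_group = accuracy_by_group(predictions)
alert = flag_disparity(per_group)  # run this on every monitoring cycle
```

An aggregate accuracy check over the same records would report 50% and look merely mediocre; the sliced view shows one group at 75% and the other at 25%, which is exactly the kind of disparity a one-time audit would miss as data drifts.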
Ultimately, MLOps isn’t just about automation. It is also a set of principles to guide how we build tools for sharing outcomes. At my company, we are thinking about ways to help identify and flag data and features that could introduce bias in our feature store product.
The gains from efficient ML development are significant. Data scientists and data engineers are sharing new tools to speed development and foster collaboration at higher velocity. And with better tools and processes, we are able to share responsibility for productionizing ML. MLOps, at its best, incorporates these components and more into your organization.
If we’re not careful, however, teams will focus on making data scientists more like software engineers, vendors will create tools that fail to address the entire problem, and companies will spend millions of dollars on quickly releasing models that fail to serve their customers—ultimately at the cost of humanity.