The Taxonomy of DataOps

Data Operations, or DataOps for short, is one of those IT buzzwords that lots of people use, yet few can define precisely. Like the cloud or DevOps, DataOps doesn’t make sense until you sit down and really think through what it entails.

To that end, this article looks at the taxonomy of DataOps: the IT disciplines, areas of expertise and types of work that make DataOps happen. I won’t explain everything about how DataOps works (that would require more than one blog post), but I will offer a primer on how DataOps workflows break down.

What Is DataOps?

Before delving into the taxonomy of DataOps, let me first define at a high level what I’m referring to when I use the term “DataOps.”

I use DataOps as a shorthand for data operations of all types—from data collection and storage to data management, monitoring, analytics and performance optimization.

I mention this because when some people hear the term DataOps, they think only of the practices and ideas described in the “DataOps Manifesto” and similar writings, which focus on extending the core principles of the DevOps software delivery philosophy to data operations. I’m a big believer in the “DataOps Manifesto,” but because it is targeted at data scientists rather than the IT community at large, I think it presents a narrow image of what data operations entails.

To me, data operations encompasses the work performed by anyone who works with data in any way. While data scientists and data engineers may perform the bulk of the heavy lifting related to data transformation and data analytics, developers and IT engineers also have important roles to play in data operations. When I use the term DataOps, I’m thinking of them, too.

The Taxonomy of DataOps

With that clarification out of the way, let’s explore what goes into DataOps.

Data Collection

You can’t have data operations without data—and so, data collection is a core part of DataOps.

Data collection refers, of course, to all of the processes by which you accumulate data and store it for some length of time.

It’s important to note that data collection can be both active and passive. Active collection covers data that someone deliberately goes out and gathers. Much of the data that organizations store and analyze these days, however, is collected passively: it is the information produced through the everyday operations of these organizations, their data exhaust, so to speak.

Data Storage

Data storage may seem simple and straightforward. After all, how much do you really need to think about the storage of data? Don’t you just pick a storage location and put your data there?

Well, no. Data storage is much more complex than that, which is why data storage is a distinct discipline within DataOps.

Effective data storage requires finding the right type of storage infrastructure—the one that will deliver the best balance between data performance, availability and cost. It also involves defining data retention policies to determine how long data will be stored. It encompasses data backups, too, which are important for ensuring that the data you intend to keep in storage actually will be available to you if the unexpected happens.
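To make the idea of a retention policy a bit more concrete, here is a minimal Python sketch. The category names, the retention periods and the `expired` function are all hypothetical, chosen only for illustration; a real policy would depend on your storage platform and compliance requirements.

```python
from datetime import datetime, timedelta

# Hypothetical retention policy: how many days to keep each category of data.
RETENTION_DAYS = {"logs": 30, "metrics": 90, "backups": 365}

def expired(records, now, policy=RETENTION_DAYS):
    """Return the names of records older than their category's retention window."""
    result = []
    for rec in records:
        max_age = timedelta(days=policy[rec["category"]])
        if now - rec["created"] > max_age:
            result.append(rec["name"])
    return result

# Example records; the names and dates are made up.
records = [
    {"name": "app.log", "category": "logs", "created": datetime(2024, 1, 1)},
    {"name": "cpu.csv", "category": "metrics", "created": datetime(2024, 1, 1)},
]
to_delete = expired(records, now=datetime(2024, 3, 1))
```

Here only `app.log` is past its window, since 60 days have elapsed and logs are kept for 30. Passing `now` in explicitly, rather than reading the clock inside the function, keeps the check easy to test.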

Data Management

Data management is a broad discipline that involves all of the tasks required to keep data available and accessible once it has been collected and stored. It entails work such as transforming data from one format to another to make it compatible with a given set of tools. It also involves tasks such as moving data between storage locations and identifying and resolving bottlenecks within the overall DataOps pipeline that are slowing down performance.
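As one small illustration of format transformation, the sketch below converts CSV text into JSON lines using only the Python standard library. The function name `csv_to_json_lines` and the sample data are my own assumptions; the formats your tools expect may well differ.

```python
import csv
import io
import json

def csv_to_json_lines(csv_text):
    """Convert CSV text (with a header row) into a list of JSON strings, one per row."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [json.dumps(row) for row in reader]

# Made-up sample data for illustration.
sample = "id,name\n1,ada\n2,grace\n"
json_lines = csv_to_json_lines(sample)
```

Note that `csv.DictReader` reads every value as a string, so a real pipeline would likely add a step to convert numeric or date fields to proper types.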

Data Monitoring

In the IT world, the word “monitoring” usually refers to the monitoring of applications or infrastructure for problems. But you also have to monitor your data if you want to do DataOps effectively.

Data monitoring includes tasks such as identifying data quality problems (such as missing data or data formatting issues) that will make your data less reliable or more difficult to work with. It also means checking for signs of failures, such as a database failure or an infrastructure crash, that could break your overall DataOps pipeline.
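A data quality check of the kind described above might look something like the following Python sketch. It flags missing required fields and dates that don’t match an expected format. The function name, the field names and the `YYYY-MM-DD` format are assumptions made for the example, not a standard.

```python
from datetime import datetime

def find_quality_issues(rows, required_fields, date_field, date_format="%Y-%m-%d"):
    """Return (row_index, field, problem) tuples for missing or badly formatted values."""
    issues = []
    for index, row in enumerate(rows):
        # Flag required fields that are absent or empty.
        for field in required_fields:
            if row.get(field) in (None, ""):
                issues.append((index, field, "missing"))
        # Flag dates that do not parse with the expected format.
        value = row.get(date_field)
        if value:
            try:
                datetime.strptime(value, date_format)
            except ValueError:
                issues.append((index, date_field, "bad format"))
    return issues

# Made-up sample rows: the second has an empty id and a differently formatted date.
rows = [
    {"id": "1", "date": "2024-01-15"},
    {"id": "", "date": "15/01/2024"},
]
issues = find_quality_issues(rows, required_fields=["id"], date_field="date")
```

In practice you would probably run a check like this on a schedule and send the results to whatever alerting system your team already uses.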

Data Analysis

Everyone (or everyone’s boss, at least) wants to “turn data into dollars,” as Gartner put it. Data analysis, or the process of seeking insights within sets of data, is key to making that happen.

Sophisticated data analytics is the only discipline on this list that is usually performed by data scientists alone, not the rest of the IT team. That is changing somewhat, though, as more and more simplified data analysis and visualization tools enter the market. These tools make it easy to make sense of large data sets without having a Ph.D. in statistics or knowing how to program in R.

Data Security

Data security is a big topic, especially in the present age of high-profile data security breaches. Keeping data secure requires a range of processes and tools, from setting the right access control policies for data to monitoring for signs of unauthorized access and scanning data to root out malware or other threats that could exist within data sets.
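To illustrate how access control policies and monitoring for unauthorized access fit together, here is a deliberately simplified Python sketch. The policy table, role names, data set names and log format are all hypothetical; real systems would rely on the access control features of the database or platform in use.

```python
# Hypothetical policy: which roles may read which data sets.
ACCESS_POLICY = {
    "sales_db": {"analyst", "admin"},
    "hr_db": {"admin"},
}

def is_allowed(role, dataset, policy=ACCESS_POLICY):
    """Return True if the role is permitted to read the data set."""
    return role in policy.get(dataset, set())

def unauthorized_attempts(access_log, policy=ACCESS_POLICY):
    """Return the log entries whose role is not permitted for the requested data set."""
    return [
        entry for entry in access_log
        if not is_allowed(entry["role"], entry["dataset"], policy)
    ]

# Made-up access log for illustration.
access_log = [
    {"user": "kim", "role": "analyst", "dataset": "sales_db"},
    {"user": "lee", "role": "analyst", "dataset": "hr_db"},
]
flagged = unauthorized_attempts(access_log)
```

In this sketch a data set that is missing from the policy is denied to everyone, which is a safer default than allowing access when no rule exists.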

DataOps is Everyone’s Ops

You might have noticed that many of the tasks described in the DataOps taxonomy above typically are performed by employees of the IT department (or, in some cases, programmers), not data scientists or engineers who specialize in data operations.

This is an important point to drive home because when many people talk about DataOps, they assume that every organization has a team of data experts on staff who handle all work related to data, from collecting and storing it to analyzing and securing it.

In reality, many companies don’t have dedicated data operations teams. And even if they do, the general IT department still plays a huge role in tasks such as configuring tools and platforms for data collection, provisioning data infrastructure and keeping databases healthy and running. Security engineers, too, are key for things such as enforcing data access control and monitoring for signs of data security problems.

Thus, whether your company has a dedicated data operations team or not, DataOps is the responsibility (at least in part) of everyone who does anything related to IT. It’s everyone’s Ops.

This sponsored article was written on behalf of Unravel.

— Chris Tozzi