Data is the lifeblood of modern business. Nearly everything we interact with in our digital lives involves data management in some form. Yet large companies often contend with inefficient data operations: wasteful queries, unoptimized workloads and large, ungoverned data lakes.
I recently chatted with Capital One Software’s VP & Head of Slingshot Engineering, Salim Syed, on optimizing data operations to maximize efficiency. Below, we’ll consider how data management changes when migrating to the cloud and review some actionable ways technical leaders can direct more efficient DataOps.
Transitioning to Cloud-First Thinking
Arguably, the first and highest hurdle to overcome when optimizing data operations is transitioning from an older, on-premises way of thinking about data management to a newer, cloud-first model, Syed explained; the two paradigms could hardly be more different.
Simply replicating on-premises habits can be harmful when managing databases in the cloud, where the fixed constraints of owned hardware disappear. "When you go to the cloud, the shackles break," said Syed. Cloud technologies allow for near-infinite scalability, but that power comes with a significantly higher degree of responsibility.
Another potential holdover is the organizational structure. Having a centralized team oversee every control can end up stifling innovation, he said. Yet federating too many platform teams can have a siloing effect. Therefore, he recommended defining policies and regulations centrally while making them easy to enact with self-service tooling.
Ways to Optimize DataOps For Efficiency
Capital One was an early mover to the cloud. Syed explained that Capital One started its tech transformation journey in 2012 and exited its last data centers in 2020. It's no surprise that, as a cloud-based digital financial company, it deals with extremely large amounts of data. For example, thousands of analysts rely on financial data and other sources to generate their insights.
Therefore, the company must constantly balance empowerment with operational efficiency for interacting with large quantities of data, Syed said. With this background, he shared some tips they’ve been using to enhance efficiencies within their data-based operations.
Avoid Inefficient Queries
When you owned your infrastructure, an inefficient database query had little visible cost; the hardware was already paid for. This all changes in the cloud era. A poorly written query that scans billions of rows translates directly into unnecessary compute and higher cloud spending. Designing queries with tighter ranges therefore helps avoid waste.
Syed shared that, at Capital One, his team has developed a query advisor that looks at prior patterns and gives recommendations, such as fixing the range, combining queries or reducing the number of rows scanned. By analyzing past query patterns and the related metadata, the system can suggest better-constructed queries that improve efficiency.
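To make the idea concrete, a very rough query advisor can be approximated with static heuristics. The sketch below is a hypothetical, minimal example — the rules and wording are assumptions, not Capital One's implementation, which analyzes historical patterns and metadata — flagging common sources of wasted scans:

```python
import re

def advise(query: str) -> list[str]:
    """Flag common inefficiency patterns in a SQL query (heuristic sketch)."""
    advice = []
    q = query.lower()
    if re.search(r"select\s+\*", q):
        advice.append("avoid SELECT *; project only the columns you need")
    if "where" not in q:
        advice.append("add a WHERE clause to bound the scan range")
    if "limit" not in q and "group by" not in q:
        advice.append("consider LIMIT for exploratory queries")
    return advice

# An unbounded full-table scan triggers all three warnings.
print(advise("SELECT * FROM transactions"))
```

A production advisor would of course consult the query planner and past execution statistics rather than string patterns, but the principle is the same: catch unbounded work before it runs.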
Optimize the Datasets
In addition to optimizing queries, database engineers should keep an eye on the datasets themselves; how you structure your tables and the data model is even more critical in the cloud world, said Syed.
Furthermore, access patterns deserve forethought. Some scenarios require real-time processing, but it can be costly and isn't always necessary. Therefore, Syed recommended understanding your access pattern for loading data and its associated cost implications.
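One concrete way to act on access patterns is to partition data by the key most queries filter on — often a date — so a bounded load only touches the relevant slices. A minimal sketch, assuming daily date-named partitions (the `dt=` naming is an illustrative convention, not from the article):

```python
from datetime import date, timedelta

def partitions_to_scan(start: date, end: date) -> list[str]:
    """Return the daily partition names a date-bounded query needs,
    so partitions outside the range are never read."""
    days = (end - start).days
    return [f"dt={(start + timedelta(d)).isoformat()}" for d in range(days + 1)]

# A query bounded to three days scans three partitions, not the whole table.
print(partitions_to_scan(date(2024, 1, 1), date(2024, 1, 3)))
```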
Have a Data Retention Strategy
Next, it's easy to let data accumulate over time, but a "set it and forget it" policy is not a wise strategy for the cloud. Instead of letting petabytes of information accrue and produce excessive storage fees, it's best to prune your data lakes where possible so data isn't retained indefinitely, said Syed, and to set policies for storage retention and deletion timelines upfront.
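At its simplest, a retention policy is a rule that decides, per object, whether it has outlived its window. The sketch below is hypothetical — the tier names and day counts are assumptions for illustration, not recommendations from the article:

```python
from datetime import datetime, timedelta

# Hypothetical per-tier retention windows.
RETENTION = {"raw": timedelta(days=90), "curated": timedelta(days=365)}

def is_expired(tier: str, last_modified: datetime, now: datetime) -> bool:
    """True if an object in the given storage tier is past its retention window."""
    return now - last_modified > RETENTION[tier]

now = datetime(2024, 6, 1)
print(is_expired("raw", datetime(2024, 1, 1), now))  # raw data from January is past 90 days
```

In practice most object stores can enforce this declaratively (for example, via lifecycle rules) so deletion doesn't depend on a job remembering to run.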
Monitor For Inefficiencies
A certain degree of inefficiency in data operations is inevitable, acknowledged Syed. These might include an instance accidentally left running or large clusters provisioned for basic tasks. Nevertheless, there are actions engineers can take to remediate inefficiencies as they arise. Part of this is identifying inefficiencies in real time and correlating them to actionable changes. He encouraged leaders to create learning opportunities out of these events and to spread internal knowledge.
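Catching something like a forgotten instance usually comes down to comparing utilization metrics against a threshold. A minimal sketch — the 5% threshold and the metric shape are assumptions for illustration:

```python
def find_idle(resources: dict[str, list[float]], threshold: float = 5.0) -> list[str]:
    """Flag resources whose average CPU utilization (%) sits below the
    threshold, e.g. an instance accidentally left running."""
    return [name for name, samples in resources.items()
            if samples and sum(samples) / len(samples) < threshold]

metrics = {"etl-cluster": [62.0, 58.5, 71.2], "forgotten-dev-box": [1.2, 0.8, 1.5]}
print(find_idle(metrics))  # → ['forgotten-dev-box']
```

Feeding such flags back to the owning team, with the cost attached, is what turns a monitoring signal into the learning opportunity Syed describes.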
Match the Technology to the Use Case At Hand
Lastly, and perhaps most importantly, are you using the right technology for the use case at hand? There are many database styles to choose from these days, from relational databases like MySQL or PostgreSQL to non-relational NoSQL databases like MongoDB. Matching the right data technology to the use case at hand is an important step toward improving efficiency.
Furthermore, Syed recommended right-sizing the compute for the workload as well. For instance, for his team at Capital One, Syed explained that the policy is not to have large-size computing in lower-level environments, as you shouldn’t need to process a lot of data there.
Smart DataOps Aids FinOps Objectives
The above strategies can be adopted to avoid wasted resources. Moreover, these tactics for optimizing data processes also aid FinOps, which seeks to optimize cloud operations to reduce IT expenses. But reviewing data efficiencies should be done often; you have to stay vigilant as new workloads are introduced, Syed cautioned.
Image source: Joshua Sortino on Unsplash.