Modernizing a data warehouse ranks among the worst imaginable tasks for IT. The end result is immensely desirable, and the benefits of moving from an on-premises to a cloud data warehouse are well documented. But database migrations require enormous effort, and the stakes are high. What makes them so difficult? All applications, including ETL, BI, analytics and reporting, are designed to work with a specific legacy system. Moving these applications to a different system therefore requires extensive rewrites.
Database migrations may well be the world’s most underestimated problem. In particular, data warehouse migrations are notorious for their horrendous cost overruns. Even when organizations plan for these projects to take several years, they tend to miss even the most generous deadlines. Worse, the vast majority fail to meet their objectives.
In this article, we report on the exceptional success one major enterprise had migrating its systems. They managed to modernize their system in record time with virtually no downtime. We'll take a look at their journey and what sets this organization apart from their peers.
The Big Challenge
They don’t get much bigger. The system in question was long regarded as one of the largest, most complex data warehouses in EMEA. It is the backbone of a market-leading integrated container logistics and shipping company.
The company’s 165TB of highly compressed data is used to track goods and shipping vessels around the world. Uninterrupted availability 24×7 is critical for this kind of operation, of course. As the central data hub, this system connects thousands of users that execute over one million queries every day.
Migrating the workloads off an aging Teradata system to Microsoft Azure Synapse was key to modernizing the data estate. IT leadership found itself faced with the question of whether it would be possible to move without disrupting their business users.
Database Virtualization is the Game Changer
Database virtualization (DBV) is the extension of server virtualization to the database stack. Over the past two decades, we’ve seen a similar development in other spaces too. For example, networking and storage have been completely transformed by virtualization.
The principle of DBV is as simple as it is powerful. At its core, it is software that sits between the applications and the database, translating between the two in real time. Applications connect to the DBV component instead of the database, and DBV provides the same language surface and APIs as the legacy system. DBV converts and optimizes all queries and data formats for the new cloud database. Applications can't tell the difference between the original system and the virtualized one.
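The translation principle can be sketched as rewrite rules applied to each incoming query before it is forwarded to the new system. The rules below are illustrative assumptions only; a real DBV product parses the full SQL grammar rather than pattern-matching, and covers the entire language surface, not a handful of functions.

```python
import re

# Toy illustration of the DBV idea: queries written in the legacy dialect
# (Teradata here) are rewritten on the fly into the target dialect
# (T-SQL for Azure Synapse). These rules are illustrative assumptions.
REWRITE_RULES = [
    # Teradata's SEL shorthand -> standard SELECT
    (re.compile(r"^\s*SEL\b", re.IGNORECASE), "SELECT"),
    # ZEROIFNULL(x) -> COALESCE(x, 0); handles simple, non-nested arguments only
    (re.compile(r"\bZEROIFNULL\(([^()]*)\)", re.IGNORECASE), r"COALESCE(\1, 0)"),
    # NULLIFZERO(x) -> NULLIF(x, 0)
    (re.compile(r"\bNULLIFZERO\(([^()]*)\)", re.IGNORECASE), r"NULLIF(\1, 0)"),
]

def translate(legacy_sql: str) -> str:
    """Rewrite a legacy-dialect query into the target dialect."""
    sql = legacy_sql
    for pattern, replacement in REWRITE_RULES:
        sql = pattern.sub(replacement, sql)
    return sql

# The application still "speaks" Teradata; the shim forwards Synapse SQL.
print(translate("SEL order_id, ZEROIFNULL(discount) FROM orders"))
# SELECT order_id, COALESCE(discount, 0) FROM orders
```

Because the rewriting happens at query time, the application's own code never changes, which is exactly why it cannot tell the two systems apart.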
Viewed in the context of migrations, DBV eliminates the most labor-intensive tasks. Instead of rewriting applications over the course of years, teams simply repoint them to DBV.
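In practice, "repointing" an application amounts to a connection change rather than a code change. The host names below are hypothetical; the point is that the driver, the credentials model, and every query stay exactly as they were:

```python
# Hypothetical connection settings. The application keeps its legacy driver
# and SQL unchanged; only the endpoint moves from the legacy database to
# the DBV gateway, which presents the same protocol and port.
LEGACY = {"host": "teradata.corp.example.com", "port": 1025}
VIRTUALIZED = {"host": "dbv-gateway.corp.example.com", "port": 1025}

def connection_string(cfg: dict) -> str:
    """Build the (hypothetical) connection string an application would use."""
    return f"{cfg['host']}:{cfg['port']}"

# A one-line configuration change is the entire application-side migration.
print(connection_string(VIRTUALIZED))
```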
Planning the Implementation—There is no ‘Easy’ Button
If DBV eliminates the need for a rewrite, doesn’t that make database migrations easy? Indeed, it makes it simple. But easy it is not. Remember, DBV is only the runtime. As with other kinds of virtualization, the implementation requires extra effort.
In this case, DBV covered over 99.75% of the workload. In theory, there is no limit to what DBV can emulate. In practice, developers of DBV systems have to weigh each feature request. For the remaining 0.25%, the customer decided to redesign the applications that fell outside the current coverage.
Another decision criterion is the choice of the destination system. For DBV to be successful, the new destination must be powerful enough to stomach the incoming workloads. This seems obvious, yet the failure of the Hadoop craze of the past decade showed it isn’t. In our case, the customer chose Azure Synapse, which was perfectly capable of processing these workloads.
The biggest task during an implementation, though, is the data transfer itself. As a runtime, DBV relies on the data transfer to the new system being complete.
People and Processes Are Your Biggest Obstacle
Conventional migrations are the open heart surgery of the IT world. In contrast, DBV is a non-invasive procedure, but heart surgery, nonetheless. Implementing DBV is a team sport.
To be successful, everybody must pull in the same direction. Executives must set the course and, once underway, relentlessly cheer on the team. Developers must refrain from wanting to rewrite code “just because.” Everybody else must execute in their clearly defined roles.
Divide and conquer is the key principle. By structuring the implementation in tranches, the scope remains manageable at all times. Moving a limited number of applications at one time reduces distraction and keeps the team focused.
Then, there are the naysayers. These are members of the organization who want to preserve the status quo. And finally, there is the legacy vendor. To them, DBV is a clear threat and they will do anything in their power to stop your project.
Upfront alignment across the organization keeps the team on course even when there’s pushback.
Move Fast to be Successful
Because a data warehouse is constantly under development, it remains a moving target throughout the project. The longer the project takes, the more complex things tend to get.
For each workload tranche, the team needs to execute two distinct steps: check for functional correctness, then tune the performance of the workload. The latter concerns physical design choices such as indexes or table geometries.
Rarely do queries need to be adjusted.
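The correctness step boils down to running the same workload against both systems and comparing result sets. A minimal sketch, assuming the team has some execution harness (the `run_on_legacy` and `run_on_synapse` callables here are hypothetical stand-ins for it):

```python
def rows_match(legacy_rows, new_rows):
    """Compare two result sets. Row order is usually not guaranteed by
    SQL, so sort both sides before comparing."""
    return sorted(legacy_rows) == sorted(new_rows)

def validate_tranche(queries, run_on_legacy, run_on_synapse):
    """Run each query on both systems; return the queries whose
    results diverge, for the team to investigate."""
    failures = []
    for query in queries:
        if not rows_match(run_on_legacy(query), run_on_synapse(query)):
            failures.append(query)
    return failures
```

An empty failure list signs off the tranche for performance tuning; anything else goes back for analysis, which, as noted above, rarely means changing the query itself.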
At the end of the project, the team needs to catch up the data transfer so the new system is fully equivalent. Catching up on data transfer between tranches reduces the load in the home stretch.
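The catch-up between tranches is typically incremental: only rows changed since the last sync get copied, so the final cutover has little left to move. A sketch of watermark-based catch-up, where the column name and the two transfer callables are assumptions for illustration:

```python
def catch_up(table, last_watermark, fetch_changes, apply_changes):
    """Copy rows modified after `last_watermark` from the legacy system
    to the new one; return the new watermark for the next round."""
    # e.g. fetch_changes runs: SELECT ... WHERE updated_at > :last_watermark
    changes = fetch_changes(table, last_watermark)
    if not changes:
        return last_watermark  # already caught up
    apply_changes(table, changes)  # bulk-load into the new system
    return max(row["updated_at"] for row in changes)
```

Running this after every tranche keeps the two systems nearly equivalent at all times, which is what makes a weekend cutover feasible at the end.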
The time needed to implement the full virtualization of a system varies. For a mid-range system, this can be a calendar quarter; for a large system, it could take six to 12 months. In our case, the full implementation took 10 months.
A New Way of Modernizing Data Estates Emerges
What was most remarkable was the final cutover from the old to the new system. Our customer made the transition without telling their thousands of business users. Instead, they flipped the switch on a weekend, repointing all business users to the new stack.
Business users coming back on Monday noticed their applications mysteriously running faster. But they had no idea of the magnitude of the change. That is unheard of. Even simple database upgrades cause more downtime.
Looking back, how does this new way of modernizing a data estate compare to the conventional approach?
Instead of 10 months, a conventional approach would have taken years. Instead of a small agile team of about 20, it would have required hundreds or even thousands of people to make changes. Coordination alone would have made the cutover a project for several calendar quarters.
In the end, the customer saved an estimated nine figures on their five-year TCO over a conventional approach, assuming that a conventional approach would have succeeded at all. Given the high failure rate of conventional migrations, this is a very optimistic assumption.
This project can serve as a blueprint for data warehouse migrations. Whenever time, cost and risk are of the essence, DBV is a strong contender.