Unless you are an alien life form who just beamed down from space, you’ve heard all about the why of moving data workloads to the cloud. The cloud is a more scalable, more flexible, more available and possibly less expensive solution for storing and processing data.
What is not always obvious is the how of moving data workloads to the cloud. In several key respects, cloud-based data workloads are fundamentally different from those hosted on-premises.
Understanding those differences is critical for moving data workloads to the cloud effectively. While it’s true that the cloud offers a variety of benefits for hosting data workloads, actually realizing those benefits requires a plan for managing data effectively once it is in the cloud.
Let’s take a look at what that entails.
Let’s start by identifying the main ways in which cloud-based data workloads differ from those hosted on-premises: getting data into the cloud is a major undertaking in its own right, the network sits between you and your data, availability and cost hinge on how you replicate and tier your storage, and you control far less of the underlying infrastructure.
How do you manage the special challenges that cloud-based data workloads impose? Following are some tips.
As noted above, data migration into the cloud is a common pain point. Given this, and the fact that your migration strategy lays the foundation for the ongoing success of your data workload, it’s important to perform a migration assessment before beginning your migration.
A migration assessment involves determining not only how to get your data into the cloud, but also how you will adapt your data architectures and strategy to fit the cloud. Will you simply lift and shift data into the cloud, keeping the same general architecture in place? Or will you modernize your data workloads by taking advantage of new technologies you didn’t use in your on-premises environment?
The major cloud vendors offer tools to help answer these questions, such as AWS Migration Hub, Azure Migrate and Google Cloud’s Migration Center.
The network can be the biggest bottleneck in a data pipeline. Given the centrality of networks to the cloud, it’s therefore important to ensure you leverage the network efficiently.
One way to do that (especially during the data migration stage) is to take a provisioning approach, which involves moving data gradually into the cloud and provisioning cloud infrastructure to fit each specific data workload in a cost-efficient way. This can be more cost-effective than lifting and shifting all of your data into the cloud at once, which can leave you with hefty data migration fees.
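Back-of-the-envelope math shows why a gradual, provisioned migration often beats a one-shot bulk transfer. The sketch below estimates wall-clock transfer time from data volume and link bandwidth; the 70% utilization figure and the 100 TB example are illustrative assumptions, not measurements.

```python
def transfer_days(data_tb: float, bandwidth_gbps: float,
                  utilization: float = 0.7) -> float:
    """Estimate days needed to move data_tb terabytes over a link of
    bandwidth_gbps gigabits/second at the given average utilization."""
    bits = data_tb * 1e12 * 8  # terabytes -> bits
    seconds = bits / (bandwidth_gbps * 1e9 * utilization)
    return seconds / 86400

# A 100 TB lift-and-shift over a 1 Gbps link at 70% utilization:
print(round(transfer_days(100, 1.0), 1))  # roughly 13 days
```

Numbers like these explain why vendors offer offline transfer appliances for very large datasets, and why migrating workload by workload, provisioning as you go, can be the more practical path.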
One of the best ways to maximize data availability in the cloud is to replicate your data across multiple cloud data centers and/or regions. Doing so maximizes the chances of keeping your data available in the event that one cloud data center fails, which is rare but does happen.
Of course, the more data replication you use in the cloud, the more you will pay, so you need to balance your replication strategy with your budget.
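As a concrete illustration, here is an AWS-style S3 cross-region replication configuration, built as a plain Python dict. The bucket names and the IAM role ARN are hypothetical placeholders; note how the replica's storage class can be set to a cheaper tier to help balance replication against budget.

```python
# Illustrative S3 cross-region replication configuration.
# Bucket names and the role ARN are hypothetical placeholders.
replication_config = {
    "Role": "arn:aws:iam::123456789012:role/replication-role",
    "Rules": [
        {
            "ID": "replicate-to-standby-region",
            "Status": "Enabled",
            "Priority": 1,
            "Filter": {},  # empty filter: replicate the whole bucket
            "DeleteMarkerReplication": {"Status": "Disabled"},
            "Destination": {
                "Bucket": "arn:aws:s3:::analytics-data-standby",
                # Replicas can land in a cheaper tier to contain cost:
                "StorageClass": "STANDARD_IA",
            },
        }
    ],
}

# With credentials configured, this could be applied via boto3:
#   s3 = boto3.client("s3")
#   s3.put_bucket_replication(
#       Bucket="analytics-data-primary",
#       ReplicationConfiguration=replication_config,
#   )
```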
Most cloud vendors offer different tiers of storage. The default tier is designed for data that needs to be accessed on a frequent basis, but you can save money by choosing tiers designed for infrequently accessed data or archival storage. The price you pay for using lower-cost storage tiers is slower data access time, but in cases where you don’t need to access data quickly or frequently, low-cost tiers can do a lot for your budget.
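The budget impact of tiering is easy to quantify. The per-GB prices below are hypothetical figures roughly in line with published cloud price sheets (check your vendor's current pricing); the point is the relative gap between tiers.

```python
# Hypothetical per-GB-month prices for three storage tiers.
TIER_PRICE_PER_GB = {
    "standard": 0.023,    # frequent access
    "infrequent": 0.0125, # slower/costlier retrieval, cheaper storage
    "archive": 0.004,     # hours-long retrieval, cheapest storage
}

def monthly_storage_cost(gb: float, tier: str) -> float:
    """Monthly storage cost in dollars for gb gigabytes in a tier."""
    return gb * TIER_PRICE_PER_GB[tier]

data_gb = 50_000  # 50 TB
for tier in TIER_PRICE_PER_GB:
    print(f"{tier:>10}: ${monthly_storage_cost(data_gb, tier):,.2f}/month")
```

At these illustrative rates, moving 50 TB of rarely touched data from the standard tier to archive storage cuts the monthly bill from roughly $1,150 to $200, before retrieval fees.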
In the cloud, you often have little control over the underlying infrastructure or software environments. You can’t harden the host operating system for your cloud servers, and the metrics you can feed from a cloud environment into your SIEM are often limited.
What you can do to help secure most cloud environments, however, is to set strict access controls. Configuring IAM policies may not be most engineers’ idea of a good time, but it’s critical to perform this tedious work to help secure cloud-based data workloads.
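To make the access-control point concrete, here is an illustrative least-privilege AWS-style IAM policy, expressed as a Python dict: it grants one team read-only access to a single data bucket and nothing else. The bucket name and statement ID are hypothetical.

```python
import json

# Illustrative least-privilege IAM policy: read-only access to one
# bucket. The bucket name "analytics-data" is a hypothetical example.
read_only_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "AllowReadAnalyticsBucket",
            "Effect": "Allow",
            # No wildcard actions: list and read, nothing more.
            "Action": ["s3:GetObject", "s3:ListBucket"],
            "Resource": [
                "arn:aws:s3:::analytics-data",
                "arn:aws:s3:::analytics-data/*",
            ],
        }
    ],
}

print(json.dumps(read_only_policy, indent=2))
```

The tedium pays off precisely because policies like this name specific actions and resources rather than falling back on `"Action": "*"`.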
When you move data workloads to the cloud, it can be tempting to assume you don’t need to back them up, because the cloud never disappears.
It’s true that the cloud is very resilient, but that doesn’t eliminate the need for data backups. There is always a chance, however small, that cloud-based data could be lost permanently. You also face the far more common risk of having your cloud-based data become corrupted or infected with malware, in which case it’s useful to have clean backup copies from which you can restore.
So, back up your cloud-based data, too. As a best practice, it’s wise to follow a 3-2-1 backup strategy: keep at least three copies of your data, on at least two different storage media, with at least one copy offsite.
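The 3-2-1 rule is simple enough to check mechanically. The sketch below, with hypothetical media names, validates whether a backup plan satisfies all three conditions:

```python
from dataclasses import dataclass

@dataclass
class BackupCopy:
    medium: str    # e.g. "object-storage", "tape", "local-disk"
    offsite: bool  # stored outside the primary region or site?

def satisfies_3_2_1(copies: list[BackupCopy]) -> bool:
    """Check the 3-2-1 rule: at least 3 copies, on at least 2
    distinct media, with at least 1 copy offsite."""
    return (
        len(copies) >= 3
        and len({c.medium for c in copies}) >= 2
        and any(c.offsite for c in copies)
    )

plan = [
    BackupCopy("object-storage", offsite=False),  # primary region
    BackupCopy("object-storage", offsite=True),   # second region
    BackupCopy("tape", offsite=True),             # off-cloud archive
]
print(satisfies_3_2_1(plan))  # True
```

Note that two replicas in the same cloud on the same storage medium would fail the "two media" test, which is exactly the scenario the rule guards against.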
The cloud is a fundamentally different beast than your on-premises infrastructure. Making the most of cloud-based data requires you to rethink your strategies for data migration, availability, security and more. You need not discard everything you know about on-premises data management, but you do need to adjust your operations to address the special challenges of the cloud.
This sponsored article was written on behalf of Unravel.