The cloud provides inexpensive storage. But like everything else in the cloud, there are hidden costs. Cloud storage providers (CSPs) charge to put data into and retrieve data from the cloud: they charge for API calls, and they generally charge egress fees when data is moved out of the CSP. So, to keep enterprise storage costs low, infrequently accessed data such as snapshots, logs, backups and cold data is best suited for tiering to the cloud.
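To make those hidden costs concrete, here is a minimal sketch of a cloud tier cost model. All rates are hypothetical placeholders, not any provider's actual pricing; check your CSP's pricing page for real numbers.

```python
# Illustrative cloud-tier cost model. Every rate below is an assumed
# placeholder, not a real CSP price.

def monthly_tier_cost(stored_gb, egress_gb, api_calls,
                      storage_rate=0.004,   # $/GB-month (assumed archive rate)
                      egress_rate=0.09,     # $/GB egress (assumed)
                      api_rate=0.0000004):  # $/API call (assumed)
    """Estimate one month's cost for a cloud tier: storage + egress + API."""
    return (stored_gb * storage_rate
            + egress_gb * egress_rate
            + api_calls * api_rate)

# Example: 100 TB tiered, 1 TB recalled, 1M API calls in a month.
cost = monthly_tier_cost(stored_gb=100_000, egress_gb=1_000, api_calls=1_000_000)
print(f"${cost:,.2f}")  # → $490.40
```

Note how egress and API charges are small relative to storage when recalls are rare, which is exactly why cold, rarely accessed data is the best candidate for tiering.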
By tiering data, on-premises storage arrays need only keep hot data and the most recent logs and snapshots. Typically, 60% to 80% of enterprise data has not been accessed in over a year. By tiering the cold data as well as older log files and snapshots, the required capacity of the storage array, the mirrored array (if replication is used) and backup storage is reduced dramatically. This is why tiering cold data can reduce overall storage costs by as much as 70%.
The many advantages of this approach include:
- Lower acquisition cost. Flash storage, used for fast access to hot data, is expensive. By tiering off infrequently used data, you can purchase a much smaller amount of flash storage, thereby reducing acquisition costs.
- Lower backup and mirroring costs. By continuously tiering cold data, you can reduce your footprint, license costs and storage costs for backups and replication if the cold data is placed in robust storage (such as that provided by the major CSPs).
- Improve performance and reduce the capacity of your storage array. By running storage at lower utilization and by moving cold-data access to another storage device or service, you can increase the performance of your storage array and get by with a smaller one.
- Enable processing of cold data without burdening the storage array. Processing and feeding your cold data into your AI/ML/BI engines is critical to staying competitive. With cold data in the cloud, you’re also reducing the load on your storage array, thereby extending its life.
Know First, Then Tier
A key challenge of cloud tiering is determining what data to tier. End users should not have to decide which data to archive or manage shared files themselves; leaving it to them typically results in IT organizations keeping too much data in expensive, hot storage. One way around this is to automate the process through business policies dictating when and how tiering is applied. It also helps to keep tiered data transparent by leaving it in the existing namespace: users can still find the data and access it as if it had never been tiered. Transparency enables IT to tier cold data continuously and systematically across all storage devices in the organization without requiring end-user assistance.
To help IT make the right decisions, analytics is needed to show how much data you have and how much has not been touched in three, six or 12 months. It is even more compelling if IT can run what-if scenarios to determine how much a given tiering policy will save. The last thing you want to do is tier data in the dark. Take time to understand your data assets before you invest in the effort.
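A basic version of this analysis can be scripted directly against a filesystem. The sketch below buckets files by days since last access, a rough stand-in for the analytics described above; it assumes the filesystem records access times (on volumes mounted with `noatime`, substitute `st_mtime`).

```python
# Minimal cold-data scan: sum file sizes into age buckets based on
# last-access time. A sketch, not a production analytics tool.
import os
import time
from collections import Counter

def age_profile(root):
    """Count bytes per age bucket (days since last access) under root."""
    now = time.time()
    buckets = Counter()
    for dirpath, _, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            try:
                st = os.stat(path)
            except OSError:
                continue  # skip unreadable or vanished files
            age_days = (now - st.st_atime) / 86400
            if age_days > 365:
                buckets["cold (>12 mo)"] += st.st_size
            elif age_days > 180:
                buckets["warm (6-12 mo)"] += st.st_size
            elif age_days > 90:
                buckets["warm (3-6 mo)"] += st.st_size
            else:
                buckets["hot (<3 mo)"] += st.st_size
    return buckets

if __name__ == "__main__":
    for bucket, size in sorted(age_profile(".").items()):
        print(f"{bucket}: {size / 2**30:.2f} GiB")
```

The same bucket totals feed a what-if calculation: multiply the cold bytes by your array's cost per GB versus the cloud tier's, and the savings estimate falls out.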
Here are other considerations, beyond analytics, when embarking on a cloud tiering initiative:
- Transparent, continuous tiering. Tiering should be transparent to the user. If someone can still search for and access tiered data without any change to the experience, IT can realistically tier data systematically across the enterprise. Without transparent tiering, the process becomes onerous because you will need each user's sign-off.
- Flexible tiering policies. You need the flexibility to exclude certain data sets, include others and select from a large range of data ages. Be aware that many tiering solutions provide an extremely limited set of policies.
- Tiering across multi-vendor storage arrays. Most enterprises have storage arrays from multiple vendors, and each will have its own tiering solution, if any. This makes it difficult to roll out a consistent, global tiering policy across the organization. Look for a solution that can encompass all the storage devices you have today and might acquire tomorrow.
- Fast access to tiered data. Accessing data from the cloud will incur a much higher latency than accessing data on the local storage array. But once the cold data has been retrieved you can cache it locally to eliminate future latency and reduce egress costs. If the data set is large, the tiering solution should stream it so that users can access it even while the rest of it is being recalled.
- Native cloud access to tiered data. The tiering solution should allow access to the cold data directly using the cloud storage's native access tools. For instance, if you tier cold data to AWS S3, you should be able to access the data directly from AWS using a standard S3 browser like CloudBerry. Unfortunately, most storage array tiering solutions store the data in a proprietary format, meaning that you can only access it through the source storage array.
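As a quick illustration of that last point, here is a sketch of listing tiered objects in S3 with the `boto3` SDK. It only works if the tiering solution stores objects in their native file format; the bucket and prefix names are hypothetical.

```python
# Sketch of native access to tiered data in S3. Bucket/prefix names
# are hypothetical; requires AWS credentials when run for real.

def list_tiered(client, bucket, prefix):
    """Return (key, size) pairs for tiered objects under a prefix."""
    resp = client.list_objects_v2(Bucket=bucket, Prefix=prefix)
    return [(obj["Key"], obj["Size"]) for obj in resp.get("Contents", [])]

if __name__ == "__main__":
    import boto3  # pip install boto3
    s3 = boto3.client("s3")
    for key, size in list_tiered(s3, "my-tiered-data", "archive/logs/"):
        print(key, size)
```

If the tiering solution uses a proprietary on-cloud format instead, this kind of direct listing returns opaque blobs, and every read has to go back through the source array.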
Cloud tiering is a practical and easy step on the path to the cloud. When done right, it gets you into the cloud while reducing storage costs. There are new solutions available that make this approach seamless, with no disruption to users or your existing data protection workflows. By running an analysis of data assets across your storage ecosystem, and setting up policies for migration of cold data, you can ensure that data always lives in the best place from both a cost and a business perspective.