The cloud is becoming a large cost center for many organizations. Though it is often touted as a cost-saver, and many cloud migrations are driven by the desire to reduce IT costs, the cloud creates operating expenses even as it cuts capital expenses. Moreover, unlike traditional IT investments, cloud costs can be unpredictable and can quickly spiral out of control.
However, while the cloud creates problems, it also gives you the tools to address them. For example, on AWS, a primary component of costs is object storage delivered via the venerable Simple Storage Service (S3). S3 is the basis for modern data lakes, huge-scale media libraries, website content, unstructured content repositories and more.
Fortunately, S3 provides a storage tiering mechanism that can significantly reduce costs for content that is infrequently used. To make effective use of storage tiering, you’ll need to adopt the concepts of data classification—automatically discovering your infrequently accessed data and moving it dynamically to the most appropriate storage tier.
What is Data Classification?
The term ‘data classification’ refers to the practice of organizing unstructured and structured data into categories representing different data types.
Data classification is an organizational practice that can help you achieve two key goals:
- Learn what data types you have stored
- Determine where each data type is located
Here are several data classification use cases defined by Gartner:
- Control intellectual property (IP)—Data classification helps you control access to data and define suitable locations.
- Diminish the attack surface—You can use data classification to reduce the attack surface in locations storing sensitive data.
- Identify data governed by regulations—Data classification enables you to categorize data into specific compliance categories, ensuring you meet requirements set by relevant regulations like GDPR, HIPAA and PCI DSS.
- Provide access to content—Use data classification to configure access according to data type, usage and more.
- Remove data redundancies—Use data classification to identify and remove redundant or stale data.
- Optimize business activities—Data classification can help you set up metadata tagging to optimize your different business activities.
- Learn your data patterns—Discover information on data usage and location.
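To make the two goals above concrete, here is a minimal sketch of a classification pass over a list of object keys. It answers "what data types do I have" and "where is each type located" by grouping keys by category and by top-level prefix. The extension-to-category mapping and the function names are hypothetical, chosen only for illustration; a real classifier would usually inspect content and metadata as well.

```python
from collections import defaultdict
from pathlib import PurePosixPath

# Hypothetical mapping from file extension to data category (an assumption).
CATEGORY_BY_EXTENSION = {
    ".jpg": "media", ".png": "media", ".mp4": "media",
    ".csv": "tabular", ".parquet": "tabular",
    ".log": "logs",
    ".pdf": "documents", ".docx": "documents",
}


def classify_key(key: str) -> str:
    """Return the category for an object key, or 'unclassified'."""
    suffix = PurePosixPath(key).suffix.lower()
    return CATEGORY_BY_EXTENSION.get(suffix, "unclassified")


def build_inventory(keys):
    """Group keys by category, then by top-level prefix (their 'location')."""
    inventory = defaultdict(lambda: defaultdict(list))
    for key in keys:
        location = key.split("/", 1)[0] if "/" in key else "(root)"
        inventory[classify_key(key)][location].append(key)
    return inventory
```

Calling `build_inventory` on a bucket listing gives you a nested mapping you can scan for, say, every prefix that holds log files.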
What is Amazon S3 Intelligent-Tiering?
S3 Intelligent-Tiering is a cloud storage class that monitors access patterns to determine the optimal storage tier for each object. It automatically transfers less frequently accessed objects to access tiers with lower costs.
Storage tiering is a key strategy for reducing cloud costs. Intelligent-Tiering is an automated storage tiering mechanism that can lower storage costs as data access patterns change, without affecting performance or adding operational overhead.
Key use cases for Amazon S3 Intelligent-Tiering are:
- Data with changing access patterns—Unpredictable, changing or unknown access patterns, independent of the retention period or object size.
- Infrequently modified data—You can use this storage class as the default for almost any workload that is not frequently modified, particularly data analytics, data lakes and user-generated content (UGC).
Note that S3 Intelligent-Tiering adds a small monthly monitoring and automation charge, billed per monitored object, on top of the per-GB storage price of whichever access tier each object currently occupies.
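Whether the extra Intelligent-Tiering charge pays for itself depends on how much of your data goes cold. The sketch below is a rough back-of-the-envelope estimate, assuming the charge is billed per 1,000 monitored objects and only two tiers are in play. Every price in it is a placeholder assumption, not a quoted AWS rate, so substitute the current figures for your Region.

```python
# Placeholder prices in dollars (assumptions, not quoted AWS rates).
ASSUMED_FREQUENT_PRICE_PER_GB = 0.023
ASSUMED_INFREQUENT_PRICE_PER_GB = 0.0125
ASSUMED_MONITORING_PRICE_PER_1000_OBJECTS = 0.0025


def intelligent_tiering_monthly_cost(total_gb, object_count, infrequent_fraction,
                                     frequent_price, infrequent_price,
                                     monitoring_price_per_1000):
    """Estimate a monthly bill when a fraction of the data sits in the
    infrequent access tier and every object is monitored."""
    infrequent_gb = total_gb * infrequent_fraction
    frequent_gb = total_gb - infrequent_gb
    storage = frequent_gb * frequent_price + infrequent_gb * infrequent_price
    monitoring = object_count / 1000 * monitoring_price_per_1000
    return storage + monitoring


def untiered_monthly_cost(total_gb, frequent_price):
    """Baseline: everything stays at the frequent access price."""
    return total_gb * frequent_price


if __name__ == "__main__":
    tiered = intelligent_tiering_monthly_cost(
        10_000, 1_000_000, 0.6,
        ASSUMED_FREQUENT_PRICE_PER_GB,
        ASSUMED_INFREQUENT_PRICE_PER_GB,
        ASSUMED_MONITORING_PRICE_PER_1000_OBJECTS,
    )
    baseline = untiered_monthly_cost(10_000, ASSUMED_FREQUENT_PRICE_PER_GB)
    print(f"tiered: {tiered:.2f}  baseline: {baseline:.2f}")
```

With these placeholder numbers, 10 TB spread across a million objects with 60% of the data cold comes to roughly $169.50 a month, against $230 if nothing were tiered. Buckets full of tiny objects shift the balance, because the per-object charge grows while the per-GB savings shrink.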
How S3 Intelligent-Tiering Works
S3 Intelligent-Tiering monitors and predicts user behavior based on machine learning algorithms, allocating stored data to the appropriate access tier. These algorithms are trained on data traffic patterns across trillions of objects stored in S3. Intelligent-Tiering allows management on a granular object level, moving each object into a different storage tier when its access pattern changes.
If an object remains unaccessed for 30 consecutive days, S3 Intelligent-Tiering automatically moves it to the infrequent access tier. If an object has not been accessed for 90 consecutive days, AWS moves it to the archive instant access tier. To achieve even lower storage costs, you can opt in to the archive tiers, from which objects can be retrieved in minutes or hours.
Here are S3 Intelligent-Tiering access tiers explained:
- Frequent access tier (automatic)—This is the default access tier. Here, any object starts its life cycle once created or moved to S3 Intelligent-Tiering. An object stays in this tier as long as it is accessed continuously. The frequent access tier offers high throughput performance and low latency.
- Infrequent access tier (automatic)—If an organization does not access an object for 30 (consecutive) days, AWS moves the object to an infrequent access tier. This tier offers high throughput performance and low latency.
- Archive instant access tier (automatic)—If an organization does not access an object for 90 consecutive days, AWS moves the object to the archive instant access tier. The archive instant access tier offers high throughput performance and low latency.
- Archive access tier (optional)—You may activate the archive access tier for asynchronously-accessed data. After you select to activate this tier, the archive access tier automatically archives any objects an organization has not accessed for 90 days. You can adjust the last access time before archiving, specifying up to 730 days.
- Deep archive access tier (optional)—For data that you access even less frequently, you can enable the deep archive access tier. Once you activate this tier, Intelligent-Tiering moves any objects the organization has not accessed for at least 180 consecutive days to a deep archive. You can specify the last access time before archiving as up to 730 days.
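The thresholds in the list above can be summarized as a small decision function. This is only a model of the rules as described here, not AWS's implementation, and the tier labels and parameter names are illustrative assumptions. It treats the optional archive tiers as disabled unless you pass a day count for them.

```python
from typing import Optional


def select_tier(days_since_access: int,
                archive_days: Optional[int] = None,
                deep_archive_days: Optional[int] = None) -> str:
    """Return the access tier an object would occupy after a given
    number of consecutive days without access."""
    # The optional tiers accept a configurable threshold of up to 730 days.
    if archive_days is not None and not 90 <= archive_days <= 730:
        raise ValueError("archive_days must be between 90 and 730")
    if deep_archive_days is not None and not 180 <= deep_archive_days <= 730:
        raise ValueError("deep_archive_days must be between 180 and 730")

    # Check the coldest tier first so the deepest matching tier wins.
    if deep_archive_days is not None and days_since_access >= deep_archive_days:
        return "DEEP_ARCHIVE_ACCESS"
    if archive_days is not None and days_since_access >= archive_days:
        return "ARCHIVE_ACCESS"
    if days_since_access >= 90:
        return "ARCHIVE_INSTANT_ACCESS"
    if days_since_access >= 30:
        return "INFREQUENT_ACCESS"
    return "FREQUENT_ACCESS"
```

For example, `select_tier(120)` lands in the archive instant access tier, while `select_tier(120, archive_days=90)` lands in the archive access tier because the optional tier has been switched on.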
How to use Intelligent-Tiering to Cut Costs
The S3 Intelligent-Tiering storage system uses automatic storage class optimization to minimize storage costs. The Amazon S3 API, AWS CLI and AWS Management Console allow you to set S3 Intelligent-Tiering to automatically archive asynchronously accessed data.
How to Move Data to S3 Intelligent-Tiering
You can move data to S3 Intelligent-Tiering using a PUT request to transfer the data directly. Alternatively, you can set up S3 Lifecycle policies that move objects from another storage class, such as S3 Standard, to S3 Intelligent-Tiering.
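For the policy-based route, a rule that transitions objects into Intelligent-Tiering might look like the sketch below, assuming you use the boto3 SDK. The rule ID, prefix and bucket name are hypothetical. The rule is built as a plain dictionary so you can inspect it before sending it, and boto3 is imported only inside the function that actually calls AWS.

```python
def intelligent_tiering_rule(rule_id: str, prefix: str = "", days: int = 0) -> dict:
    """Build a lifecycle configuration with one rule that transitions
    matching objects to INTELLIGENT_TIERING after `days` days."""
    return {
        "Rules": [
            {
                "ID": rule_id,
                "Status": "Enabled",
                # An empty prefix applies the rule to the whole bucket.
                "Filter": {"Prefix": prefix},
                "Transitions": [
                    {"Days": days, "StorageClass": "INTELLIGENT_TIERING"}
                ],
            }
        ]
    }


def apply_lifecycle(bucket: str, configuration: dict) -> None:
    """Send the configuration to S3. Requires boto3 and AWS credentials.
    This call replaces any lifecycle configuration already on the bucket."""
    import boto3

    s3 = boto3.client("s3")
    s3.put_bucket_lifecycle_configuration(
        Bucket=bucket, LifecycleConfiguration=configuration
    )
```

You would call `apply_lifecycle("my-example-bucket", intelligent_tiering_rule("move-logs", "logs/"))`, where the bucket name is a placeholder. Because the call overwrites existing rules, merge the new rule with any rules you already have first.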
To upload data directly to S3 Intelligent-Tiering with a PUT operation, specify Intelligent-Tiering in the x-amz-storage-class header. For example, the following PUT request stores an image in an S3 bucket using Intelligent-Tiering:
PUT /image-for-classification.jpg HTTP/1.1
Host: myBucket.s3.<Region>.amazonaws.com
Date: Thu, 07 Jan 2021 18:15:00 GMT
Authorization: <your-authorization-string>
Content-Type: image/jpeg
Content-Length: 15342
Expect: 100-continue
x-amz-storage-class: INTELLIGENT_TIERING
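In practice you will probably make this request through an SDK rather than signing it by hand. Here is a minimal sketch using boto3, where the `StorageClass` argument plays the role of the x-amz-storage-class header. The helper names are hypothetical, and the upload function needs boto3 installed and AWS credentials configured.

```python
import mimetypes


def put_object_args(bucket: str, key: str, body: bytes) -> dict:
    """Build the keyword arguments for an S3 PutObject call that stores
    the object directly in the Intelligent-Tiering storage class."""
    content_type, _ = mimetypes.guess_type(key)
    return {
        "Bucket": bucket,
        "Key": key,
        "Body": body,
        # Fall back to a generic type when the extension is unknown.
        "ContentType": content_type or "application/octet-stream",
        "StorageClass": "INTELLIGENT_TIERING",
    }


def upload(bucket: str, key: str, body: bytes) -> None:
    """Upload the object. Requires boto3 and AWS credentials."""
    import boto3

    boto3.client("s3").put_object(**put_object_args(bucket, key, body))
```

Keeping the argument builder separate from the network call makes it easy to check the storage class and content type before anything is sent.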
How to Enable S3 Intelligent-Tiering Archive Tiers
The archive tiers (Archive Access and Deep Archive Access) let you store data cheaply and retrieve it within minutes or hours. You can activate either tier, or both, by creating a configuration scoped to a whole bucket, a prefix or a set of object tags, using the AWS Management Console, the AWS CLI or the Amazon S3 API.
To enable automatic archiving with S3 Intelligent-Tiering via S3 Console:
1. Open the Amazon S3 console via the AWS Management Console.
2. Select a bucket in the Buckets list and select Properties.
3. Go to S3 Intelligent-Tiering Archive configurations and select Create configuration.
4. Enter a descriptive name for your configuration in the Archive configuration settings.
5. In the Choose a configuration scope section, select the scope you want to use. To restrict the configuration to specific objects in the bucket, filter by a shared prefix, object tags or both:
   - Select Limit the scope of this configuration with one or multiple filters.
   - Under Prefix, specify a single prefix to define the configuration scope.
   - To filter by object tags, select Add tag and enter a value for Key.
6. Select Enable under Status.
7. Go to Archive settings and select the archive tier you want to enable: Archive Access, Deep Archive Access or both.
8. Select Create to set the configuration.
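The console steps above have a programmatic equivalent. The sketch below assumes boto3 and builds the configuration as a dictionary before passing it to `put_bucket_intelligent_tiering_configuration`. The configuration ID, prefix and bucket name are hypothetical, and the day counts are simply the minimums mentioned earlier.

```python
from typing import Optional


def archive_configuration(config_id: str,
                          archive_days: Optional[int] = None,
                          deep_archive_days: Optional[int] = None,
                          prefix: Optional[str] = None) -> dict:
    """Build an Intelligent-Tiering archive configuration that enables
    one or both optional archive tiers."""
    tierings = []
    if archive_days is not None:
        tierings.append({"Days": archive_days, "AccessTier": "ARCHIVE_ACCESS"})
    if deep_archive_days is not None:
        tierings.append(
            {"Days": deep_archive_days, "AccessTier": "DEEP_ARCHIVE_ACCESS"}
        )
    if not tierings:
        raise ValueError("enable at least one archive tier")

    configuration = {"Id": config_id, "Status": "Enabled", "Tierings": tierings}
    if prefix:
        # Limit the scope to objects that share this prefix.
        configuration["Filter"] = {"Prefix": prefix}
    return configuration


def apply_configuration(bucket: str, configuration: dict) -> None:
    """Send the configuration to S3. Requires boto3 and AWS credentials."""
    import boto3

    boto3.client("s3").put_bucket_intelligent_tiering_configuration(
        Bucket=bucket,
        Id=configuration["Id"],
        IntelligentTieringConfiguration=configuration,
    )
```

For instance, `archive_configuration("archive-logs", archive_days=90, deep_archive_days=180, prefix="logs/")` enables both tiers for everything under a `logs/` prefix.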
Conclusion
In this article, I showed how data classification concepts, specifically applied through the Amazon S3 Intelligent-Tiering mechanism, can help you dramatically reduce storage costs on AWS. This is only one example of cloud savings driven by data classification and machine learning, and you can extend it to other parts of your cloud deployment:
- Leverage AI analysis services like Amazon Macie to identify sensitive data, such as personally identifiable information (PII), and move it to the most appropriate storage medium or storage tier.
- Use tools like AWS Trusted Advisor to automatically identify unused or underutilized cloud resources, such as block storage volumes and snapshots.
- Extend your analysis to include on-premises environments and other clouds to determine which environment is the most appropriate for your data.
I hope this will be useful as you improve your visibility and control over storage costs in the cloud.