
Why Object Storage is Best for Cloud-Native Apps

A crucial question that plagues cloud application developers is, “What kind of storage should we use for our app?” Unlike compute runtime choices such as Lambda/serverless, containers or virtual machines, the choice of data storage is highly sticky and makes future application improvements and migrations much harder.

All three hyperscalers offer storage services that present block, file and object-based data access. Each of these services is mature and offers different advantages, making the choice even harder. Though block and file-based storage have existed for decades, in this article we will illustrate some key differentiators that should make object storage your default choice for new applications written in the cloud.

Storage Type | Google Cloud    | Amazon Web Services         | Microsoft Azure
-------------|-----------------|-----------------------------|----------------
Block        | Persistent Disk | Elastic Block Storage       | Disk Storage
File         | Filestore       | Elastic File System         | Files
Object       | Cloud Storage   | Simple Storage Service (S3) | Blob Storage

Easy Scaling

Scalability is an important requirement for most cloud applications. Horizontally scaling out the compute available to an application is expected to increase its ability to handle more requests and users at peak load, and most cloud providers make it easy to add compute resources on demand. As compute resources are scaled out, however, block and file-based storage must be mounted or attached to each new compute instance.

However, a cursory search will show that these mount/attach operations can fail or even hang indefinitely for a variety of reasons, and they are often hard to debug. Teardown of a compute instance can fail or hang for the same reasons. These issues immediately negate the application’s ability to scale freely as required. None of this applies to object-based storage: There is no mount step, so your object storage is instantly accessible to newly created compute instances.
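To make the contrast concrete, here is a minimal sketch using the google-cloud-storage Python client (the bucket and object names are placeholders, not anything from a real deployment): A newly created instance can read shared data the moment it boots, with no attach or mount step to fail or hang.

```python
from google.cloud import storage

# A freshly scaled-out instance can read shared data right away:
# there is no volume to attach or file system to mount first.
# The bucket and object names below are placeholders.
client = storage.Client()  # picks up the instance's ambient credentials
blob = client.bucket("my-app-data").blob("config/settings.json")
settings = blob.download_as_bytes()
print(f"read {len(settings)} bytes with no mount step")
```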

Sharing and Consistency

Data sharing and consistency are where object storage really shines compared to other storage types. With both block and file-based storage, one instance of an application can end up seeing partial data written by another instance. Application developers typically work around this with persistent locks, but such schemes come with their own challenges: performance, correctness, etc. Persistent locks make an application severely complicated; I have seen even experienced storage engineers make mistakes while using them. Object storage avoids this problem by never exposing partially written objects or objects actively being written.

Also note that objects are typically immutable: Once written, they can only be overwritten as a whole, not in parts. This means updating data requires expensive read-modify-write cycles. However, most cloud providers help avoid these additional reads via special APIs that can create an object from portions of existing objects, such as GCS compose, Azure’s put page blob and AWS multipart upload.
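As a rough illustration of one of these APIs, the sketch below uses the GCS compose operation via the google-cloud-storage Python client to stitch a new object together from existing ones entirely server-side, avoiding the read half of a read-modify-write; the bucket and object names are made up for illustration.

```python
from google.cloud import storage

# Compose a new object from existing objects server-side, so the
# parts are never downloaded and re-uploaded by the client.
# Bucket and object names are placeholders.
client = storage.Client()
bucket = client.bucket("my-app-data")
combined = bucket.blob("logs/combined.log")
combined.compose([bucket.blob("logs/part-1.log"),
                  bucket.blob("logs/part-2.log")])
```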

Data Protection

Errors are bound to happen during application development or rollouts. These errors can impact critical data and potentially disrupt normal application operations, which is why it is essential to have some form of backup or snapshots configured on the storage you use.

Though most storage services have some form of backup/snapshot mechanism, most don’t make it easy to configure or restore from one (both typically require multiple steps or the involvement of a cloud administrator). All cloud object services support native object versioning, which is extremely easy to enable: Anytime an application updates or deletes an object, the storage service preserves the older copy of the data. If an older copy needs to be restored, you simply read the old version and write it back as the new object. The careful reader might notice that if an application writes and deletes data often, many older versions will be left behind that could be hard to identify and remove. However, all cloud providers support policy-based data life cycle management (see next section), so you can set up policies to delete unnecessary copies. Note that object versioning also provides an excellent defense against ransomware attacks.
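As a sketch of how little work this takes with the google-cloud-storage Python client (bucket and object names are invented for illustration), enabling versioning is a single bucket setting, and a restore is one server-side copy of an old generation:

```python
from google.cloud import storage

client = storage.Client()
bucket = client.get_bucket("my-app-data")  # placeholder bucket name

# Enabling versioning is a single bucket setting.
bucket.versioning_enabled = True
bucket.patch()

# versions=True also returns the noncurrent generations of an object.
versions = sorted(
    client.list_blobs(bucket, prefix="orders.db", versions=True),
    key=lambda b: b.generation,
    reverse=True,
)
previous = versions[1]  # the generation just before the live one

# Restore: copy the old generation back as the new live version.
bucket.copy_blob(previous, bucket, "orders.db",
                 source_generation=previous.generation)
```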

Policy-Based Data Life Cycle Management

The amount of data generated by applications is only going to keep increasing with each passing year. This is why all the cloud providers support policy-based data life cycle management for their object storage services. Even if you don’t expect to use more than a few gigabytes of data, policy-based life cycle management can help keep your storage costs in check and reduce your code complexity around handling application crashes. It comes in especially handy when you or your cloud admin enable features like object versioning and object holds for data protection and compliance reasons. These policies are very simple to set up and can easily be customized to the requirements of individual organizations, applications and developers.
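For instance, here is a minimal sketch, again with the google-cloud-storage Python client and a placeholder bucket name, of lifecycle rules that clean up the noncurrent versions that object versioning leaves behind:

```python
from google.cloud import storage

client = storage.Client()
bucket = client.get_bucket("my-app-data")  # placeholder bucket name

# Keep at most three noncurrent versions of any object.
bucket.add_lifecycle_delete_rule(number_of_newer_versions=3)
# Delete noncurrent versions more than 30 days after their creation.
bucket.add_lifecycle_delete_rule(age=30, is_live=False)
bucket.patch()  # persist both rules on the bucket
```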

Conclusion

As the above illustrates, object storage services have been built to enable the development of simple and scalable applications. So, if you are writing a new application from scratch, choose object-based storage to keep your application simple and easy to maintain.

Ajaykrishna Raghavan

Ajaykrishna Raghavan is currently a Technical Lead/Staff Engineer in Google Cloud Storage. GCS exposes an object-based storage interface to customers using GCP. It is also responsible for managing data storage for Gmail, Google Photos, etc. Raghavan previously held senior technical roles at Nutanix, Datos IO (acquired by Rubrik) and NetApp. He has over 10 years of professional and research experience in distributed systems, distributed storage systems, enterprise storage, software engineering, and leading large engineering teams to deliver results. The products he built in the past resulted in millions of dollars in revenue. His R&D work in these organizations yielded six patents granted by the USPTO, and he has six pending applications. He has published papers in top-tier conferences and has reviewed papers for multiple conferences.
