A crucial question that plagues cloud application developers is, “What kind of storage should we use for our app?” Unlike other choices like compute runtimes—Lambda/serverless, containers or virtual machines—data storage choice is highly sticky and makes future application improvements and migrations much harder.
All three hyperscalers have storage services that present block, file and object-based data access. Each of these storage services is mature and offers different advantages, which makes the choice even harder. Though block and file-based storage have existed for decades, this article illustrates some key differentiators that should make object storage your default choice for new applications written in the cloud.
| Storage Type | Google Cloud | Amazon Web Services | Microsoft Azure |
|---|---|---|---|
| Block | Persistent Disk | Elastic Block Store (EBS) | Disk Storage |
| File | Filestore | Elastic File System (EFS) | Files |
| Object | Cloud Storage | Simple Storage Service (S3) | Blob Storage |
Scalability is an important requirement for most cloud applications. Adding compute capacity horizontally is expected to increase an application’s ability to handle more requests and users during peak workloads, and most cloud providers make it easy to scale up compute resources to meet that demand. As compute resources are scaled, however, block and file-based storage must be mounted on or attached to each new compute instance. A cursory search will show that these operations can fail or even hang indefinitely for multiple reasons, and the failures are often hard to debug. Teardown of a compute instance can fail or hang for the same reasons. These issues directly limit the application’s ability to scale freely as required. Object-based storage does not have this problem because there is no mount step involved: your object storage is immediately accessible to newly created compute instances.
Sharing and Consistency
Data sharing and consistency are where object storage really shines compared to other storage types. In both block and file-based data storage, one instance of an application can end up seeing partial data written by another instance. Application developers end up having to use persistent locks to get around this issue. However, such schemes come with their own challenges around performance and correctness. Persistent locks make an application considerably more complicated; I have seen even experienced storage engineers make mistakes while using them. Object storage avoids this problem by not exposing partially written objects or objects that are actively being written.
Also, note that objects are typically immutable: once written, they can only be overwritten as a whole and not in parts. This means that updating data would normally require an expensive read-modify-write cycle. However, most cloud providers help avoid the additional reads with special APIs that can build an object from existing objects or portions of them, such as GCS compose, Azure’s Put Page operation for page blobs, and AWS multipart upload (using UploadPartCopy).
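The visibility behavior described above can be illustrated with a small in-memory model. This is only a sketch and not any provider's SDK: `ToyObjectStore`, its `put`, `get` and `compose` methods, and the object names are all hypothetical. It shows a reader that arrives in the middle of an upload still seeing the previous complete object, and a compose-style call building a new object from byte ranges of existing ones without the client downloading them.

```python
import threading


class ToyObjectStore:
    """In-memory stand-in for an object store (hypothetical, not a real SDK)."""

    def __init__(self):
        self._objects = {}  # key -> bytes; holds only fully written objects
        self._lock = threading.Lock()

    def put(self, key, chunks):
        # Assemble the upload in private staging; readers cannot see it yet.
        staged = b"".join(chunks)
        # Publish in one step: the key flips from the old bytes to the new bytes.
        with self._lock:
            self._objects[key] = staged

    def get(self, key):
        with self._lock:
            return self._objects.get(key)

    def compose(self, dest_key, parts):
        """Build dest_key from (source_key, start, end) ranges of existing objects."""
        with self._lock:
            data = b"".join(self._objects[k][s:e] for k, s, e in parts)
            self._objects[dest_key] = data


store = ToyObjectStore()
store.put("report.csv", [b"id,total\n", b"1,10\n"])

seen_mid_upload = []


def slow_chunks():
    yield b"id,total\n"
    # A reader arriving while the upload is in flight.
    seen_mid_upload.append(store.get("report.csv"))
    yield b"1,99\n"


store.put("report.csv", slow_chunks())

# Append rows held in another object without re-uploading the first one.
store.put("extra.csv", [b"2,20\n"])
store.compose("merged.csv", [("report.csv", 0, None), ("extra.csv", 0, None)])
```

Real services differ in the details. GCS compose, for example, concatenates whole source objects, while S3's UploadPartCopy accepts a byte range, so the range-based `compose` here is a simplification.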
Errors are bound to happen during application development or rollouts. These errors can damage critical data and disrupt normal application operations, which is why it is essential to configure some form of backup or snapshot on the storage that you use.
Though most storage services have a backup or snapshot mechanism, most don’t make it easy to configure or to restore from (both tend to require multiple steps or the involvement of a cloud administrator). The object storage services of all three providers support native object versioning, which is extremely easy to enable. Once it is enabled, any time an application updates or deletes an object, the object storage service preserves the older copy of the data. If an older copy needs to be restored, you can simply read the old version and write it back as the new object. The careful reader might notice that if an application writes or deletes data often, many older versions will be left behind, and might expect them to be hard to identify and remove once they are no longer needed. However, all cloud providers support policy-based data life cycle management (see next section), so you can set up policies to delete unnecessary copies. Note that object versioning also provides an excellent defense against ransomware attacks, because overwriting or encrypting an object leaves its earlier versions intact.
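The restore-by-rewriting idea can be sketched with a toy versioned bucket. The `VersionedBucket` class and its integer version IDs are assumptions made for illustration. Real services assign opaque version IDs and expose versioning through their own APIs.

```python
from collections import defaultdict


class VersionedBucket:
    """Toy versioned bucket (hypothetical); None in a history is a delete marker."""

    def __init__(self):
        self._versions = defaultdict(list)  # key -> list of bytes or None

    def put(self, key, data):
        self._versions[key].append(data)
        return len(self._versions[key]) - 1  # version id = position in history

    def delete(self, key):
        # A delete adds a marker on top of the history instead of erasing it.
        return self.put(key, None)

    def get(self, key, version=None):
        history = self._versions.get(key, [])
        if not history:
            return None
        return history[-1] if version is None else history[version]

    def restore(self, key, version):
        # Restore = read the old version, then write it as the newest one.
        return self.put(key, self.get(key, version))


bucket = VersionedBucket()
good = bucket.put("config.json", b'{"mode": "safe"}')
bucket.put("config.json", b'{"mode": "broken"}')  # bad rollout
bucket.delete("config.json")                      # accidental delete
after_delete = bucket.get("config.json")

restored = bucket.restore("config.json", good)
current = bucket.get("config.json")
```

Note that the restore does not rewrite history: the broken version and the delete marker stay in place, and the restored data becomes one more version on top.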
Policy-Based Data Life Cycle Management
The amount of data generated by applications is only going to keep increasing with each passing year, which is why all the cloud providers support policy-based data life cycle management for their object storage services. Even if you don’t expect to use more than a few gigabytes of data, policy-based life cycle management can help you keep your storage costs in check and can reduce the code complexity around handling application crashes. It is especially handy when you or your cloud admin decide to enable features like object versioning and object holds for data protection and compliance reasons. These policies are simple to set up and can easily be customized to the requirements of individual organizations, applications and developers.
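To make the idea concrete, here is a minimal sketch of how a life cycle rule for old versions might be evaluated. The `Version` and `LifecycleRule` types, their field names, and the sample data are hypothetical. Real providers express such rules in their own configuration formats and typically measure age from the moment a version became noncurrent, whereas this sketch uses creation time for simplicity.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class Version:
    key: str
    version_id: int
    created: datetime
    is_current: bool


@dataclass
class LifecycleRule:
    prefix: str                      # rule applies to keys with this prefix
    noncurrent_max_age: timedelta    # expire noncurrent versions older than this
    keep_newest_noncurrent: int = 0  # always retain this many recent old versions


def versions_to_delete(versions, rule, now):
    """Return the noncurrent versions that the rule would expire."""
    by_key = {}
    for v in versions:
        if v.key.startswith(rule.prefix) and not v.is_current:
            by_key.setdefault(v.key, []).append(v)
    expired = []
    for history in by_key.values():
        history.sort(key=lambda v: v.created, reverse=True)  # newest first
        for v in history[rule.keep_newest_noncurrent:]:
            if now - v.created > rule.noncurrent_max_age:
                expired.append(v)
    return expired


now = datetime(2024, 6, 1, tzinfo=timezone.utc)  # arbitrary sample date
versions = [
    Version("logs/app.log", 0, now - timedelta(days=90), False),
    Version("logs/app.log", 1, now - timedelta(days=45), False),
    Version("logs/app.log", 2, now - timedelta(days=1), True),
    Version("data/users.db", 0, now - timedelta(days=90), False),
]
rule = LifecycleRule(
    prefix="logs/",
    noncurrent_max_age=timedelta(days=30),
    keep_newest_noncurrent=1,
)
doomed = versions_to_delete(versions, rule, now)
```

With this sample data, only the oldest `logs/app.log` version is selected. The newest noncurrent version is retained by `keep_newest_noncurrent`, the current version is never touched, and `data/users.db` falls outside the rule's prefix.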
As one can see from the above, object storage services have been built to enable the development of simple and scalable applications. So, if you are writing a new application from scratch, choose object-based storage to keep your applications simple and easy to maintain.