This blog is the second in a two-part series on backup and recovery for MongoDB. In a previous blog, I covered why companies require both replication and backup for enterprise-grade data protection. In the first part of this series, I discussed the existing solutions for backup and recovery of MongoDB and their drawbacks. In this blog, I will discuss the key requirements for protecting data that resides on MongoDB, whether it is deployed on-premises, in a private cloud with an as-a-service model, or in a public cloud such as Amazon AWS or Google Cloud Platform.
Requirement 1: Online Cluster-Consistent Backups
Next-generation applications deployed on MongoDB must be always-on. This means pausing the database to make backups is not feasible, and the backup operation should not degrade application performance. As the application scales, the underlying MongoDB deployment also needs to scale out to multiple shards. In this case, a backup solution must provide a consistent backup copy across shards without disrupting database and application performance during the backup.
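To make "consistent across shards" concrete, here is a minimal Python sketch of one way to reason about it: pick the latest timestamp that every shard's captured operation log covers, and treat that as the common point for the backup. The `ShardOplogWindow` type, its fields, and the sample timestamps are hypothetical and exist only for illustration. A real solution would also have to account for details this ignores, such as chunk migrations by the balancer.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ShardOplogWindow:
    """Hypothetical summary of the operation range captured for one shard."""
    shard: str
    first_ts: int  # oldest captured operation timestamp (seconds)
    last_ts: int   # newest captured operation timestamp (seconds)


def consistent_cut(windows):
    """Return the latest timestamp that every shard's capture covers."""
    if not windows:
        raise ValueError("no shards supplied")
    # No shard can be rolled forward past its own newest captured operation.
    cut = min(w.last_ts for w in windows)
    # A shard whose capture starts after the cut cannot be brought to it.
    for w in windows:
        if w.first_ts > cut:
            raise ValueError(f"{w.shard} has no data at or before {cut}")
    return cut


windows = [
    ShardOplogWindow("shard0", 100, 180),
    ShardOplogWindow("shard1", 110, 175),
    ShardOplogWindow("shard2", 105, 190),
]
print(consistent_cut(windows))  # 175
```

Every shard is then restored to the same timestamp, so no shard reflects writes that another shard has not yet seen.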
Requirement 2: Flexible Backup Options
Depending on the application, data may have different change rates and patterns. For example, in a product catalog, certain items such as fast-selling goods may be refreshed every day, while others, such as premium items, may have a longer shelf life. Based on the application requirements, some collections may need to be backed up every hour, while others may only need a daily backup. The flexibility to schedule backups at any interval and at collection-level granularity is another requirement we have heard from customers using MongoDB. More importantly, these backups should always be stored on secondary storage in native formats to avoid vendor lock-in.
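As a rough illustration of collection-level scheduling, the Python sketch below keeps a backup interval per collection and reports which collections are due. The namespaces, the intervals, and the `due_collections` helper are all hypothetical. This is not the interface of any particular backup product.

```python
from datetime import datetime, timedelta

# Hypothetical policy: "database.collection" -> backup interval.
POLICY = {
    "catalog.fast_selling": timedelta(hours=1),
    "catalog.premium": timedelta(days=1),
}


def due_collections(policy, last_backup, now):
    """Return the namespaces whose interval has elapsed since their last backup."""
    due = []
    for namespace, interval in policy.items():
        previous = last_backup.get(namespace)
        # Due if never backed up, or if the full interval has elapsed.
        if previous is None or now - previous >= interval:
            due.append(namespace)
    return sorted(due)


now = datetime(2024, 1, 2, 12, 0)
last_backup = {
    "catalog.fast_selling": datetime(2024, 1, 2, 10, 30),
    "catalog.premium": datetime(2024, 1, 2, 0, 0),
}
print(due_collections(POLICY, last_backup, now))  # ['catalog.fast_selling']
```

Here the fast-selling collection is due because more than an hour has passed, while the premium collection is still within its daily interval.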
Requirement 3: Scalable Recovery
During its life cycle, data moves through multiple stages such as development, test, preproduction and production, and may also reside in multiple clouds (private and public). The topology of MongoDB clusters at each stage is different. For production, the application could be deployed on a sharded MongoDB cluster on-premises, but the test team might have access only to unsharded MongoDB clusters in a public cloud such as Amazon AWS. Hence, the backup solution should support multiple kinds of restore operations across such cloud configurations, such as sharded to sharded (for example, from a 5×3 cluster to a 2×3 sharded cluster) or sharded to unsharded (for example, from a 5×3 cluster to a 1×3 unsharded cluster).
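The topology change can be pictured as a mapping problem. The Python sketch below assumes that "5×3" means five shards of three replica-set members each, and assigns each source shard's backup to a target shard in round-robin order. The `plan_restore` helper and the shard names are hypothetical. A real restore would also have to reconcile shard keys and chunk metadata, which this planning step ignores.

```python
def plan_restore(source_shards, target_shards):
    """Assign each source shard's backup to a target shard, round-robin."""
    if not target_shards:
        raise ValueError("at least one target shard is required")
    plan = {target: [] for target in target_shards}
    for index, source in enumerate(source_shards):
        target = target_shards[index % len(target_shards)]
        plan[target].append(source)
    return plan


source = [f"shard{i}" for i in range(5)]

# Sharded to sharded: five source shards onto two target shards.
print(plan_restore(source, ["t0", "t1"]))
# {'t0': ['shard0', 'shard2', 'shard4'], 't1': ['shard1', 'shard3']}

# Sharded to unsharded: everything lands on one replica set.
print(plan_restore(source, ["rs0"]))
# {'rs0': ['shard0', 'shard1', 'shard2', 'shard3', 'shard4']}
```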
Requirement 4: Handling Failure
Failures are the norm in the distributed database world. The backup solution should therefore be resilient to database process failures, node failures, network failures and even logical corruption of data during backup and recovery operations. Finally, the backup solution should be able to handle failures of the MongoDB config servers that store metadata for sharded clusters.
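One common building block for this kind of resilience is retrying a failed step with an increasing delay. The Python sketch below shows that pattern under stated assumptions: `TransientBackupError`, `run_with_retries` and `flaky_copy` are hypothetical names, and the failing step is simulated. Retries only help with transient faults such as a node or network outage. Logical corruption needs a different mechanism, such as restoring from an earlier backup.

```python
import time


class TransientBackupError(Exception):
    """Hypothetical error for failures worth retrying (node or network loss)."""


def run_with_retries(step, attempts=4, base_delay=1.0, sleep=time.sleep):
    """Run step(); on a transient failure, wait and retry with a doubled delay."""
    for attempt in range(1, attempts + 1):
        try:
            return step()
        except TransientBackupError:
            if attempt == attempts:
                raise  # out of retries, surface the failure to the caller
            sleep(base_delay * 2 ** (attempt - 1))


calls = {"count": 0}


def flaky_copy():
    """Stand-in for copying one shard's data; fails twice, then succeeds."""
    calls["count"] += 1
    if calls["count"] < 3:
        raise TransientBackupError("node unreachable")
    return "copied"


delays = []  # record the delays instead of actually sleeping
print(run_with_retries(flaky_copy, sleep=delays.append))  # copied
print(delays)  # [1.0, 2.0]
```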
Customers are deploying MongoDB on physical servers, in private clouds, in microservices frameworks, and in the public cloud. Backup and recovery should be seamless across these deployment models. Ease of deploying backup and recovery is a major requirement for MongoDB customers, and I will cover it in the next blog. Stay tuned!