Backup, business continuity (BC) and disaster recovery (DR) have been critical parts of IT for over 30 years, ever since we began relying on technology to run our businesses. Traditional solutions were designed for a world of on-premises infrastructure, structured applications and relational databases. But the world is changing. In the last blog, I talked about the era of digital transformation and the impetus it creates to rethink and reinvent the fundamental backup and recovery architecture, both for workloads moving to the cloud and for applications born in the cloud.
What Changed? Reinventing Backup and Recovery
Application and data platforms are undergoing the biggest transformation since the dawn of computing. There are several forces at work:
- New applications. Third-generation applications are geo-distributed, scale out across many systems, run always-on and are typically deployed in a cloud-first model.
- Existing applications are moving to the cloud. They aren’t going away, but companies are migrating some or all of them, and they still need backup and recovery.
- RPO and RTO windows are shrinking. Enterprises expect always-on operations; the days of the nightly backup window are gone.
- Smaller companies will go all-in on public cloud. SMBs don’t want to be in the business of IT. They’ve been driving the rapid growth of cloud applications and platforms.
- Enterprises will build hybrid clouds. Enterprises will deploy applications and data across on-premises and public cloud environments. Scale, compliance and other factors mean they’ll need to keep some systems on-premises.
- Everyone will use multiple clouds. No one will bet their business on one cloud or one provider. Even now, enterprises are splitting workloads across clouds or cloud and on-premises. Development and test may use one cloud, while the same application might be deployed in a private cloud or a different public cloud.
Cloud’s Impact on Backup, Recovery and Continuity
Cloud gives organizations much more agility, operational savings, and a pay-as-you-go model. Public cloud providers can also build much more resilient infrastructure. Amazon’s SLA for EC2 is 99.95 percent availability, and S3 is designed for 99.99 percent availability and 11 nines of data durability, with data spread across multiple availability zones. Because the cloud is so reliable and inexpensive, it’s quickly becoming a backup target for on-premises data. But that shouldn’t trick us into believing backup and recovery are “built-in” when we run applications in the cloud. Even Amazon itself recommends backup services for all AWS-native applications and cloud databases.
While service availability and data resiliency address infrastructure-level business continuity and disaster recovery, they don’t provide point-in-time recovery or application-level intelligence for backup and recovery. As good as the cloud platforms are, they don’t protect against logical errors. And research shows 8 out of 10 errors are logical errors: data corruption and user mistakes.
Existing Backup Products and the Cloud
As we noted above, traditional backup and recovery products don’t meet the needs of cloud applications, including existing applications that have moved to the cloud, and not only because they were built in a different era. Cloud and distributed architectures present several other challenges:
- Cloud breaks the media-server-based architecture of traditional solutions. Cloud applications and data don’t reside on a specific array or disk, so you can’t easily back up what you can’t see. Traditional backups also don’t capture cloud configuration data, such as AWS CloudFormation templates.
- Cloud doesn’t speak the same language. Legacy solutions talk to tape, disk or virtual disks. Backup and recovery in the cloud means integrating natively with the right protocols, such as the S3 API or Google Cloud Storage (see the sketch after this list).
- Backup appliances can’t be moved to the cloud. Existing backup appliances, such as EMC Data Domain or NetBackup appliances, work extremely well on-premises but can’t simply be picked up and moved to the cloud.
- Traditional backup agents won’t scale. If you could get a backup agent running in the cloud, it wouldn’t scale well across dozens or possibly hundreds of nodes.
- The VM is not the right layer of abstraction. This is precisely why the core principle of the Datos IO CODR architecture is a scalable, application-centric view of data management and data protection: CODR introspects application data and uses global semantic deduplication to achieve storage efficiency, rather than relying on traditional deduplication techniques that treat data as an opaque object (such as a VM or a LUN). The benefit is fine-grained, highly space-efficient data protection that can span clouds over ordinary network links.
- Cloud gateways and migration services are unidirectional only. They are built to move data into the cloud, not to recover it back out.
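To make the point about protocols concrete: backup software running in the cloud has to speak object-storage APIs natively rather than emulating tape or block devices. Below is a minimal sketch, using boto3 and a hypothetical bucket and key layout (not Datos IO code), of what writing and reading a backup artifact through the S3 API looks like.

```python
# Minimal sketch of using the S3 API as a backup target via boto3.
# Bucket and key names are hypothetical placeholders.
import boto3

s3 = boto3.client("s3")  # credentials come from the environment or an IAM role


def backup_to_s3(payload: bytes, bucket: str, key: str) -> None:
    """Upload one backup object directly to object storage."""
    s3.put_object(Bucket=bucket, Key=key, Body=payload)


def restore_from_s3(bucket: str, key: str) -> bytes:
    """Fetch the object back out for a restore."""
    return s3.get_object(Bucket=bucket, Key=key)["Body"].read()


if __name__ == "__main__":
    backup_to_s3(b"table snapshot bytes",
                 "example-backup-bucket",
                 "cassandra/users/2017-01-01.snap")
    data = restore_from_s3("example-backup-bucket",
                           "cassandra/users/2017-01-01.snap")
    print(f"restored {len(data)} bytes")
```

The same pattern applies to Google Cloud Storage or any other object store; only the client library and endpoint change.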
Data Protection Must Be Reinvented
Backup and recovery for cloud applications is a new problem, and solving it requires an architecture with three critical properties:
- Elastic compute only. The architecture should scale efficiently on elastic compute instances. There shouldn’t be any CapEx costs for servers or appliances.
- No media servers. Backing up large, scale-out databases requires a direct, parallel streaming architecture that moves data between the database nodes and secondary storage. Legacy backup architectures rely on media servers that would quickly become choke points. Direct parallel streaming also keeps the data available in native formats (illustrated in the sketch after this list).
- Semantic deduplication. Scale-out application databases typically have a 3x replication factor. If you back up individual nodes or even manage to snapshot the whole database, two-thirds of the backup data is redundant. Over time, backups will be 75 percent to 80 percent inefficient without semantic deduplication that works across a distributed architecture.
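Here is a minimal, self-contained sketch of the last two ideas together (illustrative only, not the CODR implementation): each node is streamed in parallel with no media server in the path, each logical record is reduced to a canonical form and content-hashed, and records that appear three times because of the replication factor are stored only once. The node names and data are hypothetical stand-ins.

```python
# Sketch of parallel, node-direct streaming plus semantic deduplication
# of a 3x-replicated dataset. Illustrative only.
import hashlib
import json
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for three database nodes: with a replication
# factor of 3, every logical row appears on every node.
ROWS = [{"id": i, "name": f"user-{i}"} for i in range(1000)]
NODES = {"node-a": ROWS, "node-b": ROWS, "node-c": ROWS}


def stream_node(node: str) -> list[tuple[str, bytes]]:
    """Read one node directly (no media server) and emit (content-hash, record) pairs."""
    out = []
    for row in NODES[node]:
        canon = json.dumps(row, sort_keys=True).encode()        # canonical record form
        out.append((hashlib.sha256(canon).hexdigest(), canon))  # semantic identity
    return out


def backup_cluster() -> dict[str, bytes]:
    deduped: dict[str, bytes] = {}
    total = 0
    # Stream every node in parallel; each stream could go straight to object storage.
    with ThreadPoolExecutor(max_workers=len(NODES)) as pool:
        for records in pool.map(stream_node, NODES):
            total += len(records)
            for digest, record in records:
                deduped.setdefault(digest, record)  # keep one copy per logical record
    print(f"scanned {total} records, stored {len(deduped)} "
          f"({1 - len(deduped) / total:.0%} redundant copies eliminated)")
    return deduped


if __name__ == "__main__":
    backup_cluster()
```

Running this prints that roughly two-thirds of the scanned records were redundant copies, which is exactly the arithmetic of a 3x replication factor; add versioned backups over time and the waste without semantic deduplication climbs toward the 75 to 80 percent range noted above.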