That Other Continuous: Data Protection

While looking around for things to add to our current project’s toolchain (that is one of the reasons we do these side projects: to check out all sorts of new ideas, tools, etc.), and while researching the company where a former coworker is the CMO, I was drawn into the resurgent world of continuous data protection (CDP). It is interesting that we could end up adding CDP to the list of ‘continuous’ activities, alongside CI, CD and the rest, even though CDP was named long before any of our DevOps or Agile terms were born.

The whole idea of CDP is that your data is always backed up, so you are continuously protected. It largely failed when first introduced because it massively increased both the cost and complexity of your storage environment. Essentially, it required a hot copy of all of your storage, which is not something most enterprises were willing to invest in. I mean, it was friggin’ expensive. It also relied on virtualized storage, which at the time introduced issues no one wanted to deal with.

But several things have changed over the years, and it is interesting to see where CDP concepts have cropped up. You see, today it is being used to protect companies from ransomware. It still adds some complexity, but we’ve become experts at hiding complexity. It still requires a lot of storage (I would argue that the current versions I’m seeing require more storage than the original CDP solutions did), but when your environment is taken over by ransomware, a clean machine can be used to restore data to a point before the ransomware took over. From a “continuous protection” standpoint (giving it a modern meaning along the lines of “continuous testing”), that is an astounding feat.

The way it works—at least the way that the current implementations I’ve looked at work—is pretty simple at heart, though as my former coworker likes to say, “Easy in concept, difficult in implementation.” It keeps every write ever made to storage, with time and currency markers. So it knows what the disk currently looks like, but it can also get to what the disk looked like at 12:33 am yesterday. Or January first, or a year ago… You get the idea.

It uses a ton of space because there is an implied truth here—nothing is destroyed. In traditional storage, if you change a document and hit save, the document is written over. In the new iteration of CDP, if you change a document and hit save, the changes are recorded, but the original is not altered. Call it “non-destructive writes,” if you will.
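To make the idea concrete, here is a toy sketch in Python of a journal that never overwrites anything and can rebuild the disk as of any moment. This is my own illustration of the concept, not how any particular CDP product works: the `WriteJournal` class, its method names, the block numbers and the sample dates are all made up for the example, and a real implementation would work at the storage layer with far more care about ordering, space and integrity.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class WriteJournal:
    """Hypothetical append-only log of block writes; nothing is overwritten."""
    entries: list = field(default_factory=list)  # (timestamp, block_id, data)

    def write(self, block_id: int, data: bytes, when: datetime) -> None:
        # Keep the new version alongside the old ones instead of replacing them.
        self.entries.append((when, block_id, data))

    def view_at(self, when: datetime) -> dict:
        """Rebuild the disk contents as they stood at `when`."""
        disk = {}
        # Replay writes in time order, stopping at the requested moment.
        for ts, block_id, data in sorted(self.entries, key=lambda e: e[0]):
            if ts > when:
                break
            disk[block_id] = data
        return disk


journal = WriteJournal()
journal.write(0, b"quarterly report", datetime(2024, 1, 1, 9, 0))
journal.write(0, b"ENCRYPTED BY RANSOMWARE", datetime(2024, 1, 2, 3, 0))

# What the disk looks like now, and what it looked like before the attack.
current = journal.view_at(datetime.max)
clean = journal.view_at(datetime(2024, 1, 1, 12, 0))
```

Both versions of block 0 stay in the journal, which is exactly why the space requirements grow: `current` sees the encrypted data, while `clean` still sees the original report.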

It is worth checking out. The concepts are huge, and offer protection that is otherwise in short supply. If you have a dedicated DevSecOps team, they’re the ones I’d point at it, since the big reason for using this is security (there are other uses, but I’m currently gushing about security because this is a big deal).

I’ll get back on track next week and report on things directly in the toolchain for our newest DevOps project. I thought this was worth writing about, since most of you could use the concept, even if only for peace of mind. You’re rocking it! Protecting your work against the newest range of attackers is just common sense. It’s worth seeing if the modern iteration of CDP is for you.