One of the most interesting developments in security and compliance in recent years is the ability to follow a piece of data through an application from input to consumption and see every piece of code that touches it. For me, the reason this is so interesting is that it allows postmortems to determine exactly what was exposed in a data breach. “Oh, that library was compromised? Okay, let’s see what inputs that library actually touches …” It’s so very cool, given the free-for-all that data has traditionally been in applications.
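To make that postmortem question a little more concrete, here is a minimal sketch of the kind of query I have in mind. The record shape, the field names, and the component names (`libformat`, `crm_sync`, and so on) are all made up for illustration; real tracing tools emit far richer data and have their own query interfaces.

```python
# Hypothetical trace records: (data_field, component_that_touched_it).
# This flat shape is an assumption for illustration, not any tool's real output.
TRACE = [
    ("customer.email", "web_form"),
    ("customer.email", "libformat"),
    ("customer.email", "crm_sync"),
    ("customer.ssn", "web_form"),
    ("customer.ssn", "vault_writer"),
    ("order.total", "libformat"),
    ("order.total", "billing"),
]


def fields_touched_by(trace, component):
    """Return the sorted data fields that passed through `component`."""
    return sorted({field for field, comp in trace if comp == component})


def components_touching(trace, field):
    """Return the sorted components that handled `field`."""
    return sorted({comp for f, comp in trace if f == field})


if __name__ == "__main__":
    # "That library was compromised? Let's see what it actually touches."
    print(fields_touched_by(TRACE, "libformat"))
    print(components_touching(TRACE, "customer.ssn"))
```

In this toy data set, the compromised library never saw `customer.ssn`, which is exactly the sort of answer a postmortem wants.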
I’m not a huge compliance fan for a variety of reasons, but this is one great result of mandating compliance. I don’t think that we would have this level of data tracking without compliance requirements driving it.
So what can we do with this information? That’s the interesting part that I think we’ll see developing over time. If we know everywhere that a given piece of data is used, we have a powerful bit of development information. It is still early days for implementation, and this is not easy stuff, but think about it. I could see a future where all redundant data access is eliminated. If we’re getting the same set of data from multiple sources, and it is being fed to the same internal consumers, then we can standardize the you-know-what out of it. That could mean a single library to standardize formatting, another to standardize look-ups, and so on.
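As a rough illustration of spotting that redundancy, the sketch below groups access records by data set and consumer and reports any pair that is fed by more than one source. The `(source, dataset, consumer)` record shape and every name in it are assumptions for the example, not the output of any particular product.

```python
from collections import defaultdict

# Hypothetical access records: (source, dataset, consumer).
ACCESS_LOG = [
    ("partner_api", "postal_codes", "shipping"),
    ("nightly_csv", "postal_codes", "shipping"),
    ("partner_api", "postal_codes", "billing"),
    ("hr_feed", "employees", "payroll"),
]


def redundant_access(log):
    """Map (dataset, consumer) to its sources when there is more than one."""
    sources = defaultdict(set)
    for source, dataset, consumer in log:
        sources[(dataset, consumer)].add(source)
    # Keep only the pairs that receive the same data set from several sources.
    return {key: sorted(srcs) for key, srcs in sources.items() if len(srcs) > 1}


if __name__ == "__main__":
    for (dataset, consumer), srcs in redundant_access(ACCESS_LOG).items():
        print(f"{consumer} gets {dataset} from: {', '.join(srcs)}")
```

Each pair this reports is a candidate for one shared look-up or formatting library instead of several.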
For the most part, we currently validate and format on input, straight away as the data is brought into the system. That is a great idea for a system where we are uncertain where the data is going and are sure that some attacks are coming through those inputs. But what if we discovered that this leaves us with 10,000 more lines of code to maintain than necessary? What if data tracing allowed us to eliminate that code by centralizing those functions? We would have to harden the routine doing the validation and formatting, because it will be getting dirty data, but no more so than we are doing literally everywhere today.
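Here is a minimal sketch of what one of those centralized routines might look like, using an email field as the example. The function name `normalize_email` and the deliberately simple pattern are my own assumptions; a production routine would need stricter rules and much more hardening than this.

```python
import re

# Deliberately simple pattern for illustration only; not a full address grammar.
_EMAIL = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")


def normalize_email(raw):
    """Validate and format an email address in one place for every caller.

    Assumes the input is dirty: wrong type, stray whitespace, mixed case.
    """
    if not isinstance(raw, str):
        raise ValueError("email must be a string")
    value = raw.strip().lower()
    # 254 characters is the commonly cited practical limit for an address.
    if len(value) > 254 or not _EMAIL.fullmatch(value):
        raise ValueError("invalid email")
    return value


if __name__ == "__main__":
    print(normalize_email("  Alice@Example.COM "))
```

Every input path would call this one function rather than carrying its own copy of the checks, so the hardening effort is spent once.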
If we can reduce source code by using a tool that is designed to improve security, is that not the ultimate shift left? The volume of source code that handles data entering the system is also staggering. For enterprise systems, most of what we develop is actually data processing, no matter how convoluted it gets. The data store is the central repository, and apps are simply accessing and modifying it. So shifting left with regard to data is huge.
Consider it. I know that you are all super-busy with other projects, but less source code lightens the load on everyone. Take a look at the data tracing functionality in products you already own (it is mostly available in vulnerability management and source code scanning toolsets), and see if it suits your needs. It might save everyone time and will help with some data-centric compliance requirements, too. That leaves you more time to make systems better.