Using Application Deltas in Deployments

Developers have been versioning their source files since the early days of Unix. Anyone else remember SCCS? Back in those days, storage was at a premium: Disks were small and very expensive and it was too inefficient to store a complete copy of each version of the source file. So version control tools would simply store deltas—the changes made from one version of the file to another. This was more efficient; if you only changed one line in a 3,000-line file, the system would simply store the new line and some metadata about where it fit in the file and who made the change.

Some tools would save the latest version of the file in full and use reverse deltas to record how to get back to each earlier version. This was usually more efficient, since developers generally want the latest version of a file: it is cheaper to apply a few reverse deltas on the rare occasions you need an earlier version than it is to start with a 3-year-old file and roll forward through 472 changes every time you want the current one.

These days, disk storage is orders of magnitude cheaper than it was in 1981. (The storage in my phone would have cost $38 million in 1981!) Because of this, storing deltas is no longer the necessity it used to be. In addition, CPUs are now fast enough that it’s practical to compress and store a complete copy of each new version of the file. That’s essentially what modern revision control tools such as Git do: each commit records a compressed snapshot, although Git still applies delta compression behind the scenes when it packs its object store.

Now that’s all very interesting, but the title of this post is, “Using Application Deltas in Deployments.” So why am I rattling on about source code versioning?

Well, here’s the rub: While deltas may no longer have much of a part to play in tracking changes to source files, they most certainly do when it comes to deploying applications, and for reasons similar to those that plagued us (or our parents!) 30 years ago.

Applications are Made of Components

Let’s be frank: An application that consists of a single binary on a single server is not going to need deltas. That would be akin to a change to a single-line source file. All you need to do is to replace the old binary with the new and you’ve deployed. But applications are getting more complex:

The application may be split across multiple servers. An n-Tier application may have different tiers running on different servers (and even different operating systems).
The application could be wholly or partially containerized. You can have multiple containers each running different microservices.
There could be database changes associated with an application change. Indeed, the application change could consist solely of a database change. Database updates are pretty much always deployed as deltas; alter scripts are used to add and remove columns, tables and indexes.
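
To make the database case concrete, here is a minimal sketch of an alter script acting as a delta. It uses Python's built-in sqlite3 module with an in-memory database; the customers table and the email column are made-up names for illustration, not anything from a real application.

```python
import sqlite3

# An in-memory database standing in for the existing schema.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")

# The "delta": it does not describe the whole schema, only the change
# from the previous version (hypothetical column name).
ALTER_SCRIPT = "ALTER TABLE customers ADD COLUMN email TEXT;"
conn.executescript(ALTER_SCRIPT)

# PRAGMA table_info returns one row per column; index 1 is the column name.
columns = [row[1] for row in conn.execute("PRAGMA table_info(customers)")]
print(columns)  # ['id', 'name', 'email']
```

Note that the script only makes sense against a database that is already at the previous schema version, which is exactly why deployment order matters for database components.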

In the same way a single binary can be built from lots of different source files (each of which is under its own version control), an application can be made up of lots of different binary components. Clearly, if the application consists of lots of different components and those components are resident on different servers or containers, then deploying every component every time a new application version is deployed would be wasteful, time-consuming and error-prone.

An application version “delta,” therefore, would track which components have changed between the version of the application being deployed and the versions of the components on the target environment. This allows the deployment tool to only deploy the components that have changed. So what do we need to track to make this work?

Component Versions. Obviously, we need to version each individual component so we know which version of which component is included in any particular application version.
Deployed Versions. We also need to record which component version is present on each endpoint within a target environment. That way, if the application we’re about to deploy has version 5 of a component (a DLL or a WAR file, say) and version 5 is already present in the target environment, then we don’t deploy that particular component.
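
The two things being tracked can be sketched as a pair of dictionaries and a comparison. This is only an illustration under assumed names: components_to_deploy and the reports.dll component are hypothetical, and a real deployment tool would keep these records per endpoint in its own repository.

```python
from typing import Dict


def components_to_deploy(release: Dict[str, int],
                         installed: Dict[str, int]) -> Dict[str, int]:
    """Return the components whose release version differs from the
    version recorded on the endpoint (or that are missing entirely)."""
    return {
        name: version
        for name, version in release.items()
        if installed.get(name) != version
    }


# Component versions in the application version being deployed (hypothetical).
release = {"myapp.war": 5, "reports.dll": 2}

# Component versions already recorded on the target endpoint.
endpoint = {"myapp.war": 4, "reports.dll": 2}

delta = components_to_deploy(release, endpoint)
print(delta)  # {'myapp.war': 5}

# A freshly provisioned endpoint has nothing installed, so everything is deployed.
fresh = components_to_deploy(release, {})
print(fresh)  # {'myapp.war': 5, 'reports.dll': 2}
```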

This approach has a number of advantages. You can deploy to an existing environment and only deploy the components that have changed, minimizing the downtime and reducing the risk of failure. You can also deploy to a newly provisioned environment: if you need to fire up a new virtual machine (or a container), nothing is installed yet, so the deployment pushes every component in the application. The next time you deploy to that environment, only the components that have changed are deployed.

Database Deltas

We may need to consider special cases when it comes to database components. Remember we mentioned that database changes are nearly always delivered as deltas, SQL “alter” scripts that make changes to the previous version of the database schema? Well, if all you’re doing is deploying the next version of the application, then that works fine. However, what if you’re deploying a version that’s several releases ahead of what’s in the environment? This can easily happen if the test and production environments are “pulling” deployments into their environment (as opposed to having them “pushed” via a continuous delivery process). Now, just applying the alter script that is associated with this application version won’t work in any environment that doesn’t contain the previous version of the application.

Picture an application (“My Application”) that is up to version 6. It has two components: a WAR file and a database component that contains alter scripts that roll the database schema forward. Here is what the last four versions look like:

So version 3 has a new version of the WAR file; version 4 has a new version of the WAR file and a database alter script that adds a column to a table. Version 5 only changes the database (it amends a stored procedure which uses this new column). Version 6 applies a new change to the WAR file.

Now, in a continuous delivery process, the test rig being targeted will receive each new version. So when version 4 is deployed, the rig will receive version 4 of the WAR file and the alter script will be executed to add the column to the table. Then, when version 5 is deployed, only the DB alter script will run (alter.sql;2) to amend the stored procedure. The WAR file is not deployed, since the server already contains version 4 of the WAR file. When version 6 is deployed, only version 5 of the WAR file is deployed. This is exactly what we want.
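
The push sequence just described can be replayed in a few lines. The release contents below are reconstructed from the description in the text, so treat them as assumptions: in particular, the WAR version in application version 3 and the absence of an alter script before version 4 are guesses, and the push function is a hypothetical stand-in for a deployment tool.

```python
# Component versions contained in each application version (assumed values,
# reconstructed from the prose; version 3's contents are a guess).
RELEASES = {
    3: {"myapp.war": 3},
    4: {"myapp.war": 4, "alter.sql": 1},
    5: {"myapp.war": 4, "alter.sql": 2},
    6: {"myapp.war": 5, "alter.sql": 2},
}


def push(release, installed):
    """Deploy only the changed components and update the endpoint's record."""
    changed = {
        name: version
        for name, version in release.items()
        if installed.get(name) != version
    }
    installed.update(changed)
    return changed


test_rig = dict(RELEASES[3])  # the rig starts on application version 3
history = {}
for app_version in (4, 5, 6):
    history[app_version] = push(RELEASES[app_version], test_rig)
    print(app_version, history[app_version])
# 4 {'myapp.war': 4, 'alter.sql': 1}
# 5 {'alter.sql': 2}
# 6 {'myapp.war': 5}
```

Because every version passes through the rig in order, each alter script runs exactly once and in sequence.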

But now what happens when we move to user acceptance testing (UAT)? By this stage, we’ve done as much automated testing as we can get away with; in UAT, real users need to log on to our test system to make sure they’re happy with everything. So we’re not going to do a “push.” In UAT, the test lead does a “pull” when they’re ready to accept the new version of the application for testing. So they pull “My Application Version 6” into their test environment and get their testers lined up with coffee and pizza.

But what happens if the test environment is currently on version 3? Well, for the WAR file, there’s no problem—there’s a difference in the component version for the WAR file so the new version is deployed. But what happens to the database? There are no alter scripts associated with version 6 of the application, but without applying alter.sql;1 and alter.sql;2, the database schema is not going to be valid for use with myapp.war;5. The testing will fail, not so much “falling at the first hurdle” as “falling in the paddock.”

Deltas allow us to cure this issue. What we need to do is identify the components that represent the database changes and apply them in sequence for each interim version. In that way, successive database “deltas” are applied to roll the database forward to the correct schema version before the required version of the application is deployed.

So, in our case, going from version 3 to version 6 would deploy the following components:

So we will deploy (and run) alter.sql;1 to add the new column, then alter.sql;2 to amend the stored procedure and, finally, deploy myapp.war;5. Deltas mean we do the minimum required to get the application version to the desired state.
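
Here is one way the roll-forward plan could be computed. It is a sketch under stated assumptions rather than how any particular tool does it: the CHANGES table is reconstructed from the prose (version 3's entry is a guess), database components are recognized simply by a .sql suffix, and plan_deployment is a hypothetical name.

```python
from typing import Dict, List, Tuple

Component = Tuple[str, int]  # (name, version), e.g. ("alter.sql", 1)

# Components that changed in each application version (assumed from the text).
CHANGES: Dict[int, List[Component]] = {
    3: [("myapp.war", 3)],  # assumption: not stated in the text
    4: [("myapp.war", 4), ("alter.sql", 1)],
    5: [("alter.sql", 2)],
    6: [("myapp.war", 5)],
}


def is_database_delta(name: str) -> bool:
    """Assumed convention: alter scripts are the components ending in .sql."""
    return name.endswith(".sql")


def plan_deployment(current: int, target: int) -> List[str]:
    """List what to deploy to move an environment from current to target.

    Database deltas from every interim version are kept, in order.
    For other components only the newest version is deployed.
    """
    db_steps: List[str] = []
    latest: Dict[str, int] = {}
    for app_version in range(current + 1, target + 1):
        for name, version in CHANGES.get(app_version, []):
            if is_database_delta(name):
                db_steps.append(f"{name};{version}")
            else:
                latest[name] = version
    # Roll the schema forward first, then deploy the application components.
    return db_steps + [f"{name};{version}" for name, version in latest.items()]


plan = plan_deployment(3, 6)
print(plan)  # ['alter.sql;1', 'alter.sql;2', 'myapp.war;5']
```

Notice that myapp.war;4 never appears in the plan: it is superseded by version 5, whereas neither alter script can be skipped.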

Conclusion

Versioning the components that make up an application means it is easy to determine the deltas (changes) between one application version and another. Recording what version(s) of the components are on the endpoints in a target environment makes it easy to only deploy the components that have changed, minimizing downtime and making the deployment quicker and less error-prone. Identifying the deltas for a database gives us the ability to roll databases forward and jump between releases, which is invaluable when transitioning from an agile “push” to a waterfall-centric “pull” deployment model.

— Phil Gibbs