Monitor Toolchain Health, Too

A funny thing happened on the way to being agile. We adopted a ton of tools to make us more responsive. They really do help us deliver software quickly and efficiently, and, together with our own process improvements, they have improved IT processes overall.

They also created a toolchain whose stages mirror the areas of responsibility of old. That makes perfect sense; we still need to store source code, build applications, deploy somewhere, worry about security, etc. And we need to do all of that faster, which is where automation and the toolchain really shine.

But keep in mind the overall impact of the toolchain on the development process. The organization is now dependent on those tools. We went from slow and steady to fast, with a long series of steps along the toolchain. Things are better, and we’re delivering quality software at a faster pace, but in some ways the process risks becoming more fragile. If a tool in the chain shows signs of trouble, you need to be aware of it and on top of the issue almost immediately. The price of ignoring issues in the process can be catastrophic disruptions in software delivery. Losing a CI tool is akin to shutting down customs at an international airport: the problem at the source is big, and the downstream problems are show-stopping.

So while you are tooling your applications, make certain you are tooling the build chain, too. Know when a build that always takes ten minutes suddenly spikes to triple that. Know when a test suite delivers non-breaking errors at an increased rate. Know when space is low on a shared drive that the build tools use. Tool everything, because the software delivery toolchain is now the core of application development and deployment.
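To make those three checks concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the function names, the thresholds (triple the median build time, a 1.5x rise in warning rate, 10 percent free disk) and the idea that you already collect build durations and warning rates from your own CI tool. It is not tied to any particular product.

```python
import shutil
from statistics import median


def build_time_spiked(history_minutes, latest_minutes, factor=3.0):
    """True when the latest build took at least `factor` times the median
    of recent builds (hypothetical threshold: triple the usual time)."""
    if not history_minutes:
        return False  # no baseline yet, so nothing to compare against
    return latest_minutes >= factor * median(history_minutes)


def warning_rate_rising(previous_rates, latest_rate, tolerance=1.5):
    """True when the latest rate of non-breaking test errors exceeds the
    average of earlier runs by more than `tolerance` (an assumed margin)."""
    if not previous_rates:
        return False
    baseline = sum(previous_rates) / len(previous_rates)
    return latest_rate > baseline * tolerance


def disk_space_low(path, min_free_fraction=0.10):
    """True when the drive holding `path` has less than
    `min_free_fraction` of its capacity free."""
    usage = shutil.disk_usage(path)
    if usage.total == 0:
        return True  # treat an unreadable or empty volume as a problem
    return usage.free / usage.total < min_free_fraction
```

A scheduled job could call these with numbers pulled from the CI server and raise an alert whenever one returns True; for example, build_time_spiked([10, 10, 11, 9], 30) flags the ten-minute build that suddenly took thirty.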

Tooling these systems can also offer a warning of problems in a given application. It is possible that the build time increased because a lot of previously unused libraries are being included … and that those libraries have not been properly vetted by security. It is possible that the shared drive is filling up with PII test data that needs to be cleansed. A lot of good can come from over-tooling the build chain, and unless a bit of tooling slows the process, there is no downside. It won’t take long to build a list of what to watch, and modern DevOps tools have enough reporting and APIs that it won’t take long to begin watching those items, either.
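The unvetted-library scenario is one place where a very small check can help. The sketch below is illustrative only: it assumes you can export the dependency names from the previous and current builds and that security keeps a list of vetted libraries. The function name and the library names in the example are hypothetical.

```python
def unvetted_new_dependencies(previous, current, vetted):
    """Return, in sorted order, the libraries that appear in the current
    build but not the previous one and are missing from the vetted list."""
    added = set(current) - set(previous)
    return sorted(added - set(vetted))


# Hypothetical dependency names, for illustration only.
previous_build = ["alpha-lib", "beta-lib"]
current_build = ["alpha-lib", "beta-lib", "gamma-lib", "delta-lib"]
vetted_by_security = ["alpha-lib", "beta-lib", "delta-lib"]

flagged = unvetted_new_dependencies(
    previous_build, current_build, vetted_by_security
)
```

Here flagged holds only gamma-lib, the one newly added library that security has not reviewed. How the dependency lists are produced depends on your build tool, so that part is left out.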

That is where the people part comes in. Don’t ignore the tooling once it is in place. Build a process to regularly review results and take action when necessary. No tooling is useful if the results are ignored; sadly, we have a long history of spectacular failures that started with, “We saw indications, but didn’t think they were important at the time …” So act on results, even if that just means flagging them for review in the next reporting cycle. Just don’t ignore them.

And keep rocking it. The organization thrives on the software you build, deploy and support. It is a massively complex system that has IT staff at its heart, so keep it thriving and watch for indicators of illness in the core toolchain.