The larger the project, the more resources are generally dedicated to it. That is true of the vast majority of fields of human endeavor. Consider building a bridge across San Francisco Bay versus building a crossing over a stream. Or World War II versus the Spanish Civil War. The resources dedicated scale with the size and complexity of what they need to support.
This is true of IT projects also. A four-line script does not require the attention that a highly available web app with dynamic data does. It is a pretty simple concept. We are all aware of it, and methodology doesn’t much change it.
The problem is that we are often terrible about acknowledging this concept when it comes to internal support projects. We don’t see a compiler as a tool in need of regular maintenance and locking down, and largely it isn’t. There definitely are cases where the newest compiler/interpreter should be implemented, but except for a massive change such as Python 2.x to 3.x, there isn’t a driving need to have the newest and shiniest.
It’s Complex—and Growing More So
What happens when we apply this relaxed attitude to DevOps? A mess. The reason is simple: The automation side of DevOps is generally implemented with tools. Lots of tools. Tools that require maintenance, troubleshooting, interfacing to other tools … It gets complex fast.
And where on the calendar is time for these things set aside? Almost always the answer to that question is “nowhere.” But these aren’t compilers and interpreters. If your Bugzilla DB goes bad, you have a major problem. Since compilers and interpreters are generally read-only—changes aren’t made to them except to upgrade—it generally requires a disk-level error or serious human failures for them to stop working as planned. That’s not true for things with constantly changing datasets. And many DevOps tools have constantly changing datasets.
In an increasing number of organizations, the DevOps infrastructure is in practice the largest application portfolio the organization manages, and it should be listed as such. That is how you need to see it.
Make DevOps Tools a Project
It is that simple. Tooling is a massive bit of overhead that seems smaller than it is because it streamlines the development/deployment process. In short, the tools save enough time that their cost is easy to overlook. But that cost will not go away: even once the pipeline is fully automated with a stable toolchain, keeping it running is a cost IT will have to continue to bear. Don’t treat the toolchain as something you only look at when it fails; give it the same care you give the applications it builds and deploys. Make certain you have backups, and if possible make your DevOps environment totally dynamic, so a failed instance of X can just be replaced by a new copy. You can’t “just spin up” complete and accurate datasets, but the backups fill that void.
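As a rough illustration of "make certain you have backups," here is a minimal Python sketch that flags tools whose newest backup is missing or too old. The tool names, the timestamps, and the one-day threshold are all hypothetical; in a real setup the timestamps would come from wherever your backups are actually recorded.

```python
from datetime import datetime, timedelta


def stale_backups(last_backup, now, max_age=timedelta(days=1)):
    """Return sorted names of tools whose newest backup is missing or older than max_age.

    last_backup maps a tool name to the datetime of its newest backup,
    or to None if no backup has ever been recorded for it.
    """
    stale = []
    for tool, taken_at in last_backup.items():
        if taken_at is None or now - taken_at > max_age:
            stale.append(tool)
    return sorted(stale)


if __name__ == "__main__":
    # Hypothetical inventory; the names and dates are made up for illustration.
    now = datetime(2024, 1, 10, 12, 0)
    last_backup = {
        "bugzilla-db": datetime(2024, 1, 10, 2, 0),  # 10 hours old: fine
        "ci-server": datetime(2024, 1, 7, 2, 0),     # more than 3 days old: stale
        "artifact-repo": None,                        # never backed up: stale
    }
    print(stale_backups(last_backup, now))
```

Run on a schedule, a check like this turns "we think we have backups" into something that complains before the data is needed rather than after.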
It’s More Than Just DevOps
Stay on top of tool updates and improvements, just as you would for any mission-critical software. Because once your software factory is dependent upon the DevOps toolchain, it is mission-critical software. Learn how changes will impact the environment you are using the tools in.
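One way to make "stay on top of tool updates" concrete is a small report of which tools have newer releases, with major version jumps separated out because they are the ones most likely to change behavior. The sketch below assumes plain dotted numeric version strings and uses made-up tool names and version numbers; real tools often need a more forgiving version parser.

```python
def parse_version(text):
    """Turn a dotted numeric version such as '2.4.1' into a comparable tuple (2, 4, 1)."""
    return tuple(int(part) for part in text.split("."))


def pending_updates(installed, available):
    """Map each tool with a newer available release to 'major' or 'minor'.

    'major' means the first version component increased; anything else
    newer than the installed version is reported as 'minor'.
    """
    report = {}
    for tool, current in installed.items():
        newest = available.get(tool)
        if newest is None or parse_version(newest) <= parse_version(current):
            continue  # unknown tool or already up to date
        is_major = parse_version(newest)[0] > parse_version(current)[0]
        report[tool] = "major" if is_major else "minor"
    return report


if __name__ == "__main__":
    # Hypothetical tools and versions, for illustration only.
    installed = {"ci-server": "2.4.1", "issue-tracker": "5.0.4", "artifact-repo": "3.1.0"}
    available = {"ci-server": "2.5.0", "issue-tracker": "6.0.0", "artifact-repo": "3.1.0"}
    print(pending_updates(installed, available))
```

Anything reported as "major" is a candidate for a test run in a scratch environment before it goes anywhere near the toolchain your teams depend on.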
And it’s more than just the tools themselves. The DevOps environment is highly complex, with a lot of variables that can have a negative impact. The move from RHEL/CentOS 6.X to 7.X is a good example. Sure, 7.X is better by objective standards, but there is enough change that it breaks a lot of things: mostly networking, but libraries and tools also. In fact, I helped someone work through inter-process communication (IPC) issues caused by the upgrade. An upgrade like this is a stumbling block for the DevOps team in a RHEL/CentOS environment, even though the DevOps team may not be the one choosing to upgrade.
It’s All Good, Though
So there is a cost. The cost is not small. But the benefits still far outweigh the costs, and I’m not arguing otherwise. What I am arguing is that it needs to be a cost that is accounted and planned for. “When it breaks, we’ll fix it” is too relaxed for the toolchain that will run your software factory. Once it’s right, take the time each week to keep it right. Set up the policies, processes, and time to keep it clipping along smoothly.
And keep kicking rear. This is just a cost of doing business, much the same as database maintenance. Make sure you’ve got the time set aside, so your other projects can keep moving the org forward, and all is good.