Configuration-as-code, or “GitOps” as many are now calling it, is a simple but astounding idea: Standardize configuration, check it into version control; no need for each Operations/DevOps person to develop their own set of scripts to get the job done. Configuring or reconfiguring an environment is as simple as checkout and run, assuming everything in the configuration is tested. For a project or a team, this is a serious time-saver and limits the amount of human error introduced into our systems. For projects rolling out code frequently, it is pretty much required.
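The "checkout and run" idea boils down to keeping desired state in a versioned file and converging the environment toward it. Here is a minimal sketch of that loop; the state keys and the `plan` function are invented for illustration, not any particular tool's API:

```python
import json

# Hypothetical desired state, as it might live in a file checked into
# version control (e.g. config/web.json in the team's repo).
DESIRED = json.loads("""
{
  "packages": ["nginx", "certbot"],
  "services": {"nginx": "running"}
}
""")

def plan(desired, current):
    """Compare desired state against current state and return the
    actions needed to converge -- the heart of 'checkout and run'.
    Running it twice against a converged host yields no actions."""
    actions = []
    for pkg in desired.get("packages", []):
        if pkg not in current.get("packages", []):
            actions.append(("install", pkg))
    for svc, state in desired.get("services", {}).items():
        if current.get("services", {}).get(svc) != state:
            actions.append(("ensure", svc, state))
    return actions

# A freshly provisioned host with nothing installed yet:
actions = plan(DESIRED, {"packages": [], "services": {}})
```

Because the desired state lives in version control, reconfiguring an environment really is checkout and run, and every change to it has an author, a diff and a history.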
But like everything, it has its limits. I worked with a product management engineer once who had concocted a script that had tens of thousands of lines of code and was every bit as complex as the product he used it to manage. His scenario was the typical precursor to configuration-as-code: He would go show off the product and whatever functionality he needed could be managed through his script. The problem was maintenance; as the script was central to how he did his job, changes to it were a big deal. And even though he was the only one maintaining it, he did get burned a couple of times by changes he’d made, because it was that complex.
When our configuration-as-code starts to cross teams, or we take it to the enterprise level, we risk finding its limitations. Our environments are complex, very complex. And we can do a lot with configuration-as-code, but throwing everything plus the kitchen sink into that system will create issues we need to overcome.
A single project is constrained. We know what we need to do to spin up the environment. We can identify weak points and build checks and balances around them.
The enterprise environment is not a project. Agile projects and DevOps thrive because they have a stable working environment to run on. The volume of change in things such as storage allocation, routing and VLANs is nowhere near the volume of change an agile project sees. For things that change rarely, good documentation is as solid a solution as configuration-as-code, and the size and number of scripts and configurations that must be maintained stays smaller.
There is a good use case for corporatewide configuration-as-code for recovery purposes: it is far easier to restore the environment in a disaster if you have all configuration, and the scripts to install that configuration, available. Unfortunately, that use case is undercut by the march of technological change. During the one total-destruction restoration I've been involved in (and was in charge of), many of our configurations were no longer usable, because the products we were running in production at the time of the disaster were not the products we could purchase for the rebuild. We ended up with a shiny new data center, from the building to the servers, but it still involved a lot of manual configuration. So while the use case is real, it is not a panacea: unless you are constantly replacing your infrastructure with the newest products, the configurations you saved may not match the hardware you can buy when disaster strikes.
Some organizations have already arrived at configuration-as-code via practices such as chaos engineering, but most of us are just starting to consider taking configuration-as-code beyond the project level. When taking the (admittedly large) step of truly moving infrastructure management toward configuration-as-code, start by inventorying what your operations team does on a daily basis. There are a ton of things, the famous low-hanging fruit, that can be boiled down to scripts or standardized configurations and checked into version control. This gives a central point of truth for how things are configured, tracks changes for when things go wrong, and shortens ramp-up time for new DevOps/Operations staff by giving them a repository of tools to review and work with.

When working in the cloud, everything is scriptable, so make sure that side of operations is completely configuration-as-code. In the data center, you hit diminishing returns after daily tasks are automated, but assigning someone to codify all possible changes is a worthy goal, as long as the work is prioritized so the most frequent or error-prone configurations come first. Once the focus is down to hardware, the return on those invested hours drops, since specialized hardware is full of proprietary or market-focused configuration. To draw on a comparison I'm familiar with: An F5 ADC has very specific use cases that must be reconfigured centrally, whereas restoring your Kubernetes cluster generally requires less focused configuration, and the teams spinning up containers will take it from there. Not that Kubernetes configuration is simple; but each project handles the details it requires, while corporate-shared resources tend to need all of the details handled by the infrastructure team.
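To make "boiling a daily task down to a standardized configuration" concrete, here is a hedged sketch of one routine job, generating a web server vhost, turned into a versioned template plus a render function. The template text, parameter names and `render_vhost` helper are all invented for the example:

```python
from string import Template

# Hypothetical standardized template, checked into version control
# alongside per-environment parameter files.
VHOST_TEMPLATE = Template("""\
server {
    listen ${port};
    server_name ${hostname};
    root /srv/${app};
}
""")

def render_vhost(params):
    """Render one standardized vhost config. The same template serves
    dev, staging and production, so drift shows up in code review
    instead of in a 3 a.m. outage."""
    required = {"port", "hostname", "app"}
    missing = required - params.keys()
    if missing:
        raise ValueError(f"missing parameters: {sorted(missing)}")
    return VHOST_TEMPLATE.substitute(params)

conf = render_vhost({"port": "443", "hostname": "app.example.com", "app": "app"})
```

The payoff is less the template itself than the discipline around it: every environment is rendered from the same reviewed source, and a bad parameter fails loudly before anything is deployed.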
We can discuss this at length, and I have with others who think differently, but small tools are better. Do one thing, and do it well (the original Unix mantra), then write scripts that call the small ones. This contains change over time and avoids creating dependencies on, or touching, things that don't need it in a given situation.
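The small-tools approach can be sketched like this, with every name here hypothetical: each function does exactly one thing, and the composition layer only sequences them, adding no logic of its own:

```python
def format_hostname(app, env, index):
    """Small tool #1: produce a standard hostname. Nothing else."""
    return f"{app}-{env}-{index:02d}"

def allocate_port(env, base=8000):
    """Small tool #2: pick a port for the environment. Nothing else."""
    offsets = {"dev": 0, "staging": 100, "prod": 200}
    return base + offsets[env]

def build_entry(app, env, index):
    """Composition layer: calls the small tools in order. If the port
    scheme changes, only allocate_port changes; hostnames are untouched."""
    return {
        "host": format_hostname(app, env, index),
        "port": allocate_port(env),
    }

entry = build_entry("billing", "prod", 3)
```

Changing one small tool can't ripple through unrelated behavior, which is exactly the containment the do-one-thing rule buys you.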
And keep rocking it. All configuration-as-code does is reduce man-hours and the opportunity for human error, so embrace it, but be deliberate about it. Don't throw everything in there and then forget about it. Make a plan, and make a maintenance plan. Iterate. Improve. And use the time you save to solve user problems, which is why we're here in the first place.