As much as we talk about how we should have shared goals spanning Dev and Ops, it’s not nearly as easy as it sounds. To fuel a DevOps culture, we also have to build robust tooling. That means investing up front in five key areas: abstraction, composability, automation, orchestration, and idempotency. Together, these concepts allow us to share work at every level of the pipeline. Unfortunately, it’s tempting to optimize work at one level and miss the true system bottlenecks.
Creating production-like fidelity for developers is essential: We need it for scale, security and upgrades. It’s not just about sharing effort; it’s about empathy and collaboration.
But even with growing acceptance of DevOps as a cultural movement, I believe deployment disparities are a big unsolved problem. When developers have vastly different working environments from operators, it creates a “fidelity gap” that makes it difficult for the teams to collaborate.
Before we talk about the costs and solutions, let me first share a story from back when I was a bright-eyed OpenStack enthusiast:
I was at the fourth OpenStack Summit in Boston. My team at Dell had created open deployment automation, Crowbar, that could take raw servers and install a fully working OpenStack cluster in about 30 minutes. I don’t want to oversell it; there were plenty of rough edges and preconditions. However, the automation worked and OpenStack came up like a kudzu seedling. In fact, we’d actually brought a half-rack of servers to the summit for live production demos.
On my way back up to the demo rack, I was riding an elevator with one of the founding developers of the project. He was positively glowing over a new project he’d created called “Devstack” and wanted to show off the cool logo they’d designed. He gushed, “It’s awesome! It runs all of OpenStack in a single Python environment on my laptop! You don’t even need Vagrant anymore. It’s super fast and easy for developers to get OpenStack running because all the configuration is in Python.” I watched as he typed a command; text scrolled by, and the screen welcomed me to the OpenStack API. “See, now I’m running OpenStack!”
At the time, it seemed like a harmless toy, since a single-node cloud deployment is useless to users, testers and operators. I pointed out that we’d now gotten our multinode automation working in virtual machines (VMs), so it would be easy to use even without dedicated hardware. He asked what we’d written it in. “Mainly Chef with some Rails UI,” I responded, thinking we’d made some solid 2011 DevOps choices. He was less enthusiastic, because we were not using the developers’ preferred language, Python. Even so, I invited him to come by the rack for a demo, though I didn’t really expect him to show up.
We left the elevator on a friendly wave of OpenStack synergy. He sprinted into the developer lounge and I trudged back to the user and vendor showcase. Though I didn’t yet have words to describe it, I could see that the project had started to diverge between developers and operators. Developer collaboration drove a need for something fast to set up that ran on a single laptop. More importantly, Devstack stripped away all that messy multinode networking and DevOps tooling that made it hard to configure and run OpenStack. In what turned out to be a one-way journey, there was now a universal developer environment.
Unfortunately, that developer environment configuration did not translate into production.
In fact, OpenStack deployments are notoriously challenging and there is an overabundance of divergent approaches. Even with so many efforts, few have managed basic operational milestones such as upgrades and high-availability configurations. I believe a wide fidelity gap is a key, if sad, part of this story.
We don’t have to live with fidelity gaps!
In fact, DevOps exists to close those gaps by bringing developers and operators to a shared space. You are already well-versed in the need to treat development as a pipeline with automation to keep it repeatable.
If we know the challenge, why do we allow these gaps? It’s because fidelity can have too high a cost in terms of complexity, system overhead and time. These costs multiply the faster we try to iterate. That creates a conflict between the organizational benefits of agility and fidelity; sadly, few teams have been willing to burden developers with fidelity or operators with change frequency.
How Can We Make Fidelity Cost-Effective in Initial and Ongoing Costs?
There are five components to consider: abstraction, composability, automation, orchestration and idempotency/repeatability.
Abstraction
Platform abstraction hides the differences in environments and tools that separate developer environments from production. For true fidelity, we use “platform” in a very broad way. A cloud API or server is a platform, but so is an operating system or container system. In practice, there are many possible and acceptable abstraction layers.
Teams need to invest in ways that allow the same automation to work on multiple platforms in the same way. That requires modifying the automation to be resilient in different environments and accept configurations instead of making assumptions. These changes are good hygiene, anyway, and make your automation more robust and portable.
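To make this concrete, here is a minimal Python sketch of a platform abstraction layer. The `Platform` protocol and the `LaptopVMs` and `CloudAPI` classes are hypothetical stand-ins (the cloud call is stubbed out), not any real provider’s API. The point is that `deploy_cluster` makes no platform-specific assumptions, so the same automation runs unchanged on a developer laptop and in production.

```python
from typing import Protocol


class Platform(Protocol):
    """Hypothetical abstraction layer: anything that can hand us a node."""

    def provision(self, name: str) -> str:
        """Create (or find) a node and return its address."""
        ...


class LaptopVMs:
    """Hypothetical developer platform: local VMs on a private network."""

    def __init__(self) -> None:
        self._next_host = 10

    def provision(self, name: str) -> str:
        address = f"192.168.56.{self._next_host}"
        self._next_host += 1
        return address


class CloudAPI:
    """Hypothetical production platform; a real one would call a cloud API here."""

    def __init__(self, region: str) -> None:
        self.region = region

    def provision(self, name: str) -> str:
        return f"{name}.{self.region}.example.internal"


def deploy_cluster(platform: Platform, roles: list[str]) -> dict[str, str]:
    """Same automation on every platform: only the injected platform differs."""
    return {role: platform.provision(role) for role in roles}
```

For example, `deploy_cluster(LaptopVMs(), ["controller", "compute"])` yields laptop addresses, while `deploy_cluster(CloudAPI("us-east"), ["controller"])` yields a cloud hostname, with no change to the deployment logic.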
The simplest way to add abstraction is to externalize environmental configuration. Since it’s difficult for operational systems to detect the broader system configuration, we need to inject that information into our deployments. Unfortunately, that leads to the “massive pre-deploy inventory file” antipattern, which brings us to our next component.
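Externalizing configuration can be as simple as having the automation read injected values and refuse to guess. This sketch assumes the values arrive as environment variables; the `DEPLOY_*` names and the NTP template are hypothetical examples, not part of any particular tool.

```python
from collections.abc import Mapping
from dataclasses import dataclass


@dataclass(frozen=True)
class EnvConfig:
    """Environment facts the automation is told about rather than guessing."""
    admin_network: str
    ntp_server: str
    node_count: int


def load_config(environ: Mapping[str, str]) -> EnvConfig:
    """Build the config from injected settings (hypothetical DEPLOY_* names)."""
    try:
        return EnvConfig(
            admin_network=environ["DEPLOY_ADMIN_NETWORK"],
            ntp_server=environ["DEPLOY_NTP_SERVER"],
            node_count=int(environ["DEPLOY_NODE_COUNT"]),
        )
    except KeyError as missing:
        # Fail loudly instead of silently falling back to a laptop-only default.
        raise ValueError(f"required setting {missing} was not injected") from None


def render_ntp_conf(cfg: EnvConfig) -> str:
    """Identical template everywhere; only the injected values differ."""
    return f"server {cfg.ntp_server} iburst\n"
```

A developer might call `load_config(os.environ)` with laptop values exported in the shell, while production injects its own. Notice, though, that every setting handled this way is one more entry someone must prepare before the deploy starts.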
Composability
Composability goes beyond making automation modular. It means that we can flow information, à la functional programming, through the discrete parts of the install.
In a good composable design, each unit has defined inputs, outputs, preconditions and behaviors. That allows us to chain them together to accomplish work. Our systems should be able to connect outputs from earlier modules to the inputs of later ones. This eliminates the “pre-configuration file of doom” because we can build up system knowledge as the work is done. Since each unit’s inputs and outputs are well-known, we can determine whether we’ve completed all the needed setup.
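A minimal sketch of that chaining, assuming each unit declares its inputs and outputs as sets of names and exchanges values through a shared dictionary. The `Unit` type, `run_chain`, and the two example units are hypothetical illustrations, not any specific tool’s API.

```python
import ipaddress
from dataclasses import dataclass
from typing import Callable

State = dict[str, str]


@dataclass(frozen=True)
class Unit:
    """A composable step with declared inputs, declared outputs and an action."""
    name: str
    inputs: frozenset[str]
    outputs: frozenset[str]
    action: Callable[[State], State]


def run_chain(units: list[Unit], known: State) -> State:
    """Run units in order, flowing earlier outputs into later inputs."""
    state = dict(known)
    for unit in units:
        missing = unit.inputs.difference(state)
        if missing:  # precondition check: fail before doing any work
            raise RuntimeError(f"{unit.name} is missing inputs: {sorted(missing)}")
        produced = unit.action({key: state[key] for key in unit.inputs})
        if set(produced) != unit.outputs:  # postcondition check
            raise RuntimeError(f"{unit.name} did not produce {sorted(unit.outputs)}")
        state.update(produced)
    return state


# Hypothetical example units: allocate an address, then configure a database on it.
allocate_ip = Unit(
    "allocate_ip", frozenset({"cidr"}), frozenset({"admin_ip"}),
    lambda i: {"admin_ip": str(ipaddress.ip_network(i["cidr"])[10])},
)
install_db = Unit(
    "install_db", frozenset({"admin_ip"}), frozenset({"db_url"}),
    lambda i: {"db_url": f"mysql://{i['admin_ip']}:3306"},
)
```

Running `run_chain([allocate_ip, install_db], {"cidr": "10.0.0.0/24"})` only needs the network range up front; the database unit learns its address from the unit before it. Running `install_db` alone fails immediately with a clear message about the missing `admin_ip`.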
Substitution is even more important than having a chained configuration flow. Since modules are isolated actions with known parameters, we can create alternate modules based on user- or infrastructure-driven requirements. For example, choosing CentOS vs. Ubuntu changes package management; however, downstream applications only care that their packages are installed. We need to make it acceptable to detect and adjust modules based on situational awareness without breaking our downstream components.
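Here is a minimal sketch of that CentOS vs. Ubuntu substitution. It assumes the OS is detected from the `ID=` field of `/etc/os-release` and that downstream code asks for hypothetical logical package names such as `web_server`. The modules return the commands they would run (which could be handed to `subprocess.run`) rather than executing them, so the choice is easy to inspect.

```python
Command = list[str]


def os_id(os_release: str) -> str:
    """Return the ID field from /etc/os-release text, e.g. 'ubuntu' or 'centos'."""
    for line in os_release.splitlines():
        if line.startswith("ID="):
            return line.split("=", 1)[1].strip().strip('"')
    raise ValueError("no ID= line in os-release data")


def apt_module(packages: list[str]) -> list[Command]:
    """Debian/Ubuntu packaging."""
    return [["apt-get", "update"], ["apt-get", "install", "-y", *packages]]


def yum_module(packages: list[str]) -> list[Command]:
    """CentOS/RHEL packaging."""
    return [["yum", "install", "-y", *packages]]


# Interchangeable modules: each pairs an installer with a map from logical
# names to distro package names. The map only covers this example.
MODULES = {
    "ubuntu": (apt_module, {"web_server": "apache2"}),
    "centos": (yum_module, {"web_server": "httpd"}),
}


def install(os_release: str, logical_packages: list[str]) -> list[Command]:
    """Pick the module from detected facts; callers never see which one ran."""
    module, names = MODULES[os_id(os_release)]
    return module([names[p] for p in logical_packages])
```

The downstream step only ever calls `install(..., ["web_server"])`; whether that becomes `apache2` via `apt-get` or `httpd` via `yum` is decided by the detected environment.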
This type of change tolerance is essential to fidelity.
Automation
You may assume the need for automation is obvious; however, we all have a tendency to overlook manual steps or setup in our daily work. Common statements such as, “It takes a day to build my dev environment,” or, “We always run this pre-config script once the systems are up,” indicate that we are not as automated as we want to admit.
Fidelity gaps open when teams have different processes.
We don’t expect developers to use hundreds of physical servers; however, we can expect them to isolate components in ways that are similar to operations. That adds overhead to their work, so it requires automation. We don’t expect operators to reset and redeploy systems 10 times a day; however, we can expect them to use repeatable configurations that do not require manual intervention.
Remember: It does not have to be the same automation in all cases! Composability allows us to have equivalent but different paths to the same state.
Orchestration
There is a downside to composability. If you’ve replaced your fragile monolithic automation with robust composable modules, then you’re going to have to add some complexity to chain them together. It’s an unavoidable tradeoff because the composite system needs glue to hold it together. That glue is orchestration.
Truthfully, adding orchestration is not a downside. It is a necessity for scale. Any multisystem configuration process requires orchestration to be sustainable. The nature of scale applications is that we have to coordinate activities horizontally across the system during deployment, scaling and upgrade. That allows us to create trust relationships between nodes, set up networks before trying to use them, and coordinate servers and clients while we upgrade APIs. Without orchestration, we fall back on wait loops and iterative convergence, which are difficult to troubleshoot and even more difficult to predict.
There is a concrete benefit when you combine composition and orchestration: Small work units create a system that is highly transparent and parallelizable in operation. Those are key values in scale ops.
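The dependency-driven coordination described above, and the parallelism it unlocks, can be sketched with Python’s standard-library `graphlib` module (Python 3.9+). The cluster steps in `DEPS` are hypothetical; each step lists the steps that must finish first, so the network is always set up before anything tries to use it, and independent steps run side by side.

```python
from concurrent.futures import ThreadPoolExecutor
from graphlib import TopologicalSorter
from typing import Callable


def plan_waves(deps: dict[str, set[str]]) -> list[list[str]]:
    """Group steps into waves; no step in a wave depends on another in the same wave."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()  # raises graphlib.CycleError if the dependencies loop
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)  # planning only: nothing has actually run yet
    return waves


def run_waves(waves: list[list[str]], step: Callable[[str], str]) -> list[list[str]]:
    """Run each wave in parallel; the next wave starts only after the whole wave finishes.

    An exception in any step propagates and stops the run before the next wave.
    """
    results = []
    with ThreadPoolExecutor() as pool:
        for wave in waves:
            results.append(list(pool.map(step, wave)))
    return results


# Hypothetical cluster: network first, then services, then the API, then compute nodes.
DEPS = {
    "network": set(),
    "database": {"network"},
    "message_queue": {"network"},
    "api": {"database", "message_queue"},
    "compute-1": {"api"},
    "compute-2": {"api"},
}
```

`plan_waves(DEPS)` produces four waves: the network, then the database and message queue together, then the API, then both compute nodes together. Running whole waves is a simplification; a live orchestrator could instead call `get_ready()` and `done()` as individual steps finish to start work even sooner.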
Idempotency / Repeatability
The last component is one of the most elusive: idempotency. The objective is to allow multiple runs of modules in our automation system to leave the system in a consistent state. That allows us to repeat execution of a component without harming or undoing work that we’ve already done.
The concept is not simply skipping work we think we completed! If we rerun an action, then the action should exit in the requested state every time. If the state is already good, then do nothing. If the state is not configured, then create it. If the state is not good, then fix it. Basically, we want to be confident that we can achieve our end goal every time but never make unneeded changes.
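A minimal sketch of those three cases for a single `key=value` setting in a simple config file. The `ensure_setting` helper and the file format are hypothetical, but the behavior matches the rule above: leave good state alone, create missing state, and fix wrong state, so rerunning it never breaks a correctly configured system.

```python
from pathlib import Path


def ensure_setting(path: Path, key: str, value: str) -> str:
    """Converge `key=value` in a simple config file; safe to run any number of times.

    Returns 'unchanged', 'created' or 'fixed' so callers can report what happened.
    """
    lines = path.read_text().splitlines() if path.exists() else []
    wanted = f"{key}={value}"
    for i, line in enumerate(lines):
        if line.split("=", 1)[0].strip() == key:
            if line.strip() == wanted:
                return "unchanged"  # state is already good: do nothing
            lines[i] = wanted       # state is wrong: fix it in place
            path.write_text("\n".join(lines) + "\n")
            return "fixed"
    lines.append(wanted)            # state is missing: create it
    path.write_text("\n".join(lines) + "\n")
    return "created"
```

Calling `ensure_setting(path, "max_connections", "100")` twice returns `"created"` and then `"unchanged"`, and the second run does not touch the file. If someone later edits the value by hand, the next run reports `"fixed"` and restores it.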
This is not a simple task! The nature of configuration is to change systems, and we usually assume that we are going from unconfigured or misconfigured to correctly configured. For idempotent scripts, we have to guard against going from correctly configured to broken.
This may seem like a lot of work for developers who usually start from clean systems, but it’s life-saving for operators when a timing error causes a unit to fail.
Hey, You’re Adding a Lot of Work to My Deployments!
Following these five principles does add some overhead; however, all of the changes are good hygiene anyway. They make your automation more robust and portable. More importantly, they reduce friction between your development, test and ops processes.
Deployment fidelity is a real benefit that helps drive success. Hopefully, these five principles have made it a more concrete and obtainable goal.