‘DevOoops’ Moves: Unforeseen Lock-Outs

DevOps’ ultimate aim is to create efficiencies in the software delivery process. But it doesn’t always work out that way. Sometimes DevOps initiatives lead to mistakes—something we lovingly call “DevOoops.” CloudBees recently polled some colleagues and came up with five examples of a DevOoops in action. Following is a discussion submitted by Carlos Sanchez, principal software engineer, about unforeseen lock-outs.

Scenario:

When you automate deployments using configuration as code, make sure you have the right protections in place, such as validation of the configuration before it ships. Otherwise, it’s easy to lock yourself out: a bad change is pushed and deployed to all the machines, and you’re forced to log in to each one manually to fix it.
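As a minimal sketch of that kind of protection, the check below rejects an access configuration that would lock people out before it is pushed anywhere. The configuration layout (an `ssh` section with `port`, `allowed_users`, `password_auth` and `key_auth`), the `REQUIRED_ADMINS` set and the function name are all assumptions made for illustration, not part of any particular tool.

```python
from typing import Any

# Hypothetical rule: these accounts must always keep access.
REQUIRED_ADMINS = {"deploy"}


def validate_access_config(config: dict[str, Any]) -> list[str]:
    """Return a list of problems; an empty list means the change may be pushed."""
    ssh = config.get("ssh")
    if not isinstance(ssh, dict):
        return ["missing 'ssh' section"]

    errors: list[str] = []

    # A bad port makes every machine unreachable at once.
    port = ssh.get("port")
    if not isinstance(port, int) or not 1 <= port <= 65535:
        errors.append(f"invalid ssh port: {port!r}")

    # Removing a required admin account is the classic lock-out.
    allowed = set(ssh.get("allowed_users", []))
    missing = REQUIRED_ADMINS - allowed
    if missing:
        errors.append(f"admin accounts would lose access: {sorted(missing)}")

    # At least one way of authenticating has to stay enabled.
    if not ssh.get("password_auth", False) and not ssh.get("key_auth", False):
        errors.append("every authentication method is disabled")

    return errors
```

A pipeline step could call `validate_access_config` on the proposed change and refuse to deploy whenever the returned list is not empty.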

Having a strong technology solution that securely and centrally stores the credentials of the people who access those machines is critical to maintaining a successful DevOps program. It’s a no-brainer. When you rely on service accounts, you’re much more likely to be able to use a fast-acting script to log in and reconcile lock-outs if credentials are centrally stored than if you had eight individually managed sets of service account credentials.

How do you reconcile lock-outs? How do you avoid the kind of DevOoops situation that can derail all of the good progress you’re making? The process you adopt will depend on what your software development and deployment life cycles look like. Here are several potential approaches:

You can use multiple staging and preproduction environments that share common configurations with the production environment. That way, before you deploy an app or a new configuration to production, you know it has already been deployed successfully, multiple times, to environments that are as close as possible to production.
You can have post-deployment validation scripts run after every deployment to check things such as: Can account X log in to environment Y? The scripts should cover common or leading indicators of the success or failure of a deployment. Once artifacts have been deployed and the system starts, every deployment immediately runs these scripts as part of the deployment process, so every part of the deployment is automatically validated as a success or a failure.
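A post-deployment validation step of this kind could be sketched as a small check runner. The function names and the shape of a check (a callable that returns true or false) are assumptions for illustration. The probe that actually answers "Can account X log in to environment Y?" is left to the caller, since how it is done depends on the environment.

```python
from typing import Callable

# A check is any zero-argument callable that returns True on success.
Check = Callable[[], bool]


def run_post_deploy_checks(checks: dict[str, Check]) -> dict[str, bool]:
    """Run every check and record pass/fail; a check that raises counts as failed."""
    results: dict[str, bool] = {}
    for name, check in checks.items():
        try:
            results[name] = bool(check())
        except Exception:
            # An unexpected error is treated as a failed check, not a crash.
            results[name] = False
    return results


def deployment_ok(results: dict[str, bool]) -> bool:
    """A deployment is valid only if at least one check ran and all of them passed."""
    return bool(results) and all(results.values())
```

Every check runs even after one fails, so a single report shows everything that is broken rather than only the first problem.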

When taking an approach such as rolling deployments, you also need to ensure that broken configurations are identified before they’re propagated across multiple servers or server clusters. This means validating your deployment and configuration changes before you go to production, validating them again on the first production deployment and then validating at each subsequent step.
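One way to picture validation at each step of a rolling deployment is the sketch below, which stops as soon as a server fails validation so the bad change never reaches the remaining servers. The `deploy` and `validate` callables and the batch layout are hypothetical placeholders for whatever your tooling provides.

```python
from typing import Callable, Iterable, Optional


def rolling_deploy(
    batches: Iterable[list[str]],
    deploy: Callable[[str], None],
    validate: Callable[[str], bool],
) -> tuple[list[str], Optional[str]]:
    """Deploy batch by batch and stop at the first server that fails validation.

    Returns (servers deployed and validated, first failing server or None).
    """
    done: list[str] = []
    for batch in batches:
        for server in batch:
            deploy(server)
            if not validate(server):
                # Halt here: later servers keep their last known good state.
                return done, server
            done.append(server)
    return done, None
```

Putting a single canary server in the first batch means a broken configuration can affect only one machine before the rollout halts.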

By doing all of this, you avoid surprises. You extend a quality-first, continuous validation approach beyond just your application code and apply that to your environment. Avoiding unnecessary lock-outs will help you mitigate risk, eliminate wasted time and avoid bad customer experiences.

Takeaways

As you bake quality into the processes and shift automation left, you need to make sure you validate all changes early and often.
Organizations need to implement proper processes, not only for application code changes but also for infrastructure as code changes that manage environments.
When automating in pursuit of DevOps, you need a strategy for managing the credentials, secrets and permissions of your DevOps and production systems.

— Brian Dawson