Editor’s Note: This post was updated Aug. 4, 2018, to reflect an updated slide presentation at the end of the article.
Infrastructure as Code is a powerful concept whose significance is best described in the words of Adam Jacob, CTO at Chef, in Web Operations: Keeping the Data On Time: “Enable the reconstruction of the business from nothing but a source code repository, an application data backup, and bare metal resources.” It is not for nothing that treating infrastructure as code is regarded as a key principle for reducing cycle time and maximizing flow in Continuous Delivery and DevOps, respectively. Even if your plans are less bold as you read this article, wouldn’t you agree that consistently validating what you automate at the infrastructure level, long before it goes into production, would be extremely valuable?
So, bear with me for a while and please don’t rush headlong into automation by simply picking one of the established infrastructure automation tools such as Chef or Puppet, getting everyone trained in Ruby, and blindly starting to “automate all the things”. If that’s how you do it, then you are most probably doing it wrong. In You’re Doing DevOps Wrong. Automation in the Enterprise., author Alan Sharp-Paul suggests that, before automating anything, you first have to understand the requirements precisely and know how to validate them, so that you can gauge success and maintain the quality of your automation over time:
- Are the configuration files in place? Do they contain the right settings?
- Is this port open? Is that port closed?
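To make these questions concrete, here is a minimal sketch of how they could be turned into executable checks in plain Python. The helper names `config_has_setting` and `port_is_open` are my own and purely illustrative; dedicated frameworks offer richer matchers, and the sketch assumes simple `key value` or `key = value` configuration files.

```python
import socket
from pathlib import Path


def config_has_setting(path, key, expected):
    """Return True if the file exists and the first entry for `key` equals `expected`."""
    config = Path(path)
    if not config.is_file():
        return False
    for raw in config.read_text().splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        # Accept both "key value" and "key = value" styles (an assumption).
        parts = line.replace("=", " ", 1).split(None, 1)
        if len(parts) == 2 and parts[0] == key:
            return parts[1].strip() == expected
    return False


def port_is_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

A check such as `config_has_setting("/etc/ssh/sshd_config", "PermitRootLogin", "no")` then answers the first pair of questions, and `port_is_open("127.0.0.1", 22)` the second, for whatever paths and ports your requirements actually name.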
Why Agile Infrastructure?
Honestly, you and I wouldn’t expect anyone to come up with reliable software merely by choosing an arbitrary programming language and a nifty IDE without having some sort of engineering process in place, right? Likewise, the (automated) provisioning of a solid production environment demands a no less methodical approach. Luckily, Infrastructure as Code enables us to treat the provisioning of infrastructure, as well as the automation of deployments and configuration, as an Agile engineering discipline. Although the concepts of Agile Infrastructure are not entirely new, testing against live infrastructure has only recently become practical with the consolidation of tools such as Ansible, Chef, Puppet, Docker, Vagrant and Serverspec under the umbrella of Test Kitchen (see more on that below). Here are some of the key implications of applying Agile software development practices at the infrastructure level:
Alignment of Devs and Ops
Agile software development teams are driven by customer demand and aim to deliver results in short, regular intervals. Enabling IT Operations and Infrastructure departments to work in small iterations and react to changes quickly allows them to align with, and effectively be driven by, software development. The outcome of these joint efforts is software that is both working and deployable at the end of each iteration.
Fail Fast: Feedback Loops
Executing infrastructure tests as part of your Continuous Delivery build pipeline (before an application is deployed and long-running tests are executed) establishes another important feedback loop that enables you to “fail fast” and get the right people involved. The underlying principle is simple: by expressing your intentions twice, once in the code and once in a test, you’ll know you’ve caught a bug whenever the two don’t match.
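The ordering argument can be sketched in a few lines of Python. This is only an illustration of the “fail fast” idea, not how any particular CI server works; `run_pipeline` and the stage names are hypothetical, and each stage is reduced to a function returning True or False.

```python
def run_pipeline(stages):
    """Run (name, check) stages in order and stop at the first failing one.

    Returns the names of the stages that ran and the name of the failed
    stage (None if every stage passed).
    """
    executed = []
    for name, check in stages:
        executed.append(name)
        if not check():
            return executed, name
    return executed, None


# Hypothetical stages: the cheap infrastructure checks come first.
stages = [
    ("infrastructure tests", lambda: False),  # e.g. an expected port is closed
    ("deploy application", lambda: True),
    ("long-running acceptance tests", lambda: True),
]

executed, failed = run_pipeline(stages)
print(executed, failed)
```

Because the infrastructure tests fail here, the deployment and the long-running tests never start, and the feedback points directly at the infrastructure stage.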
Regression Safety
Believe it or not: whether you are using Ansible, Chef, Puppet or any other tool, you’re running on code! Their respective issue trackers are full of unresolved issues of various severities, with new ones coming in regularly. Having a sufficient safety net of tests for your infrastructure in place allows you to quickly identify bugs as they creep in (from either side, your own code or the tool’s) and protects you against unwanted regressions. After all, a bug in your infrastructure can be very hard to track down and can be much more severe than a bug in your application.
Refactoring
Code refactoring becomes a breeze when you have tests in place that immediately alert you when you have broken the desired state of your infrastructure (and all you wanted was to make your code simpler and more maintainable). This is invaluable when you want to recompose your existing infrastructure from third-party components offered on public hubs such as Ansible Galaxy, Chef Supermarket or Puppet Forge, and especially when you are migrating away from your existing automation stack.
The Red, Green, Refactor Cycle of Test-Driven Development
Test-Driven Development (TDD) is a popular software development process that involves writing code in short, repeating development cycles with the goal of enforcing good design and increasing confidence. While it is certainly no silver bullet and has been the subject of controversy, the Red, Green, Refactor cycle, which is at the heart of TDD and whose principles I want to quickly summarize here, will hopefully give you some inspiration on how to build up your infrastructure incrementally:
- Red: Understand the requirements and the state you want to implement. Then think about how to capture this state in a test. Implement the test, then run your entire test suite and see the added test fail — and if it doesn’t, something is really wrong.
- Green: Implement just enough functionality to make the test pass. Don’t worry much about readability, simplicity and design for now; just make it work. Run your tests again and watch all tests pass. If they don’t, fix either your test or your implementation, or take a step back (revert to a point in your code that’s known to work) and start over with a smaller increment.
- Refactor: Now that your tests pass, improve your implementation and make sure, by running your tests again, that you didn’t break anything. Once you are done refactoring, start the cycle over again.
Keep in mind that each repetition of the cycle refers to a small increment. In terms of infrastructure, such an increment could be the creation of a user, the installation of a package or the placement of a configuration setting. When you reach a point where you have broken something, you can easily revert to a point in your code that’s known to work. Leveraging such practices really makes the difference between Infrastructure Automation and Infrastructure as Code, as described in Infrastructure as Code – Automation Is Not Enough by Kief Morris.
Test Kitchen, Terraform, AWS and awspec
Now, how to get there? Test Kitchen is an easily extensible test harness that allows you to test your code written in Ansible, Chef, Puppet (and others) on various cloud providers, virtualization providers and operating systems with tests written in a variety of test frameworks.
In the following presentation, I give hands-on examples for you to quickly dive into the topic of using Test Kitchen with awspec as a framework to test Terraform deployments on AWS: