These days, there are a lot of different DevOps tools to accomplish a lot of different jobs. Almost daily another startup comes out with a new and innovative product or a newer (maybe even better) version of existing tools. One of the biggest innovations has been infrastructure-as-code (IaC). Giving infrastructure admins and developers alike the ability to create, manipulate and destroy infrastructure using code files changes the way we administer our environments. Though these tools are revolutionary and can solve many problems, sometimes they come with a whole new set of issues—as can any type of tooling, really.
Generally, IaC tools are run locally at first. When an infrastructure admin or developer is just starting out, they keep the files locally, or perhaps in a repository that they clone and then execute locally. This is perfectly acceptable and a very common practice. The issues arise when scaling this practice to a team, particularly around visibility and governance. If everyone runs their deployments locally, the team has no clear visibility into who is deploying what, when and with what variables. This is where IaC automation tools come into play.
There are many different types of tools to solve these problems, and some are better than others. CI/CD pipeline tools, custom-built automation platforms and purpose-built IaC automation platforms each have their pros and cons. Getting into all of that can be overwhelming, and we could go on for hours about each option. But we’re here to talk a bit more generically today, so we’ll save all of that for another time. For now, let’s take a step back and talk about the top five must-haves for any infrastructure-as-code automation platform you’re considering. Because each company and use case is different, we’ll talk about these considerations in no particular order.
Role-Based Access Control (RBAC)
This first factor is pretty important. I know I said I wasn’t discussing these in any real order, but this one should really be close to the top of the list for anyone looking into IaC automation. We talked about how running IaC locally can cause visibility issues when you scale across a team. You may want to know who deployed what, when, where and why, for example. Usually, the reason you want that kind of visibility is that you would like to control all of these aspects. That way you can help limit waste, budget problems, security concerns and the like. You may want to provide self-service deployment access, but only within certain parameters, or only to certain people or teams. Granular RBAC can help you design a security policy that makes the most sense for your organization and its needs. Some tools have this baked in; others take a bit of work to set up. The point is that you should have some form of control over your deployment process.
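To make granular RBAC a little more concrete, here is a minimal sketch in Python. The role names, environment names and the `can` helper are all hypothetical, invented for illustration; real platforms expose their own policy models, which will look different.

```python
from dataclasses import dataclass

# Hypothetical roles mapped to the (action, environment) pairs they may perform.
ROLE_PERMISSIONS = {
    "viewer": {("plan", "sandbox"), ("plan", "production")},
    "developer": {("plan", "sandbox"), ("apply", "sandbox"), ("plan", "production")},
    "admin": {
        ("plan", "sandbox"), ("apply", "sandbox"),
        ("plan", "production"), ("apply", "production"),
    },
}


@dataclass(frozen=True)
class User:
    name: str
    role: str


def can(user: User, action: str, environment: str) -> bool:
    """Return True if the user's role grants the action in the environment."""
    # Unknown roles get an empty permission set, so they are denied by default.
    return (action, environment) in ROLE_PERMISSIONS.get(user.role, set())


dev = User("dana", "developer")
print(can(dev, "apply", "sandbox"))     # True
print(can(dev, "apply", "production"))  # False
```

The deny-by-default lookup is the design choice worth noting: self-service access is granted only where a role explicitly allows it.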
Self-Hosted Agents or Runners
Following the security thread: A lot of these IaC automation tools are SaaS platforms. SaaS can be very secure. But sometimes, for compliance or regulatory reasons, you may want to retain a bit more control over your deployments. This is where self-hosted agents or runners come in. This type of design allows you to keep your “secrets” (such as your cloud credentials and other sensitive variable values) secure in your own way. You can use whatever secrets management solution you want, be it something from your cloud provider or a third-party solution. It also allows you to keep your code secure. All of these tools have to get a copy of your code in order to deploy it, so they’ll do a ‘git clone’ or some kind of file copy to fetch your files from where they are stored and execute them. With a SaaS solution, the vendor essentially has access to your code. Again, this may not be a huge issue for some, but it very much is for others. If you don’t want to give a third party this level of access to your code and secrets, a self-hosted agent or runner can help keep it all secure and under your control.
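One way to picture the self-hosted model: the SaaS side hands the runner only the names of the secrets a job needs, and the runner resolves the values locally. The sketch below assumes the secrets are available to the runner as environment-style key/value pairs; the job shape and the `resolve_secrets` helper are hypothetical and do not reflect any particular product.

```python
import os


def resolve_secrets(secret_names, environ=None):
    """Look up secret values on the runner itself, by name."""
    environ = os.environ if environ is None else environ
    missing = [name for name in secret_names if name not in environ]
    if missing:
        raise KeyError(f"secrets not available on this runner: {missing}")
    return {name: environ[name] for name in secret_names}


# Hypothetical job description: it carries secret names, not secret values.
job = {"repo": "example/infra", "secrets": ["CLOUD_ACCESS_KEY"]}

# Stand-in for the runner's local environment or secrets manager.
local_env = {"CLOUD_ACCESS_KEY": "kept-on-the-runner"}

secrets = resolve_secrets(job["secrets"], environ=local_env)
print(sorted(secrets))  # ['CLOUD_ACCESS_KEY']
```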
Plan on Pull Request
This feature is a big one when it comes to workflow. Some may not even know how important plan on pull request (PR) is until they realize how much easier it can make their lives. The feature goes by different names depending on the platform: plan on PR, PR plan, speculative plan and so on. Essentially, whenever you open a pull request against a branch, your automation platform runs a deployment just up to the “plan” phase so that you can see exactly what the code change will do. For example, if you accidentally add an extra 0 to a variable or code file, you may end up with 100 instances instead of 10. Ideally, you’d like to know what a change is going to do before it is deployed, giving you the chance to quickly fix a potential mistake before it actually goes through. In DevOps, you always want to ‘fail faster’ and close the loop so that big mistakes are caught before they happen. That is why this feature is so important. If the automation platform you choose does this well, it will automatically post the planned changes as a comment on the pull request when the plan phase runs. This way, your developers can see what is going to happen before they merge the change into the main branch for deployment to production. And if the platform takes it a step further by adding service checks to the pull request, you can even configure it so that if the deployment plan fails for any reason, developers are blocked entirely from merging the PR until the problem is resolved.
All of these features help enable a GitOps methodology. Don’t get me started on the wild misuse of the term GitOps; it is very specific. While one of the central pillars of GitOps is using your repository as your “single source of truth” for what should be deployed, it’s not just using Git to store your Terraform files; it is a methodology of administering infrastructure via pull request.
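The plan-on-pull-request flow described above can be sketched in a few lines of Python. This is only an illustration under simple assumptions: a “plan” is reduced to a comparison of resource counts, and `diff_plan` and `pr_comment` are hypothetical helpers, not the API of any real IaC tool or Git host.

```python
def diff_plan(current, proposed):
    """Compare resource counts and return human-readable change lines."""
    lines = []
    for name in sorted(set(current) | set(proposed)):
        before = current.get(name, 0)
        after = proposed.get(name, 0)
        if after > before:
            lines.append(f"+ {name}: {before} -> {after} (add {after - before})")
        elif after < before:
            lines.append(f"- {name}: {before} -> {after} (destroy {before - after})")
    return lines


def pr_comment(current, proposed):
    """Build the text a platform might post back on the pull request."""
    changes = diff_plan(current, proposed)
    if not changes:
        return "Plan: no changes."
    return "Plan:\n" + "\n".join(changes)


# The extra-zero typo from the text: 100 instances requested where 10 were meant.
print(pr_comment({"instance": 0}, {"instance": 100}))
```

A reviewer who sees 100 new instances in the comment can catch the typo before the merge, which is the whole point of planning on the pull request.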
Continuous Deployment
This feature is essential for most folks, though depending on the workflow it can be less important. Even if it isn’t a must-have now, it may become one down the road as you evolve and grow into new workflows. Continuous deployment (CD) is the ability of your automation platform to use a webhook or other connection to your source code repository so that it can automatically trigger a deployment process whenever you commit or push to the repository. This feature goes hand in hand with the plan on pull request feature we just spoke about; for that feature to function properly, a CD feature is needed as well. Now, I want to clarify something. Just because you have continuous deployment enabled doesn’t mean it will push to production in the middle of the day every time someone commits or pushes. You should still be able to pause the deployment process after the plan phase, so that you can validate that the plan looks acceptable before you approve the deployment. For some use cases, like developer sandboxes, you may “auto-approve” the deployments because you don’t need validation first. It all depends on your workflow and processes. A good platform will give you the ability to control all of this.
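The pause-after-plan behavior can be sketched as a tiny state machine in Python. Everything here is hypothetical: the `Run` type, the state names and the idea of a set of auto-approved environments are assumptions made for illustration, not how any specific platform models it.

```python
from dataclasses import dataclass


@dataclass
class Run:
    commit: str
    environment: str
    state: str  # "awaiting_approval" or "applied"


def on_push(commit, environment, auto_approve_envs=frozenset({"sandbox"})):
    """Handle a push event: plan, then pause unless the environment auto-approves."""
    if environment in auto_approve_envs:
        # Sandboxes and similar environments skip the manual gate.
        return Run(commit, environment, "applied")
    return Run(commit, environment, "awaiting_approval")


def approve(run):
    """A human accepts the plan, so the paused run proceeds to apply."""
    if run.state != "awaiting_approval":
        raise ValueError(f"run is {run.state}, nothing to approve")
    run.state = "applied"
    return run


prod_run = on_push("abc123", "production")
print(prod_run.state)           # awaiting_approval
print(approve(prod_run).state)  # applied
```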
Shift Left Extensibility
This one is a bit of a wild card. It can mean a lot of different things depending on the situation. Essentially, what I am talking about here is the ability to integrate, or at least interoperate, with a multitude of other tools during the deployment process. There is a concept called continuous verification, which means shifting some tools and processes left into the deployment process so that you can stay on top of problems before they start. In the PR plan section earlier, we talked about catching issues before deploying. This isn’t just for resource configuration issues. What about security issues? Compliance issues? Performance issues? Or even budget issues? If your IaC automation tool lets you integrate the processes and tools you use to validate these things into the deployment process, they no longer have to be an afterthought. You are far less likely to go over budget if you validate the budget each time you deploy, and far less likely to ship a security issue if you validate security during every deployment. This type of functionality can look and be named very differently between tools. But as long as you have the ability to get all of these other groups (security, finance, etc.) involved sooner, everyone will be better off. As we discussed before, in DevOps it’s always better to fail faster, and integrating these checks into your deployments can help you achieve this.
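As a rough sketch of what shifting checks left can look like, the Python below runs a budget check and a security check against a plan before anything is deployed. The plan format, the cost figures, the limit and both check functions are hypothetical; in practice these checks would be delegated to whatever cost and policy tools your teams already use.

```python
def budget_check(plan, monthly_limit=500.0):
    """Fail if the estimated monthly cost of the plan exceeds the limit."""
    cost = sum(resource["monthly_cost"] for resource in plan)
    if cost > monthly_limit:
        return f"budget: estimated {cost:.2f} exceeds limit {monthly_limit:.2f}"
    return None


def public_access_check(plan):
    """Fail if any resource in the plan is marked as publicly exposed."""
    exposed = [r["name"] for r in plan if r.get("public", False)]
    if exposed:
        return f"security: publicly exposed resources: {', '.join(exposed)}"
    return None


def verify(plan, checks):
    """Run every check against the plan and collect the failure messages."""
    failures = []
    for check in checks:
        message = check(plan)
        if message is not None:
            failures.append(message)
    return failures


# Hypothetical plan: two resources with made-up monthly cost estimates.
plan = [
    {"name": "db", "monthly_cost": 400.0, "public": True},
    {"name": "web", "monthly_cost": 200.0},
]
for failure in verify(plan, [budget_check, public_access_check]):
    print(failure)
```

An empty list from `verify` would mean the deployment can proceed; any message gives the security or finance team a chance to weigh in before the apply.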
We’ve made it through all five! We’ve talked about some basic security features. We’ve talked about some automation workflow must-haves to make your life easier. And we’ve even scratched the surface of some next-step things to look into, such as continuous verification. That was a lot, but hopefully this has been helpful for anyone looking into IaC automation platforms and you now have an idea of what to look for. Some features may be named differently, but the ideas behind them are the same. If your platform can’t do these, it may not be the right one for you. Or maybe it is, and that feature just isn’t that important to you. As I mentioned, your mileage may vary. Every organization is different.