Bootstrapping Chef (or Whatever) for Autoscaled EC2 Instances

I realize it is traditional to start a new blog with some background and deep introspection on the author’s personal motivation for writing it, but I’ve never been one for tradition. Thus, for my first official DevOps post, I think I’ll jump right in with a technical tutorial on a problem I had to solve last summer that I haven’t seen well documented anywhere else.

You can read my bio for more, but here’s what you can expect on HackOps. I’m an industry analyst (in security; I’m the CEO of Securosis), but one with a bad habit of giving technical talks at DEFCON. In other words, a mix of research and analysis at both a technical and executive level.

The Problem

With that, let’s start with the technical:

Last summer I was putting together a demonstration for the Black Hat conference when I ran into a little roadblock. I wanted to launch instances and have them automatically connect to a Chef server and pull down a default set of (security) policies. The problem is there really isn’t a great, built-in way to do this, nor could I find one documented anywhere. Normally, your options to install Chef in an EC2 instance are:

* Load Chef and the configuration into a custom Amazon Machine Image. I wanted to use default AMIs instead of maintaining my own, so this wouldn’t work.
* Launch instances using the Knife command-line tool (how you manage Chef, for those of you who don’t know) and the EC2 option. But I want to autoscale my instances, which means I can’t launch them from Knife, since Amazon Web Services will launch them for me.
* Bootstrap instances after they are up and running using the SSH bootstrap. But this is manual, or requires me to embed my SSH key in some automagic code, so this wasn’t an option.
* Manually install and configure Chef. Not very DevOpsy, is it?

Now I already had some decent experience configuring instances when they launch using cloud-init. cloud-init is a tool included in some operating systems, most notably Ubuntu and Amazon Linux, that allows you to inject a script when you launch an instance and have it run the commands once it launches, before you get to log into it. You simply paste your script into the User Data field.

Overview

Chef isn’t actually well-designed to handle this, since it seems to prefer a hands-on install, but I managed to hack together an effective process. At the high level, here’s how it works:

1. Place your Chef configuration (client.rb), validation certificate (validation.pem), and first-run Chef file (first_run.json) into an Amazon S3 bucket.
2. Use AWS IAM Roles to provide access to the S3 bucket. IAM Roles are worth a post all on their own, but the short version is that you assign a role to an instance, and Amazon provides it with a set of rotating, temporary credentials to access other bits of Amazon.
3. Write a cloud-init script to download Chef and s3cmd, which is a command line tool for accessing Amazon S3.
4. In your script, set s3cmd to work with IAM Roles by installing a blank configuration file (figuring this part out was a major pain since it wasn’t documented at the time).
5. In your script, use s3cmd to pull down your Chef configuration files, and then run Chef with them. One of those files, first_run.json, drives the first run and sets the initial role, which Chef uses to push down the first policy set.
6. Launch an instance manually or as part of an autoscaling group. Set the IAM Role, and paste your cloud-init script into the *User Data* field.

Now I am assuming a few things here. First, that you know how to use EC2 and set the correct security groups for everything to talk to each other. Second, that you know how to create an autoscaling group (or manually launch an instance and set the User Data field). Third, that you have a Chef server or Hosted Chef set up already.

If you want more of a step-by-step you can read up on this in a paper I published — A Practical Example of Software Defined Security. It is ever so slightly out of date, but still works (just adjust it to get a current version of S3 tools and cut the *fix routing silliness* part of the script since that isn’t needed anymore and breaks things).

The Details

Here are the technical details for the files and commands:

First, your IAM Role policy. Replace “cloudsec” with the name of whatever bucket you create to hold your client.rb, validation.pem, and first_run.json (I’ll give you an example one in a moment):

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [ "s3:Get*", "s3:List*" ],
      "Resource": [
        "arn:aws:s3:::cloudsec",
        "arn:aws:s3:::cloudsec/*"
      ]
    }
  ]
}
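If you end up with more than one bucket, you may prefer to generate this policy instead of hand-editing it. Here is a minimal Python sketch, not something from my original setup; the helper name `bucket_read_policy` is made up for illustration. It puts both ARNs into a single `Resource` list, since repeating a key inside one JSON object is not reliably preserved by parsers, and the bucket ARN (for listing) and the `/*` ARN (for the objects) are both needed.

```python
import json


def bucket_read_policy(bucket: str) -> dict:
    """Build a read-only policy document for one S3 bucket (hypothetical helper)."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["s3:Get*", "s3:List*"],
                # The bucket ARN covers listing; the /* ARN covers the objects in it.
                "Resource": [
                    f"arn:aws:s3:::{bucket}",
                    f"arn:aws:s3:::{bucket}/*",
                ],
            }
        ],
    }


if __name__ == "__main__":
    # Print the policy so it can be pasted into the IAM Role definition.
    print(json.dumps(bucket_read_policy("cloudsec"), indent=2))
```

Swap "cloudsec" for your own bucket name and paste the printed JSON into the role.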

Next comes your cloud-init script. You can either paste the entire script in, or use the command:

#include http://url-to-script

If you use a remote script, you probably want to store it someplace safe in S3 where no one else can read it and see how you have things configured. Here is the cloud-init script itself, where the magic happens:

#cloud-config
apt_update: true
#apt_upgrade: true
packages:
 - curl
configchef:
 - &configchef |
   echo "deb http://apt.opscode.com/ precise-0.10 main" | sudo tee /etc/apt/sources.list.d/opscode.list
   apt-get update
   curl http://apt.opscode.com/packages@opscode.com.gpg.key | sudo apt-key add -
   echo "chef chef/chef_server_url string http://your-chef-server-URL:4000" | sudo debconf-set-selections && sudo apt-get install chef -y --force-yes
   wget http://sourceforge.net/projects/s3tools/files/s3cmd/1.5.0-beta1/s3cmd-1.5.0-beta1.tar.gz
   tar xvfz s3cmd-1.5.0-beta1.tar.gz
   cd s3cmd-1.5.0-beta1/
   cat >s3cfg <<EOM
   [default]
   access_key =
   secret_key =
   security_token =
   EOM
   ./s3cmd --config /s3cmd-1.5.0-beta1/s3cfg ls s3://cloudsec/
   ./s3cmd --config /s3cmd-1.5.0-beta1/s3cfg --force get s3://cloudsec/client.rb /etc/chef/client.rb
   ./s3cmd --config /s3cmd-1.5.0-beta1/s3cfg --force get s3://cloudsec/validation.pem /etc/chef/validation.pem
   ./s3cmd --config /s3cmd-1.5.0-beta1/s3cfg --force get s3://cloudsec/first_run.json /etc/chef/first_run.json
   chef-client -j /etc/chef/first_run.json
runcmd:
 - [ sh, -c, *configchef ]
 - touch /tmp/done
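The three s3cmd get lines in the script differ only in the file name, so if you template the script for different buckets you could generate them. The following is a rough Python sketch under that assumption; the function `s3cmd_fetch_commands` and its default paths are my own illustration and simply mirror the paths used in the script above.

```python
def s3cmd_fetch_commands(
    bucket,
    filenames,
    s3cfg="/s3cmd-1.5.0-beta1/s3cfg",  # blank config file written by the script
    dest_dir="/etc/chef",              # where chef-client looks for its config
):
    """Return one 's3cmd get' command line per file (hypothetical helper)."""
    return [
        f"./s3cmd --config {s3cfg} --force get s3://{bucket}/{name} {dest_dir}/{name}"
        for name in filenames
    ]


if __name__ == "__main__":
    chef_files = ["client.rb", "validation.pem", "first_run.json"]
    # Print the lines so they can be dropped into the configchef block.
    for command in s3cmd_fetch_commands("cloudsec", chef_files):
        print(command)
```

This only builds the command strings; it does not talk to S3, so you can run it anywhere to check the output before pasting it into your User Data.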

You’ll notice that when we launch Chef, we use the -j command-line option to load first_run.json, which establishes our initial Chef role. This is what allows us to push an initial policy to the instance; otherwise we would have to do more manual stuff (ick). I kept it simple and merely established a starter role for the instance, which corresponds to a basic run list. first_run.json looks like this:

{"run_list":["role[base]"]}
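If you want different starter roles for different autoscaling groups, the file is simple enough to generate. Here is a small Python sketch; the helper name `first_run_json` is hypothetical, and it assumes you only need roles in the run list, as in my example. Generating the file also sidesteps curly quotes sneaking in from a blog or word processor, since JSON only accepts straight double quotes.

```python
import json


def first_run_json(roles):
    """Render a first_run.json body whose run list holds the given Chef roles."""
    # Each role name is wrapped in Chef's role[...] run-list syntax.
    return json.dumps({"run_list": [f"role[{role}]" for role in roles]})


if __name__ == "__main__":
    # Print the body for the single starter role used in this post.
    print(first_run_json(["base"]))
```

Upload the output to your bucket as first_run.json alongside client.rb and validation.pem.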

Once you get all this set up, as I mentioned earlier you only need to launch an instance with the right IAM Role to access the S3 bucket, then paste in the cloud-init script to the *User Data* field. This can all be set as part of your autoscaling group rules so it runs automatically as new instances launch.

The rest is magic. The instance takes a little longer to run, and it can take a bit for all the pieces to connect, but we are talking less than 30 minutes. Usually a *lot* less than 30 minutes.

Now while I used this as a security demo, I know this technique is used in production by at least one major web property you have all heard of for non-security configuration management. Not that I can take credit for it; I found out months later that they use a really similar technique they figured out on their own.

The same basic techniques should also work with Puppet or the configuration management tool of your choice.

To be honest, I was pretty surprised I couldn’t find this process documented anywhere when I started my project. It seems like one of those basic things nearly anyone working in public cloud with a configuration management tool would need, since building your own AMIs all the time is really inefficient.

That’s one of the big reasons I’m excited to be writing here at DevOps.com, since I think we, as a community, need a good meeting place to share this kind of information.