This was live blogged by @martinjlogan so please forgive any errors and typos.
The way we do things in production is not always the right way to do things. Coming here to a conference like Camp DevOps and listening to folks like Jez Humble is kind of like coming to Church and reupping your faith in what’s right!
Handcrafted; great for a lot of things. Furniture, clothes, and shoes. The imperfections give a thing character. Handcrafted however has no place in the datacenter. Services are like appliances. Imagine that you run a laundromat. Would you rather have a dozen different machines that all need to be repaired in different ways by different people or would you rather have one industrial strength uniform design for each unit?
In Groupon’s infancy in order to get started quickly we outsourced all operations. We have gotten to the scale though where the expertise of those we outsourced to is not sufficient for our current needs. As a result we have brought it in house now. Given this we needed a way to manage our infrastructure efficiently and with minimal errors under constant change.
In Sept 2010 we had about 100 servers in one datacenter. Many of them were handcrafted. That was ok though, because someone else worked on them. Today we have over 1000 servers in 6 locations. As the service has grown we have felt the pain of a shakey foundation under our platform. That is the driver behind developing this project – Roller. Roller really embodies the DevOps mindset.
The DevOps mindset is typified by folks that love developing software and that are also interested in linux kernels and such – and vice versa. I am one such person. At Groupon I do work for many different areas. I started my career at Yahoo in 1999. I also co-founded a company called Dippidy. I left there and worked for Symantec. Each time I have worn a different hat. Enough about me and my stuff though – lets dig into Roller.
I won’t be giving a philosophical talk but instead will get you into the nuts and bolts of roller [I will summarize this in this live blog - see the slides for more details]. If you have any preconceived notions about how host config and management should be done please try to forget them as this project is quite different. This project is on track to be open sourced from Groupon sometime in the first half of next year.
So, what does Roller do? It installs software. Really, what is a server, it is a computer that has some software on it. Roller installs this software. It facilitates versioning your servers in a super clean way. It allows for perfect consistency across your data center. This handles basic system utils like strace to application deployment. They are all the same, just files on a disk.
You are probably asking yourself why is this guy up here reinventing the wheel. Why do this? We already have Chef and Puppet why bother. Well, we wanted this to be very lightweight. Some existing solutions require message queues, relational DBs, and strange languages that are not already on the system. We also needed to deal with platform specific differences. We have 4 different varieties of Linux. The big thing though, is we wanted a system that was dead simple and audit-able. A lot of the systems now give you tons of power. Inheritance heirarchys like webserver -> apache webserver -> some config of that server etc… That looks great from a programmer brain perspective, but in production this complexity can cause unwanted side effects and cause problems. We wanted to build a system that was blocked on a source code repo commit. We wanted any change in the system to go through git or some other VCS system.
There are 4 parts to roller.
1. The configuration repository.
2. Config server
4. “roll” program.
The configuration repository is a collection of yaml files. The config server sits in each data center. This is a web server that does not have any db. It is a ruby on rails app with no database. It provides views of data stored in the config repository. Packages are precompiled chunks of software. For instance we have an apache package, or some other appliction. A package is just a tarball. The packages are stored and distributed from the config server. Config servers in different datacenters use S3 to distribute packages. We put a package on one config server and then it is world wide in about a minute. Finally we have “roll”. This is what we execute on a host, a blank machine perhaps, to turn it in to a specific appliance.
This contains simple files that have within them information about datacenters. This also contains host classes – basically configurations of particular host types. These host classes are just like defining a class in Ruby or some other language that supports classes. The config repo is basically a tree of a fixed depth of 2.5 levels and no deeper. The leaf nodes are the host files. These are contained in the host directory within the config repo. This defines configuration at the host level. Host classes have names and versions for a particular host. The hostclass does not contain a version.
We have spoken alot about these yaml files. These are the world for roller. Now to make use of them we need the config server. The config server is a rails app that gives us views of the config repo data. We get to see the yaml config, we can see which hosts are using a particular big of config, we can diff configs to see what changed. A nice thing about this is that you can just run curl commands to investigate the system.
I can also use curl to investigate host classes. Config server just pulls these things out from a git repo and sends them back. This creates a nice http bridge into our running system. This has a lot of value. We will see this with roll.
Groupon Roller’s Roll server executes code on a host. It runs the http fetches, just like you would do with curl, fetching the host and host class yaml files from the config server. It then downloads any packages that it does not have. It then prepares a new /usr/local dir candidate. It generates configs. It stops services. Moves the new /usr/local into place, then starts the services. Basically each time it nukes the host starting from a new base state. Roller owns user local essentially. This is kind of a nuclear solution. We are not quite re-imaging the whole host but it is still fairly brutal.
This whole cycle typically takes 10 to 30 seconds. The actual services are down for just a few seconds normally. Things are actually only down for a short period of time.
This is a roller package. It is in every hostclass. It adds a cron entry when installed on a host. Every x minutes it trues up its users with a config repository via a config host. This is how we do basic user management. You can use this to manage your own profiles and user directories to get your .profile or emacs config or whatever you want on all the hosts in which you have access to.
Wrapping it up
on twitter at @thenobot
steinkamp.us/campdevops_notes.pdf is where you can get notes on the presentation.