Deployment and Monitoring Automation with glu

glu is a free, open source deployment and monitoring automation platform. The project was started at LinkedIn in mid-2009 to address the rapidly growing needs of deploying the set of applications and services that make up the LinkedIn experience. Although projects like Chef and Puppet existed at the time, they were mostly good at configuring the infrastructure (creating users, installing Java, etc.). glu lives in a higher space: provisioning dynamic applications on an ensemble of machines (applications that change often, need real-time failure detection, etc.).

glu is in the application deployment space

First, let’s start with a definition: when I say application, I mean a piece of code that usually needs to keep running, and usually offers some form of API to talk to (which, at a lower level, is a socket listening on a port). Equivalent terms are service or server. For example, a webapp server is an application. Although glu can also deal with one-off programs (run, do some computation and then stop) or infrastructure (Perl, Java, etc.), it is not optimized for these use cases, and other projects (Chef, Puppet) are better at this task.

What is particular about applications in general is that they tend to change a lot more often than infrastructure. In the age of continuous deployment, it is not uncommon to have the same application updated and deployed several times a day!

glu has been designed to handle this use case: deploying applications to an arbitrarily large set of nodes:

efficiently
with minimum/no human interaction
securely
in a reproducible manner

glu executes a deployment — simply click “OK”!

glu is in the application monitoring space

Once an application is up and running, many failures can happen, from the low-level hardware crashing to the application itself misbehaving. This is the nature of it: nothing is perfect, and things will crash one way or another. So it is very important to monitor the state of the system in order to be able to react appropriately. glu has been designed to give a full view of the set of nodes on which you have deployed applications. When something goes wrong, you know about it in near real time. You can then decide on the proper course of action, like restarting the failed application on the node where it crashed, or rolling back by redeploying the previous version to all the appropriate nodes. As a side note, you can also build a full monitoring solution on top of glu as explained in this blog post.

glu is a platform

It is very hard to build a one-size-fits-all solution. This is why glu has been designed as a platform, so that you can build the solution that fits your environment and workflow on top of it:

glu offers a simple and documented REST API so that you can control glu yourself. For example, glu is not auto-reactive by design (which means that if something fails, glu will tell you about the failure, but it will not act on it on its own), but using the REST API, it is fairly easy to build a reactive solution on top of it.
The console (which is the main UI and REST API) is highly customizable (plugins to change the behavior of some features, CSS tweaks to change its look, etc.).
glu comes with its own package/distribution builder, which lets you customize every component to suit your own needs (as simple as changing the port numbers or as complex as providing your own specializations).
glu does not dictate what a deployment means to you, nor how to model your own system (clusters, groups, etc.), nor how to model your workflow (aka state machine in glu’s terminology). You are in control and you decide.
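
To make the idea of a reactive layer more concrete, here is a minimal sketch in Python. It is only an illustration and not glu's actual API: the record shape (EntryStatus), the action names ("restart", "rollback") and the two callables standing in for REST calls (fetch_statuses, submit_action) are all assumptions made for this example. The policy shown (restart a failed application a limited number of times, then fall back to a rollback) is just one of many you could choose.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List, Tuple


@dataclass(frozen=True)
class EntryStatus:
    """Hypothetical status record; not glu's real response format."""
    node: str
    application: str
    expected_state: str  # e.g. "running"
    actual_state: str    # e.g. "running", "stopped", "failed"


@dataclass(frozen=True)
class Action:
    """Hypothetical action to request; kind is "restart" or "rollback"."""
    kind: str
    node: str
    application: str


RestartCounts = Dict[Tuple[str, str], int]


def plan_reactions(statuses: Iterable[EntryStatus],
                   restart_counts: RestartCounts,
                   max_restarts: int = 2) -> List[Action]:
    """Decide what to do for every entry whose actual state differs
    from its expected state. Entries that match are left alone."""
    actions = []
    for status in statuses:
        if status.actual_state == status.expected_state:
            continue
        key = (status.node, status.application)
        if restart_counts.get(key, 0) < max_restarts:
            actions.append(Action("restart", status.node, status.application))
        else:
            # Restarting did not help often enough: ask for a rollback.
            actions.append(Action("rollback", status.node, status.application))
    return actions


def react_once(fetch_statuses: Callable[[], Iterable[EntryStatus]],
               submit_action: Callable[[Action], None],
               restart_counts: RestartCounts,
               max_restarts: int = 2) -> List[Action]:
    """Run one poll/react cycle. The two callables are placeholders for
    whatever REST calls your setup uses to read state and trigger plans."""
    actions = plan_reactions(fetch_statuses(), restart_counts, max_restarts)
    for action in actions:
        submit_action(action)
        if action.kind == "restart":
            key = (action.node, action.application)
            restart_counts[key] = restart_counts.get(key, 0) + 1
    return actions
```

In a real setup you would call react_once on a timer, with fetch_statuses and submit_action wrapping HTTP requests to the console. Keeping the decision logic (plan_reactions) separate from the I/O makes the policy easy to test and to change without touching the code that talks to glu.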

glu in practice

glu is not an academic exercise. glu was built and successfully deployed at LinkedIn in early 2010, prior to being released as open source. glu helps LinkedIn manage the complexity of releasing hundreds of applications/services on a 1000+ node environment (numbers from 2010). Other big names like Orbitz and Outbrain are using glu for their deployment needs.

Over the years, glu has been refined and enhanced to support more use cases, but the fundamental concepts on which glu was originally based, and which I briefly introduced in this post, are still what makes glu what it is!

How to start with glu?

If you want to take a look at glu, the best place to start is the main documentation page. I would also recommend downloading glu and trying the tutorial, which gives you a feel for what glu is capable of.