When building a house, work is done in teams. This work is managed by a foreman, supervisor, or general contractor, but let’s restrict ourselves to the teams doing finish work, and a foreman. “Foreman” is old-fashioned, but “general contractor” gets long to type over and over. Assume a house large enough to warrant a couple of teams putting in cabinets and trim, hanging doors, etc. A team of two tells the foreman “We’ve finished with the master bedroom”, and he looks at his list and says “Good, go install the trim in the master bathroom.” And the two head off to do just that. About halfway through, one of the installers has an unfortunate incident with a nail gun, and nails his hand to the floor. Both inform the foreman that they have to leave – one to go to the hospital, the other to drive.
When the other finishing team comes free, the foreman says “The master bathroom trim isn’t done, go take care of that for me,” and keeps his checklist up to date.
Eventually, the entire house has the finish work completed, simply by the foreman sending teams off to do different parts of the job until all parts are complete.
That’s Mesos, essentially. In this example, the foreman is Mesos.
Mesos is a scheduler that knows what jobs have to be done, and what resources are available to do them. The details of getting the job done are not Mesos’ concern, but if the job is not completed, Mesos will send other resources so that it can get done.
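The foreman's routine can be sketched in a few lines of Python. This is a toy model of the analogy, not Mesos code: the function name run_foreman, the team names, and the fails set are all made up for illustration.

```python
from collections import deque

def run_foreman(jobs, teams, fails):
    """Hand out jobs to free teams until every job is done.

    jobs  -- list of job names (the foreman's checklist)
    teams -- list of team names
    fails -- set of (team, job) pairs that end in a nail-gun incident;
             that team leaves and the job goes back on the list
    Returns a list of (team, job, outcome) tuples in the order they happened.
    """
    todo = deque(jobs)
    free = deque(teams)
    log = []
    while todo and free:
        job = todo.popleft()
        team = free.popleft()
        if (team, job) in fails:
            log.append((team, job, "failed"))
            todo.append(job)    # job is still open; the team is gone
        else:
            log.append((team, job, "done"))
            free.append(team)   # team reports back for more work
    return log

log = run_foreman(
    jobs=["master bathroom trim", "kitchen cabinets"],
    teams=["team A", "team B"],
    fails={("team A", "master bathroom trim")},
)
for entry in log:
    print(entry)
```

The point to notice is that the foreman never cares how the trim gets installed. He only tracks which jobs are still open and which teams are free, and a failed job simply goes back on the list.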
There is a lot more involved, of course, but the analogy holds pretty darned well.
Mesos turns a collection of machines into a cluster by placing an agent on each machine that knows how to (a) report what resources are available for Mesos to dole out, and (b) execute applications for a given environment (called a framework in Mesos-speak, for a variety of good reasons). So once Mesos is distributed across your servers, it can effectively treat those servers as one large server, with itself as the scheduler.
It’s never simple. Because IT isn’t simple.
I have an adage for IT though: No good idea is complete until the real world imposes complexity on the simple.
To make Mesos fault tolerant, you can run multiple masters and use ZooKeeper to make certain that if the current master fails, a replacement is ready to pick up and keep things moving.
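The effect of that failover can be pictured with a very small sketch. It only shows the outcome (one leader at a time, and a standby taking over when the leader dies). The real election is handled by ZooKeeper and works differently under the hood, and the names elect_leader and master-1 through master-3 are hypothetical.

```python
def elect_leader(masters, alive):
    """Return the first master in the list that is still alive, or None."""
    for name in masters:
        if name in alive:
            return name
    return None

masters = ["master-1", "master-2", "master-3"]
alive = set(masters)

leader = elect_leader(masters, alive)
print("leader:", leader)

alive.discard(leader)  # the current leader fails
new_leader = elect_leader(masters, alive)
print("after failure:", new_leader)
```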
To keep Mesos itself simple, it does not have complex resource scheduling built in. Instead, it offers available resources to frameworks, and each framework accepts whatever fits its requirements. So let’s say Hadoop wants five CPUs. Mesos may say “Here are three”, then a few minutes later say “Two more!”, and Hadoop can tell the executors on those servers to start up tasks. Frameworks can also tell Mesos what they don’t want (filtering) and reject offers of compute resources Mesos makes (in which case those resources are offered to the other frameworks out there).
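Here is a rough Python sketch of that offer cycle, following the Hadoop example above. It is a simplification under stated assumptions: it only tracks CPUs, the classes Offer and ToyFramework and the function offer_round are invented for this illustration, and the min_cpus field is a crude stand-in for filtering. None of this is the actual Mesos API.

```python
from dataclasses import dataclass

@dataclass
class Offer:
    agent: str
    cpus: int

class ToyFramework:
    """A made-up framework that wants a fixed number of CPUs."""

    def __init__(self, name, cpus_wanted, min_cpus=1):
        self.name = name
        self.cpus_wanted = cpus_wanted
        self.min_cpus = min_cpus   # crude "filter": ignore smaller offers
        self.accepted = []         # list of (agent, cpus) taken so far

    def consider(self, offer):
        """Return the number of CPUs taken from this offer (0 = declined)."""
        still_needed = self.cpus_wanted - sum(c for _, c in self.accepted)
        if still_needed <= 0 or offer.cpus < self.min_cpus:
            return 0
        taken = min(offer.cpus, still_needed)
        self.accepted.append((offer.agent, taken))
        return taken

def offer_round(offers, frameworks):
    """Offer each agent's CPUs to the frameworks in turn; leftovers pass on."""
    for offer in offers:
        remaining = offer.cpus
        for fw in frameworks:
            if remaining == 0:
                break
            remaining -= fw.consider(Offer(offer.agent, remaining))

hadoop = ToyFramework("hadoop", cpus_wanted=5)
other = ToyFramework("other", cpus_wanted=2, min_cpus=2)
offer_round([Offer("agent-1", 3), Offer("agent-2", 4)], [hadoop, other])

print(hadoop.accepted)  # three CPUs from one agent, then two more
print(other.accepted)   # whatever Hadoop did not take
```

In this toy run Hadoop gets its "Here are three" and "Two more!", and the CPUs it leaves on the second agent are offered to the next framework, which is the behavior described above.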
Of course, those frameworks that know both how to schedule jobs and how to send them to agents are key. Marathon can handle most long-running tasks, and Hadoop has a framework of its own. So the “subcontractors” are ready to get the job done, for the most part.
So what is the benefit? Mesos doesn’t much care where you install it. Put it on hardware, put it on VMs, put it in a VPC. Using Marathon, you can deploy your apps in containers, meaning you really don’t have to care what the underlying infrastructure is.
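As a rough idea of what deploying a containerized app through Marathon involves, the sketch below builds a small app definition in Python and prints it as JSON. The helper marathon_app, the app id /hello-web, and the nginx image are assumptions chosen for illustration. Marathon's REST API accepts definitions of this general shape (typically POSTed to /v2/apps), but check the Marathon documentation for the exact fields your version expects.

```python
import json

def marathon_app(app_id, image, cpus=0.5, mem=128, instances=2):
    """Build a minimal Marathon-style app definition for a Docker image.

    All default values here are arbitrary examples, not recommendations.
    """
    return {
        "id": app_id,
        "cpus": cpus,            # CPU share requested per instance
        "mem": mem,              # memory per instance, in MB
        "instances": instances,  # how many copies to keep running
        "container": {
            "type": "DOCKER",
            "docker": {"image": image},
        },
    }

app = marathon_app("/hello-web", "nginx:latest")
payload = json.dumps(app, indent=2)
print(payload)
```

Nothing in the definition says which machine the app runs on. It states what the app needs and how many copies should exist, and placement is left to the scheduler.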
What’s the downside? Well, there are a whole lot of tools to learn to make a production environment stable and maintainable, and there are alternatives out there that we’ll be looking at in the coming weeks. Honestly, most of those alternatives are expansions based upon Mesos (and often Marathon), so this was a good place to start a series covering what you need to know to decide whether Mesos fits your environment. If you want to start running apps and seeing hardware/VMs as a pool of available resources, Mesos is a good place to start, though the packages we’ll talk about in the coming weeks are more interesting, mostly because they take Mesos’ simple scheduler concept and expand upon it to offer a more complete solution.
If you would like to learn more about Mesos, the Apache Software Foundation, DigitalOcean, and Mesosphere all have tutorials. I have used the Mesosphere one, and it was pretty good: though it failed on OS X, it worked like a charm on Windows 10.