One of the things that many of us have been observing is that in the midst of a sea of other changes, data center architectures also are changing. By a lot.
RAID array sales are down, even though unstructured data is still going up; networking is getting an application programming interface (API) makeover in some places and pure software-defined networking (SDN) in others; and the topic of this post—the concept of “server” in the data center—is becoming somewhat fluid.
Until recently, “server” meant either physical hardware or a virtual machine (VM) on VMware. There were other definitions, but no one cared about them. Then came OpenStack. It is comparable to VMware in terms of meeting enterprise needs, but it brought more work and more options with it, and it started people thinking harder about servers and what that word means. Now we have Apache Mesos, and its various incarnations are gaining traction in the data center. And this doesn’t even touch on specialized clusters such as those Hadoop requires.
If you’re busting your tail every day keeping the systems humming and serving the needs of the organization, the differences might be elusive. I’ll use this post to talk about the differences, and moving forward I will focus more on Mesos and its variants, because it is an area of DevOps that I think is underserved in the publishing arena.
While VMware and OpenStack let you pool servers and then carve them up into virtual servers, Mesos decided to eliminate that “carve up” portion and treat the pool of servers as one big system that can service applications. That is a simplification, but when it comes down to it, think of Mesos as Linux with the resources of many physical systems instead of just one.
Because turning a pool of servers into a single logical server means that server can do far more than a normal system, there are a lot of add-ons, bells and whistles. For process isolation, for example, containers are used to offer a bit of protection from rogue processes. And since the Linux scheduler was never intended to handle cross-system processes, scheduling is handled by a separate system. It is not necessarily more complex, but it does require you to know a bit more about it, so it seems more complex than just running a program and letting the system take care of the rest. When you consider the volume of jobs a cluster of systems can handle compared to a single system, and the work of distributing jobs/apps/programs/whatever-you-call-them across the supporting hardware, it just makes sense that a scheduling and job submission management subsystem is required.
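To make the idea of cross-system scheduling a little more concrete, here is a toy sketch in Python. It is not how Mesos actually schedules work; it is a plain first-fit placement over a pool of machines, and every name in it (`Node`, `Task`, `place`, `schedule`, the node names and sizes) is made up for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One physical machine in the pool (hypothetical model, not a Mesos type)."""
    name: str
    cpus: float
    mem_mb: int
    tasks: list = field(default_factory=list)


@dataclass
class Task:
    """A job/app/program that needs some CPU and memory (hypothetical)."""
    name: str
    cpus: float
    mem_mb: int


def place(task, nodes):
    """First-fit placement: put the task on the first node with enough room."""
    for node in nodes:
        if node.cpus >= task.cpus and node.mem_mb >= task.mem_mb:
            # Reserve the resources so later tasks see what is left.
            node.cpus -= task.cpus
            node.mem_mb -= task.mem_mb
            node.tasks.append(task.name)
            return node.name
    return None  # no single node can hold the task right now


def schedule(tasks, nodes):
    """Place every task and return a {task name: node name or None} map."""
    return {task.name: place(task, nodes) for task in tasks}


pool = [Node("node-a", cpus=4, mem_mb=8192), Node("node-b", cpus=8, mem_mb=16384)]
jobs = [
    Task("web", cpus=2, mem_mb=2048),
    Task("batch", cpus=6, mem_mb=4096),
    Task("cache", cpus=2, mem_mb=6144),
    Task("huge", cpus=16, mem_mb=1024),
]
placements = schedule(jobs, pool)
print(placements)
```

The person submitting `web` or `batch` never picks a machine; the scheduler does. The `huge` task also shows a limit of the "one big Linux box" simplification: in this sketch a task still has to fit on a single node, so it goes unplaced even though the pool has plenty of memory overall.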
Because the Mesos scheduler (and more) has an API, there are some interesting DevOps possibilities here. Imagine running all the processes you need while hooking the scheduler and its reporting into your DevOps tool sets. We’ll explore those possibilities at some point in the future.
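As a rough idea of what hooking reporting into a tool set could look like, here is a small Python sketch that turns a cluster state document into a per-machine CPU utilization report. Treat the details as assumptions rather than a reference: the `/state` path, the `slaves`, `resources`, and `used_resources` field names, and the sample data are my guesses at the shape of such a response, so check them against the documentation for your Mesos version before relying on them.

```python
import json
from urllib.request import urlopen


def fetch_state(master_url):
    """Fetch cluster state as a dict. The '/state' path is an assumption."""
    with urlopen(master_url.rstrip("/") + "/state", timeout=5) as resp:
        return json.load(resp)


def cpu_utilization(state):
    """Return {hostname: fraction of CPUs in use} for each agent in the state.

    The 'slaves', 'resources', and 'used_resources' keys are assumed field
    names; adjust them to match what your cluster actually returns.
    """
    report = {}
    for agent in state.get("slaves", []):
        total = agent.get("resources", {}).get("cpus", 0)
        used = agent.get("used_resources", {}).get("cpus", 0)
        report[agent.get("hostname", "unknown")] = used / total if total else 0.0
    return report


# Hand-written sample data so the sketch runs without a cluster.
sample_state = {
    "slaves": [
        {"hostname": "node-a", "resources": {"cpus": 4}, "used_resources": {"cpus": 1}},
        {"hostname": "node-b", "resources": {"cpus": 8}, "used_resources": {"cpus": 6}},
    ]
}
utilization = cpu_utilization(sample_state)
print(utilization)
```

Against a real cluster you would pass the result of `fetch_state(...)` with your own master's address instead of `sample_state`, and feed the report into whatever dashboard or alerting tool you already use.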
Why would you go with a clustered operating system instead of a cloud platform? That really is a valid question, and one the market is still working out. Early adopters like the fact that an entire layer of abstraction is removed as unnecessary. In VMware or cloud, you have physical hardware -> pool of resources -> virtual servers -> application.
Both physical and virtual servers are running copies of an operating system (OS). That’s a lot of overhead.
With Mesos, it looks much the same, except that instead of VMs, the app is running on the pool of resources (or more correctly, the app is running in a lightweight container on the pool of resources). So there is less to maintain, less CPU/disk/memory overhead, and less install/configuration time.
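A quick back-of-the-envelope sketch of the overhead point, in Python. The host and VM counts are hypothetical numbers picked for illustration, and the model is deliberately crude: it only counts operating system copies and ignores how much each copy actually costs.

```python
def os_copies_with_vms(hosts, vms_per_host):
    """Each physical host runs an OS, and so does every VM carved out of it."""
    return hosts + hosts * vms_per_host


def os_copies_with_pool(hosts):
    """Containers share the host's kernel, so only the hosts run an OS."""
    return hosts


# Hypothetical data center: 10 physical hosts, 8 VMs on each.
vm_model = os_copies_with_vms(hosts=10, vms_per_host=8)
pool_model = os_copies_with_pool(hosts=10)
print(vm_model, pool_model)
```

Every one of those OS copies has to be installed, patched, and given its own share of CPU, disk, and memory, which is where the savings come from.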
Of course, any highly complex system has issues of its own. I am still learning which real issues crop up and what enterprises have to deal with. They are there, but I need a solid enough understanding that I’m not misleading anyone, so I promise to report back once I have a handle on them based on reality.
From a DevOps perspective, two strengths stand out that I intend to dig into: you can assume the configuration of the server the cluster presents itself as, no matter which app is being submitted to run, and you can manage an entire data center’s worth of applications on a common platform.
If you’ve evaluated or used Mesos (or DC/OS or Mantl or SuperGiant or any of the other tools based on it), don’t hesitate to reach out to me. My experiences are one thing; others might have different issues/concerns/loves.
We do indeed live in interesting times, with changes in IT coming at a pretty astounding rate. Fun times. I’m actually pretty excited to learn how well Mesos serves the need and enables DevOps. It’s not a simple topic to discuss because there are a lot of moving parts, but we’ll get it covered and help you figure out whether it’s the best solution for your organization.