As regular readers know, I’ve been digging into Mesos quite a bit lately and, for the most part, enjoying the ecosystem that has grown up around it. There are some very cool possibilities when servers are organized into one massive platform for application (or microservice) delivery. While there is a bit of a learning curve, I wouldn’t say it is as steep as with most application provisioning tools, and you can scale and manage apps while maintaining only a single reference architecture.
But every technology has issues, and we need to be honest and open about them if the technology is to succeed in the enterprise. Mesos is no exception.
Most open-source projects start out with, “If we just did X, Y and Z, we could do this cool thing!” and only afterward consider the security needs of their target audience. That is the nature of the beast, and Mesos has followed the same pattern. The Mesos team is currently moving from little (or optional) security toward enterprise-grade security, but for now, here are some things to consider to help secure your Mesos installation.
According to Ben Hindman, Mesosphere co-founder and co-creator of Apache Mesos: “For folks that were running without authentication, we didn’t want to break their clusters when they upgraded. Authentication will be on by default after sufficient time for frameworks to get upgraded. We enable a mixed mode where some frameworks can authenticate and others don’t exactly to support this upgrade path.”
The catch is that, at the time of writing, a fresh Mesos installation has no (I repeat, no) authentication for frameworks and no limit on what those frameworks can see. Command line switches can change this behavior, but out of the box Mesos is wide open where frameworks are involved. The point of this post is to urge you to use those switches if at all possible in your environment.
Industry standard practice is the exact opposite of what Mesos does: security is turned on by default, with an option to turn it off. That makes initial pilot deployments more difficult, but it is precisely what Mesos needs to do (and, judging from Ben’s comment above, plans to do).
The following items, taken together, create the core problem:
- By default, Mesos allows frameworks to register without authentication (see the link labeled “command line switches,” above). Normally, some form of authentication is required before a client can register, and OAuth-based access control is increasingly common. See Stormpath or DigitalOcean for examples of API authentication design.
- Mesos uses “roles” to determine which frameworks have access to what resources. By default, a framework can register for any role.
- Persistent disks (such as your database or your contracts directory) can be resources assigned in Mesos. Access to them is controlled by role only (last bullet point before version history), so that they can be shared by multiple apps on multiple frameworks. This also gives access to any framework that assumes the same role as those that legitimately hold it.
- Mesos can distribute a framework’s executors (and other files its tasks need) to nodes in the cluster as needed via the Mesos fetcher. So distributing framework code, which would otherwise limit this weakness, is actually easy: a framework can have its executor downloaded, on demand, to any node whose resources it is offered.
- There is a public framework development guide.
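Taken together, the points above reduce to a very simple admission rule. The sketch below is a toy model in Python, not the real Mesos master or allocator: `ToyMaster`, `subscribe`, and the example principals and volume names are all hypothetical. It assumes, as described above, that a framework’s role alone decides which persistent volumes it is offered, and it shows why turning on only one of the two protections is not enough.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass
class ToyMaster:
    """Toy model of framework admission; NOT the real Mesos master or allocator."""

    # Off by default, matching the Mesos default described in this post.
    authenticate_frameworks: bool = False
    # principal -> secret, consulted only when authentication is on.
    credentials: Dict[str, str] = field(default_factory=dict)
    # role -> principals allowed to register with it; None means "any framework, any role".
    role_acl: Optional[Dict[str, Set[str]]] = None
    # role -> persistent volumes reserved for that role.
    volumes: Dict[str, List[str]] = field(default_factory=dict)

    def subscribe(self, role: str, principal: Optional[str] = None,
                  secret: Optional[str] = None) -> Optional[List[str]]:
        """Return the volumes this framework would be offered, or None if rejected."""
        if self.authenticate_frameworks:
            # Unknown principal or wrong secret: reject.
            if principal not in self.credentials or self.credentials[principal] != secret:
                return None
        if self.role_acl is not None and role in self.role_acl:
            # Without authentication, `principal` is just a self-declared string.
            if principal not in self.role_acl[role]:
                return None
        # Nothing else is checked: the role alone decides what gets offered.
        return list(self.volumes.get(role, []))


if __name__ == "__main__":
    vols = {"db": ["pg-data"]}
    creds = {"marathon": "s3cret", "analytics": "hunter2"}

    # 1. Defaults: anyone claiming role "db" gets the database volume.
    print(ToyMaster(volumes=vols).subscribe(role="db"))

    # 2. Role ACL only: the attacker simply claims to be "marathon".
    acl_only = ToyMaster(role_acl={"db": {"marathon"}}, volumes=vols)
    print(acl_only.subscribe(role="db", principal="marathon"))

    # 3. Authentication only: any valid principal may still pick role "db".
    auth_only = ToyMaster(authenticate_frameworks=True, credentials=creds, volumes=vols)
    print(auth_only.subscribe(role="db", principal="analytics", secret="hunter2"))

    # 4. Both switches on: only "marathon", with its secret, gets the volume.
    locked = ToyMaster(authenticate_frameworks=True, credentials=creds,
                       role_acl={"db": {"marathon"}}, volumes=vols)
    print(locked.subscribe(role="db", principal="analytics", secret="hunter2"))
    print(locked.subscribe(role="db", principal="marathon", secret="s3cret"))
```

Run as a script, the five cases print `['pg-data']` three times, then `None`, then `['pg-data']`: only the last configuration keeps a second framework away from the database volume.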
So with a default Mesos installation, from any machine that can reach the REST API for frameworks, I can register a framework of my own construction. Mesos approves registration automatically, with no human decision-making involved: if authorization is not enabled, registration (called scheduler subscription) implies approval, and the response includes the registration information. Since frameworks should be added only rarely, this is a good place to have a human in the loop. Frameworks built the older way, against libmesos.so, could also abuse this default, but that route is harder and therefore less likely to be used. My new framework can claim roles that exist in other frameworks, and I can then gain access to their static resources simply by waiting for Mesos to offer me the right ones. These could include any persistent disk assigned to that role, with read/write access (because the current incarnation of Mesos supports only read/write). This came to my attention while working on Mesos with Marathon, but it requires nothing beyond Mesos itself.
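To make the point concrete: in the v1 scheduler HTTP API, subscribing is a single POST to `/api/v1/scheduler` whose JSON body names the roles the framework wants. The sketch below only builds that body; it sends nothing. The field layout follows the Mesos scheduler HTTP API documentation as I understand it (older releases used a single `role` string rather than `roles` plus the `MULTI_ROLE` capability), so check it against your version. The function name, framework name, and user in the example are hypothetical.

```python
import json
from typing import Iterable


def subscribe_request(name: str, user: str, roles: Iterable[str]) -> dict:
    """Build the JSON body for POST /api/v1/scheduler with type SUBSCRIBE.

    Assumption: field names follow the v1 scheduler HTTP API docs; older
    releases used a single "role" string instead of "roles" + MULTI_ROLE.
    """
    return {
        "type": "SUBSCRIBE",
        "subscribe": {
            "framework_info": {
                "user": user,  # Unix user the framework's tasks run as
                "name": name,  # free-form, chosen by the caller
                # Self-declared: nothing here proves entitlement to these roles.
                "roles": list(roles),
                "capabilities": [{"type": "MULTI_ROLE"}],
            }
        },
    }


if __name__ == "__main__":
    body = subscribe_request("my-framework", "nobody", ["db"])
    print(json.dumps(body, indent=2))
```

Nothing in the body proves that the caller is entitled to the roles it lists. With HTTP framework authentication and role ACLs off, the master answers with a `SUBSCRIBED` event carrying a new framework ID, and offers for those roles can follow.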
Again, this is a known issue, and the Mesos team is slowly moving toward a fix. But for now, users need to lock down their own installations wherever their environment allows it.
The workaround is easy enough: there are command line options to turn on authentication for frameworks and to limit which frameworks can use which roles. But you have to turn both on, since both are off by default, and from looking through the logs on my Mesos cluster, framework registration does not appear to be logged at the default logging settings. Role management does add overhead, but applied to frameworks it is not a large burden unless operations is constantly adding and removing frameworks (which is highly unlikely). Also remember that Mesos currently supports both modes because some frameworks do not yet support authentication, so testing with authentication turned on is a must before rolling it out to production.
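A minimal sketch of the workaround, assuming a Mesos 1.x master: it writes a JSON credentials file and a `register_frameworks` ACL file and returns the corresponding `mesos-master` flags. The flag names (`--authenticate_frameworks`, `--authenticate_http_frameworks`, `--http_framework_authenticators`, `--credentials`, `--acls`) and file formats are taken from the Mesos configuration and authorization documentation; releases before 1.0 called the first flag `--authenticate`, so verify every name against your version. The helper names, principals, and roles are hypothetical. Each role gets an allow rule for its owners followed by a deny rule (`"type": "NONE"`) for everyone else, the pattern the authorization docs use; because ACLs stay permissive by default, other actions are unaffected.

```python
import json
import os
import tempfile
from typing import Dict, List, Set


def build_acls(role_owners: Dict[str, Set[str]]) -> dict:
    """For each role, allow only the listed principals to register with it."""
    rules = []
    for role, principals in sorted(role_owners.items()):
        # First matching rule wins: the owners are allowed...
        rules.append({"principals": {"values": sorted(principals)},
                      "roles": {"values": [role]}})
        # ...and every other principal is denied this role.
        rules.append({"principals": {"type": "NONE"},
                      "roles": {"values": [role]}})
    return {"register_frameworks": rules}


def write_master_config(outdir: str, secrets: Dict[str, str],
                        role_owners: Dict[str, Set[str]]) -> List[str]:
    """Write credentials/ACL files into outdir and return mesos-master flags."""
    creds_path = os.path.join(outdir, "credentials.json")
    acls_path = os.path.join(outdir, "acls.json")
    with open(creds_path, "w") as f:
        json.dump({"credentials": [{"principal": p, "secret": s}
                                   for p, s in sorted(secrets.items())]}, f, indent=2)
    os.chmod(creds_path, 0o600)  # secrets file: owner read/write only
    with open(acls_path, "w") as f:
        json.dump(build_acls(role_owners), f, indent=2)
    return [
        "--authenticate_frameworks=true",       # driver-based (libmesos) frameworks
        "--authenticate_http_frameworks=true",  # v1 scheduler HTTP API frameworks
        "--http_framework_authenticators=basic",
        f"--credentials={creds_path}",
        f"--acls={acls_path}",
    ]


if __name__ == "__main__":
    outdir = tempfile.mkdtemp()  # in practice, somewhere like a protected config dir
    flags = write_master_config(outdir, {"marathon": "change-me"}, {"db": {"marathon"}})
    print("mesos-master " + " ".join(flags))
```

Each framework must then be configured with its own principal and secret; how that is done depends on the framework.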
Do I rate the risk from this issue as huge? Honestly, it depends on your implementation. If the Mesos cluster in question puts sensitive information on a persistent disk, the risk is potentially high. If it does not, the architecture and the lack of authentication still pose a risk, but the data available, and how it is protected outside of Mesos, determines how damaging an attack would be. The ability to modify the configuration of running applications is a further concern; altered HTTP redirects alone could cause real trouble.
In short, this transition period will make me more alert to the security implications of everything else in Mesos, simply because it shows a failure to follow common practice and may indicate similar security design decisions elsewhere in the project. That said, the move toward fixing authentication makes me less hypercritical than I would otherwise be.
Because I wanted to get this information out quickly, I have not yet taken the time to try exploiting this issue on my own cluster. Hands-on implementation almost always surfaces different information and issues than research does, because it is a different stage of the process. I will update this post when I have time to build a pair of frameworks, one “normal” and one “attacker,” to test with. In security, it is always best to test an issue first, but because these behaviors are well enough documented, I opted to notify the community before embarking on a potentially lengthy proof.
The point of this post is not to imply that Mesos should be avoided. The point is to urge you to follow the “Command Line Switches” link (above), read it, and at a minimum implement its guidelines for turning on authentication. I would also recommend limiting roles, but at the very least, turn on authentication for frameworks and test that it works in your environment.
I’d like to thank Ben Tomhave, Security Architect at New Context, for helping me out on some of these points. If there is any error in my analysis, it is mine alone, but the blog is far better for his assistance.