Two paths to metal DevOps: cloud-like API-driven & cluster building

I’ve been seeing rising interest in metal DevOps, fueled by containers and scale-out data center platforms (like Hadoop, Ceph & OpenStack) that run at the metal level. While I see this as part of a growing general trend (Packet, Internap, Rackspace, OpenStack Ironic, MaaS), I’m going to stay firmly within my wheelhouse and use OpenCrowbar as my reference here.

Building on OpenCrowbar’s API-driven metal features, we see this interest translating into two paths for running workloads on metal:

1) “Cloudify” the metal using APIs from tools like Chef Provisioning, SaltStack Libcloud, Docker Machine and Cloud Foundry BOSH. These tools have clients that target cloud APIs like OpenStack and Amazon, and the same clients that work against clouds are easily ported to Crowbar’s APIs. Five years ago, conventional wisdom was that we’d need a universal cloud API; however, practice has shown that it’s not very difficult to wrap APIs in a way that does not reduce every cloud to a least common denominator.

2) Deploy the workload with DevOps tools, using hand-offs to Chef, SaltStack, Puppet or Ansible. This approach leverages the community scripts (Cookbooks, Modules, Playbooks) for the workload, with the critical ability to create a tuned environment and inject the needed parameters directly into the scripts. A key lesson we learned going from Crowbar v1 to v2 was that our scripts needed crisp attribute input/output boundaries to avoid embedding environmental knowledge into the code.
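
To make the “cloudify” path concrete, here is a minimal sketch of the wrapping idea, assuming a client written against a generic create/delete-server interface. The `ComputeProvider` interface, the `MetalProvider` wrapper and its in-memory node pool are hypothetical stand-ins, not Crowbar’s API or any tool’s real client library; a real wrapper would call the metal API to claim, image and power cycle a node.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional


@dataclass
class Server:
    id: str
    name: str
    image: str
    state: str


class ComputeProvider(ABC):
    """Hypothetical cloud-style interface that provisioning clients code against."""

    @abstractmethod
    def create_server(self, name: str, image: str) -> Server:
        """Return a new server built from the given image."""

    @abstractmethod
    def delete_server(self, server_id: str) -> None:
        """Release the server."""


class MetalProvider(ComputeProvider):
    """Hypothetical wrapper mapping cloud-style calls onto physical nodes.

    The pool is an in-memory dict so the sketch runs; a real version would
    call the metal API instead.
    """

    def __init__(self, node_ids: Iterable[str]):
        # Node id -> Server, or None while the node is free.
        self.pool: Dict[str, Optional[Server]] = {n: None for n in node_ids}

    def create_server(self, name: str, image: str) -> Server:
        # On metal, "create" means claiming a free node and imaging it.
        for node_id, current in self.pool.items():
            if current is None:
                server = Server(id=node_id, name=name, image=image,
                                state="provisioning")
                self.pool[node_id] = server
                return server
        raise RuntimeError("no free nodes")

    def delete_server(self, server_id: str) -> None:
        # On metal, "delete" returns the node to the pool for reuse.
        if self.pool.get(server_id) is None:
            raise KeyError(server_id)
        self.pool[server_id] = None


def deploy(provider: ComputeProvider, count: int, image: str) -> List[Server]:
    """Client code: identical whether the provider wraps a cloud or metal."""
    return [provider.create_server(f"worker-{i}", image) for i in range(count)]


provider = MetalProvider(["node-a", "node-b", "node-c"])
servers = deploy(provider, 2, "ubuntu-14.04")
print([s.id for s in servers])  # ['node-a', 'node-b']
```

The design choice worth noting is that `deploy` never learns it is talking to metal: the wrapper absorbs the difference, rather than forcing every provider down to a shared least common denominator.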
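
The attribute-boundary lesson can be illustrated with a small sketch: the workload step receives every environment-specific value as an explicit input and publishes its results as explicit outputs, so the orchestrator, not the script, owns knowledge of the site. The `configure_database` step and its attribute names are hypothetical; they stand in for a cookbook, module or playbook and are not how Crowbar actually passes attributes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DbInputs:
    # Everything environment-specific arrives as an input attribute;
    # the step never discovers IPs, VLANs or disk layouts on its own.
    listen_ip: str
    port: int
    data_disk: str
    peers: tuple


@dataclass(frozen=True)
class DbOutputs:
    # Values that later steps need are published as outputs
    # instead of being rediscovered by those steps.
    endpoint: str
    replica_count: int


def configure_database(inputs: DbInputs) -> DbOutputs:
    """Hypothetical workload step with a crisp input/output boundary."""
    if not inputs.data_disk.startswith("/dev/"):
        raise ValueError(f"unexpected disk path: {inputs.data_disk}")
    # A real cookbook or playbook would render config files here.
    return DbOutputs(
        endpoint=f"{inputs.listen_ip}:{inputs.port}",
        replica_count=len(inputs.peers) + 1,
    )


# The orchestrator chains steps by wiring one step's outputs to the next's inputs.
site_attrs = {"listen_ip": "10.0.1.5", "port": 5432,
              "data_disk": "/dev/sdb", "peers": ("10.0.1.6", "10.0.1.7")}
db = configure_database(DbInputs(**site_attrs))
app_attrs = {"db_endpoint": db.endpoint}
print(app_attrs)  # {'db_endpoint': '10.0.1.5:5432'}
```

Because the step only sees `DbInputs`, the same code runs unchanged at any site; only the attribute values the orchestrator injects differ.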

While I’m casting this in Crowbar terms, I see this approach to metal coming into the market in force, fueled by a desire for containers-on-metal and DevOps-on-metal.

Let’s look at some of the unique and shared use-cases for each approach:

Metal API	Both	Metal Cluster
Easy cloud-to-metal migration	Portability of DevOps scripts	Leverage hardware features
Minimal tool customization	Take advantage of power cycling	Advanced network topologies
	Enables constant refresh cycles	

In either case, you have to handle bespoke (hipster word for custom) steps in the provisioning flow that are unique to your operational needs. Our experience is that each site (even each server!) is unique in some incremental way. For example, one site may require teamed networks with VLANs while another requires flat networks with an SDN layer.

These differences are not mistakes or errors: the reality of physical ops and individual operational choices mean that there are a lot of valid configurations. Rather than attempt the Sisyphean task of enforced conformity, we work to abstract differences so that they can be ignored when they are not material.
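
One way to abstract such differences is to let each site describe its network choice in its own terms and translate both into the same small set of facts the workload actually consumes. The sketch below assumes two hypothetical site profiles matching the example above (teamed interfaces with VLANs versus a flat network under an SDN layer); the class and field names, the `bond0.<vlan>` interface naming and the simplified addressing are illustrative assumptions, not a Crowbar schema.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class TeamedVlanSite:
    nics: tuple          # physical interfaces bonded into one team
    vlan_id: int
    cidr_prefix: str     # simplified: first three octets, e.g. "10.20.0."


@dataclass(frozen=True)
class FlatSdnSite:
    nic: str
    cidr_prefix: str
    sdn_controller: str  # overlay is handled by the SDN layer, not host config


@dataclass(frozen=True)
class NodeNetwork:
    # The only facts the workload needs; how they were achieved is hidden.
    interface: str
    address: str


def node_network(site: Union[TeamedVlanSite, FlatSdnSite],
                 host_index: int) -> NodeNetwork:
    """Map a site-specific profile onto the workload-facing view."""
    if isinstance(site, TeamedVlanSite):
        # Linux-style naming: bond device with a VLAN sub-interface.
        interface = f"bond0.{site.vlan_id}"
    elif isinstance(site, FlatSdnSite):
        interface = site.nic
    else:
        raise TypeError(f"unknown site profile: {type(site).__name__}")
    # Simplified addressing: the host index becomes the last octet.
    return NodeNetwork(interface=interface,
                       address=f"{site.cidr_prefix}{host_index}")


site_a = TeamedVlanSite(nics=("eth0", "eth1"), vlan_id=200, cidr_prefix="10.20.0.")
site_b = FlatSdnSite(nic="eth0", cidr_prefix="192.168.5.", sdn_controller="ctl.example")
print(node_network(site_a, 11))  # NodeNetwork(interface='bond0.200', address='10.20.0.11')
print(node_network(site_b, 11))  # NodeNetwork(interface='eth0', address='192.168.5.11')
```

Whether a node's traffic rides a bonded VLAN or an SDN overlay stops being material to the workload: both sites yield the same two facts, and only the translation function knows the difference.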

In the end, the choices are not mutually exclusive. Metal APIs are often faster to adopt but harder to optimize. You can use them to get started quickly and then invest time to optimize a cluster for long-term operations. The underlying physical orchestration can support both.

Are you looking at getting closer to metal? Which of the options above makes the most sense to you? I’d love to hear about your use-cases, architecture and configuration requirements.