While many people want a universal “easy button installer,” they also want it to work on their unique snowflake of infrastructure, tools, networks and operating systems. I call them snowflakes because their small differences expand combinatorially. That divergence turns the idea of a prescriptive reference architecture into a destructive myth that limits community and ecosystem growth.
Because there is so much necessary variation and change, it is better for open source projects to give up on trying to own an installer and instead focus on making their required components more resilient and portable.
Universal Installer Fallacy
So what does an easy button install look like? Let me tell you about one that I built.
It was 2011, the dawn of the OpenStack Era, and our team at Dell was already battle-weary from trying to make repeatable installs around Azure, Eucalyptus, Joyent and Hadoop. The simple reality was there was no success pattern we could follow deploying any one of those platforms—and certainly none cross-platform. Even though all used basically the same compute and network infrastructure, each had custom deployment code and configuration approaches. We jumped on becoming an OpenStack founding partner as a way to try to break the failure pattern.
What was failing? Each platform had deeply embedded environmental assumptions that made its installs neither repeatable nor portable.
For example, one of those platforms required thirty (30!) full racks of gear and tethering back to the master deployer because its management and topology were hardwired into its infrastructure. Others were not much better: installation often was “ssh, apt-get install platform,” then “scp pre-built configuration files.” Even worse, those steps were always preceded by 30 pages of prescriptive manual system prep steps that were updated daily.
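To make that fragile pattern concrete, here is a minimal sketch of the kind of “installer” described above. The host name, package name and config path are hypothetical, and a dry-run variable echoes the commands rather than executing them against a real host:

```shell
#!/bin/sh
# Sketch of the fragile install pattern: two remote commands that
# silently assume the target host already matches the vendor's
# reference environment. All names here are hypothetical.
set -eu

DRY_RUN="echo"                      # set to "" to actually execute
PLATFORM_HOST="node01.example.com"  # hypothetical host

# Step 1: remote package install -- assumes apt, sudo rights and
# network access are already set up exactly as the docs expect.
$DRY_RUN ssh "$PLATFORM_HOST" sudo apt-get install -y platform

# Step 2: push a pre-built config that hardcodes this site's topology.
$DRY_RUN scp ./platform.conf "$PLATFORM_HOST:/etc/platform/platform.conf"
```

Nothing in this script can detect, let alone adapt to, a site whose network, package manager or directory layout differs from the one its authors had in mind.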
We incorrectly believed that the solution was to create a better installer for OpenStack.
The Crowbar project, which is still used by SUSE, created a fully integrated install sequence from PXE boot and discovery to testing the final OpenStack installation. We could build a cloud on a complete rack of gear from boxes in minutes. The open source project even used open tools such as Chef and hardware-agnostic options to increase its community appeal. As the first installer (it predated even DevStack), the project was widely used in early OpenStack evaluations.
Despite that success, Crowbar’s major shortcoming as a v1 technology was that it tightly coupled components and services together. That made the installation and management process rigid and unable to conform to disparate operational models and environments.
Our install scars showed us the real battle lines that prevented portability. Even minor infrastructure variations resulted in broken or site-specific configurations. A major part of the challenge is that variation is not neatly contained; instead, it is spread throughout the environment. We needed to create automation that could both cope with heterogeneity across the community and be change-tolerant after installation to enable upgrades.
We came to accept that there is no single installer. Instead, scalable installers must assume that they will operate with other parts of the system in a coordinated way, because the diversity of options makes it impossible for them to control everything. In fact, we’ve seen that smaller, composable install steps are much easier to automate consistently. Even more importantly, smaller steps make it possible to manage systemwide upgrades in a controllable way.
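One way to picture those smaller, composable steps is as a set of idempotent units that each check their own preconditions and output expectations. This is only a sketch under assumed names (the `step_*` functions and `/tmp/demo-*` paths are invented for illustration), not Crowbar’s actual design:

```shell
#!/bin/sh
# Sketch of composable install steps. Each step is idempotent: it can
# be re-run safely, which is also what makes systemwide upgrades
# manageable. Function names and file paths are hypothetical.
set -eu

step_configure_network() {
  # idempotent: only writes the file if it is missing
  [ -f /tmp/demo-net.conf ] || echo "mtu=9000" > /tmp/demo-net.conf
}

step_install_service() {
  # placeholder for a package install; re-running is a no-op
  [ -f /tmp/demo-service.installed ] || : > /tmp/demo-service.installed
}

step_validate() {
  # each run ends by checking its own output expectations
  [ -f /tmp/demo-net.conf ] && [ -f /tmp/demo-service.installed ]
}

# An orchestrator -- or the operator's existing tooling -- sequences
# the steps; no single installer has to own the whole environment.
step_configure_network
step_install_service
step_validate
echo "install steps completed"
```

Because each step is independently runnable and re-runnable, different sites can sequence, replace or wrap the steps with their own tooling instead of depending on one monolithic installer.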
Asking for an “easy button” installer is the wrong question. Instead, let’s ask, “Can we create sharable components that can be automated in multiple ways?”
There’s no universal installation script to create cloud rainbows; however, there are strategies we can use to make individual projects installable over a wide range of options. These approaches include composability, consistent configuration and use of standard tools and patterns for operation. These items help projects focus on their own concerns and allow operators to apply best practice techniques.
Basically, the first step is for project developers to limit their scope and delegate operational concerns to operators.
Even in a platform-as-a-service (PaaS) project that attempts to hide operational complexity from developers, such as Kubernetes, there remains someone who must operate that platform using foundational infrastructure and tooling. We make that job more complex when we get distracted trying to apply the new concepts to the old infrastructure: addressing that impedance mismatch pulls effort away from the central abstraction.
Since we do need consistent installs across the community, projects have a responsibility to help those efforts.
How can they help? By ensuring that their component services are well-bounded with clear input configurations, operational requirements, upgrade patterns and output expectations.
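As a rough illustration of that kind of boundary, a component’s install script can declare the inputs it requires, fail fast when they are missing, and state its output expectation explicitly. The variable names and paths below are hypothetical defaults chosen for the sketch:

```shell
#!/bin/sh
# Sketch of a well-bounded component install: declared inputs, a
# narrow concern, and an explicit output expectation. Names and
# defaults are hypothetical.
set -eu

# Example invocation values (normally provided by the operator):
SERVICE_PORT="${SERVICE_PORT:-8080}"
DATA_DIR="${DATA_DIR:-/tmp/demo-component}"

# --- declared input configuration: fail fast if anything is missing --
: "${SERVICE_PORT:?input required: SERVICE_PORT}"
: "${DATA_DIR:?input required: DATA_DIR}"

# --- the component's own, narrow concern ----------------------------
mkdir -p "$DATA_DIR"
printf 'port=%s\n' "$SERVICE_PORT" > "$DATA_DIR/service.conf"

# --- output expectation the operator's tooling can verify -----------
[ -s "$DATA_DIR/service.conf" ] && echo "component configured"
```

With the contract made explicit at the boundary, operators are free to drive the component from whatever automation they already trust.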
But really, that’s a future post. Please let me know what you think and where we should go from here.