From Pets to X-as-Code: Enabling Control and Predictability for ITOps at Scale

Amy Wheelus, vice president of network cloud at AT&T, recently spoke about declarative predictability at the Open Networking Summit in Antwerp. AT&T is the world’s largest telecommunications company, providing mission-critical communication services where people’s jobs, and lives, can be on the line. As the company moves away from labor-intensive IT operations toward cloud-native automation, it needs to trust that the declared state of its IT infrastructure can be predictably controlled and reproduced. This requirement to build predictability, reproducibility and trust into IT systems is not new, but it has grown in both scale and complexity as IT departments went from a few servers blinking in the basement to the core around which every business rotates.

With every customer interaction and business function becoming more digital, companies must be able to rely upon their IT departments if they hope to continue to keep pace in today’s rapidly changing world. This article traces the evolution of IT operations from Pets towards X-as-code with each improvement bringing companies closer to declarative predictability. These best practices enable enterprises to reliably deliver core business value through their IT department and keep pace in the cloud native era.

Yesterday: Pets Versus Cattle and Infrastructure as Code

Back when the IT department was just servers in the back room, it was all hands on deck when anything went wrong because work ground to a halt. The CEO couldn’t get their emails and the sales department couldn’t book new orders. From these early days, companies learned that in order to trust IT with critical business services, high availability and reliability needed to be built into the system. This led to the Pets vs. Cattle analogy: instead of being nursed back to health like a pet, each infrastructure component is treated as interchangeable, like cattle, and can quickly be replaced when anything goes wrong. This allows companies to quickly recover from or even avoid outages.

While treating servers as cattle solves some problems, as any rancher can tell you, handling a whole herd brings its own set of challenges. As the size of managed infrastructure grows, it can be difficult to keep track of where, when and how it is actually deployed. In addition, changes made to production systems to keep them healthy can lead to configuration drift, creating unique snowflake servers which are pets rather than cattle. When a system’s actual state is unknown, it is difficult to predict how it will react or to rely upon it in critical situations.

To help eliminate this problem and control how infrastructure is configured and deployed, ITOps teams moved to infrastructure-as-code. In this way of working, applications and servers are never modified once deployed. If something needs to be changed, a new version is created and the old one is decommissioned. Each deployment can be versioned to create an overview of what is in production and how it is deployed.

Managing cattle rather than pets and using infrastructure-as-code has helped companies move from ticket-based operations with long-lived machines to more scalable and dynamic infrastructure. These practices build reusability, scalability, automation and versioning into IT systems to create a simpler, more predictable deployment and management process. By being able to declare the state of their infrastructure, companies are able to predict and trust the outcome.

Today: Cloud-Native Infrastructure

Looking at the definition of cloud native from the Cloud Native Computing Foundation, it quickly becomes clear why many companies are turning toward these best practices. They help companies build and run resilient and manageable infrastructure that–combined with robust automation–creates dynamic, yet predictable systems. Declarative APIs and immutable infrastructure will be used as two examples to understand how cloud native builds upon and expands the techniques covered above.

Declarative APIs make infrastructure as code and automation possible. For example, with Kubernetes, operations teams are able to declare the desired state of the system and automate the reconciliation with the actual state until they match. Kubernetes began with just container orchestration but has since expanded to cover many other types of cloud resources through Custom Resources, and even the physical infrastructure itself through Cluster API. Declarative APIs make infrastructure management simpler and more reproducible by automating operations and reducing manual errors in production. They can also be leveraged to create immutable infrastructure.
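The reconciliation idea described above can be illustrated with a small sketch. This is not the Kubernetes API or its controller code; it is a plain Python toy in which the `Cluster` class, the `plan` and `reconcile` functions and the replica-count model are all hypothetical names chosen for illustration. It only shows the pattern of comparing a declared state with an actual state and acting on the difference until the two match.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Cluster:
    """Hypothetical in-memory stand-in for a system's actual state."""
    replicas: dict[str, int] = field(default_factory=dict)


def plan(desired: dict[str, int], actual: dict[str, int]) -> list[tuple[str, str, int]]:
    """Compute the actions needed to move the actual state to the desired state."""
    actions = []
    for name, want in sorted(desired.items()):
        have = actual.get(name, 0)
        if want > have:
            actions.append(("scale_up", name, want - have))
        elif want < have:
            actions.append(("scale_down", name, have - want))
    # Anything running that was never declared gets removed.
    for name in sorted(actual.keys() - desired.keys()):
        actions.append(("delete", name, actual[name]))
    return actions


def reconcile(desired: dict[str, int], cluster: Cluster, max_rounds: int = 10) -> int:
    """Apply planned actions until actual matches desired; return the rounds used."""
    for round_number in range(max_rounds):
        actions = plan(desired, cluster.replicas)
        if not actions:
            return round_number  # converged: nothing left to do
        for verb, name, count in actions:
            if verb == "scale_up":
                cluster.replicas[name] = cluster.replicas.get(name, 0) + count
            elif verb == "scale_down":
                cluster.replicas[name] -= count
            else:
                del cluster.replicas[name]
    raise RuntimeError("state did not converge")


desired_state = {"web": 3, "api": 2}
cluster = Cluster(replicas={"web": 1, "legacy": 4})
reconcile(desired_state, cluster)
print(cluster.replicas)
```

The operator only edits `desired_state`; the loop works out what has to change. Running `reconcile` a second time finds nothing to do, which is the property that makes the outcome predictable.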

Immutable infrastructure goes beyond treating servers as cattle and infrastructure as code. It extends the same approach to every higher layer of the stack until each one is codified and managed like cattle. Only exact copies are deployed and they can be completely replaced at any time. If anything needs to be changed, a new deployment is made and the old ones are decommissioned. Applied to the whole stack, immutable infrastructure eliminates variation, which leads to fewer deployment failures, consistency across environments, easy horizontal scaling and a simple rollback and recovery process.
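A minimal sketch of the replace-rather-than-modify pattern described above, again in plain Python with hypothetical names (`Release`, `Environment`, the `shop` image tags). It assumes a single environment with a simple linear release history and is not modeled on any particular deployment tool.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Release:
    """One deployment description; frozen, so it cannot be edited after creation."""
    version: int
    image: str
    replicas: int


class Environment:
    """Hypothetical environment that only ever swaps whole releases."""

    def __init__(self) -> None:
        self.history: list[Release] = []
        self._next_version = 1

    @property
    def live(self) -> Release | None:
        """The release currently serving traffic, if any."""
        return self.history[-1] if self.history else None

    def deploy(self, image: str, replicas: int) -> Release:
        """A change is always a new release; existing releases are never touched."""
        release = Release(self._next_version, image, replicas)
        self._next_version += 1
        self.history.append(release)
        return release

    def rollback(self) -> Release:
        """Decommission the live release and return to the previous exact copy."""
        if len(self.history) < 2:
            raise RuntimeError("no earlier release to roll back to")
        self.history.pop()
        return self.history[-1]


env = Environment()
env.deploy("shop:1.0", replicas=3)
env.deploy("shop:1.1", replicas=3)
previous = env.rollback()
print(previous)
```

Because a `Release` cannot be edited in place, every environment that runs a given version runs the same thing, and a rollback is just a switch back to a known copy.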

As these two examples show, cloud native enables companies to reliably scale out dynamic IT services without having to simultaneously scale out their operations team. Building declarative predictability into systems engenders trust in automatic processes, even when critical services are running on them.

Tomorrow: X-as-Code

While cloud-native best practices provide many benefits, applications and infrastructure are just the beginning of the story for enterprise IT teams. In today’s world, they face the Herculean task of managing a dynamic and sprawling environment while still complying with internal policy along with national and international laws. This challenge moves far beyond servers and treating them like cattle to compliance and governance across multiple clouds and hundreds or thousands of developers. With these challenges in mind, the question remains how best to deliver on these demands at enterprise scale.

Looking across architecture, configuration, blueprinting, policy and governance, if each of these is simply viewed as another layer of the stack above the application, their operations can be radically simplified. By applying the principles of immutable infrastructure and cattle to this further layer of the stack, each of these can be defined and managed using an X-as-code model. These types of management models can already be seen taking root across the cloud-native ecosystem.

Kubernetes creates blueprints-as-code through the use of Admission Controllers while policy-as-code can be defined in Pod Security Policies and Network Policies. The CNCF even has a policy engine, OPA, as a sandbox project that gives operators fine-grained yet flexible policy control across their entire stack. By using code to decouple policy and other layers from the actual implementation, the resulting stack is easier to understand, flexible enough to handle future requirements and less expensive to maintain.
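To make the policy-as-code idea concrete, here is a small sketch in which rules are ordinary, versionable code evaluated against a workload description before it is admitted. It is deliberately not written in OPA’s policy language and does not use the Kubernetes admission API; the rule functions, the workload dictionary shape and the `registry.example.com/` registry are all assumptions made for illustration.

```python
from __future__ import annotations

from typing import Callable, Optional

# A rule inspects a workload description and returns a violation message or None.
Rule = Callable[[dict], Optional[str]]

APPROVED_REGISTRY = "registry.example.com/"  # hypothetical registry name


def no_privileged_containers(workload: dict) -> Optional[str]:
    for container in workload.get("containers", []):
        if container.get("privileged", False):
            return f"container {container['name']} must not run privileged"
    return None


def approved_registry_only(workload: dict) -> Optional[str]:
    for container in workload.get("containers", []):
        if not container["image"].startswith(APPROVED_REGISTRY):
            return f"container {container['name']} uses an unapproved image registry"
    return None


# The policy itself is data: a list of rules kept separate from any workload.
POLICY: list[Rule] = [no_privileged_containers, approved_registry_only]


def evaluate(workload: dict, policy: list[Rule] = POLICY) -> list[str]:
    """Return every violation; an empty list means the workload is admitted."""
    results = (rule(workload) for rule in policy)
    return [message for message in results if message is not None]


risky = {
    "containers": [
        {"name": "debug", "image": "docker.io/tools:latest", "privileged": True}
    ]
}
for violation in evaluate(risky):
    print(violation)
```

Keeping `POLICY` apart from the workloads it checks is the decoupling the paragraph above refers to: a rule can be added or tightened without touching any application definition.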

At Loodse, our customers were struggling to manage configuration, services and policy across their clusters. Leveraging the Kubernetes controller paradigm in Kubermatic, we are able to automate the deployment and lifecycle of each of these add-ons. This gives users the ability to declare, enforce and automate the required state of not only the infrastructure but the configuration and policy, too. Kubernetes controllers bring predictability and control to dynamic and scalable cloud-native IT even as it begins to handle more and more mission-critical services.

Cloud Native in Production: Declarative Predictability

With IT becoming the core of every business, bringing cloud-native best practices into production enables operators to predictably declare, run and manage their infrastructure at the enterprise scale. Being able to trust in these automated operations frees up resources and opens up many new possibilities. Developers can be given guardrails and become empowered to do their job rather than waiting in a ticket queue. Operators can easily create highly available self-healing infrastructure that meets governance standards allowing them to focus on delivering a platform with differentiated business value.

At each step in the development of modern IT practices, from cattle to X-as-code, operational processes have become more streamlined, allowing both developers and operators to drive more value to the bottom line. The future of IT infrastructure and operations will be a trusted platform that automates undifferentiated heavy lifting, allowing the whole business to focus on customer value.

To learn more about containerized infrastructure and cloud native technologies, consider coming to KubeCon + CloudNativeCon NA, November 18-21 in San Diego.

— Bill Mulligan