Ten Years Later — What Is Cloud Native?

Since the term cloud native made its debut in 2010, its usage has grown steadily. Today, cloud native is used as a qualifier for innovative, leading-edge applications, platforms, and systems, or for cloud-savvy organizations and processes.

However, what is cloud native exactly? There are some definitions, there are many opinions, but first and foremost there is a lot of confusion: Is my system already cloud native if I follow a microservice-based architecture? Is my system already cloud native if I run containers? Do I have to change development and operations?

In this post, we cut through any uncertainty and ambiguity and provide an accurate and concise definition of cloud native systems.

However, first, we have to discuss a set of prerequisite concepts.

Application + Platform = System

Frequently, in our day-to-day conversations, we use the terms application, platform, and system interchangeably. Yet in this blog post, we draw a strict distinction:

Application: An application is the set of components that are designed and implemented for a specific use case.

Platform: A platform is the set of components that are designed and implemented independently of a specific use case.

System: A system is the combination of an application and a platform. Generally, we refer to an application as being hosted on a platform.

End users do not interact with applications alone or with platforms alone; end users interact with systems. Consequently, we will focus our discussion on system-level properties, not on application-level or platform-level properties.

Application + Cloud Platform = Cloud System

A cloud platform is a service provider that enables a service consumer to request and release resources on demand via an Application Programming Interface (API).

This definition covers any service provider that enables a service consumer to request and release resources on demand, regardless of who the consumer is:

Public Cloud Platform: A public cloud platform is a service provider that offers resources outside of its own organization.

Private Cloud Platform: A private cloud platform is a service provider that offers resources within its own organization.

Consequently, a cloud system is the combination of an application and the cloud platform that hosts it. However, what is a cloud native system?

Responsiveness, Scalability and Reliability

The distinction between cloud native systems and non-cloud native systems is drawn along the lines of responsiveness, scalability, and reliability.

In a nutshell, responsiveness is a system’s ability to meet the users’ expectations. In order to quantify responsiveness, we rely on a set of four related concepts:

Service Level Indicator: A service level indicator is a quantitative observation about the behavior of a system.
Service Level Objective: A service level objective is a predicate (a function that yields true or false) on a service level indicator that determines whether the behavior of a system meets an objective.
Error Rate: The error rate is the ratio of the number of observations that do not meet their objectives to the number of observations in total for a fixed time interval.
Error Budget: The error budget is an upper limit on the error rate.

Based on the above, we can now define the responsiveness of a system as the ability of the system to keep its error rate within its error budget, that is, at or below that upper limit.
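The four concepts compose directly. As a minimal sketch in Python (the names `SLI`, `SLO`, `error_rate`, and `is_responsive` are illustrative and not taken from any library), an SLO can be modeled as a predicate over an SLI value, the error rate as the share of observations that fail the predicate, and responsiveness as a comparison of that rate against the budget:

```python
from typing import Callable, Iterable

# Illustrative types, not from any library: an SLI is modeled as a number,
# and an SLO as a predicate over that number.
SLI = float
SLO = Callable[[SLI], bool]


def error_rate(observations: Iterable[SLI], slo: SLO) -> float:
    """Share of observations in the interval that do not meet the objective."""
    values = list(observations)
    if not values:
        return 0.0  # assumption: no observations means no errors
    failures = sum(1 for value in values if not slo(value))
    return failures / len(values)


def is_responsive(observations: Iterable[SLI], slo: SLO, error_budget: float) -> bool:
    """True if the error rate does not exceed the error budget (an upper limit)."""
    return error_rate(observations, slo) <= error_budget
```

The fixed time interval is left to the caller in this sketch: `observations` is assumed to contain only the observations that fall inside the interval.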

These concepts are best explained with a tangible example: Consider a web application processing HTTP transactions, that is, accepting HTTP requests and returning HTTP responses.

Service Level Indicators

Here, we define two service level indicators (SLIs).

SLI Latency: determines the latency of a transaction, that is, the time elapsed from the moment the request enters the system to the moment the response leaves the system.

SLI Result: determines the result of a transaction, that is, the status code of the response.

SLIs are recorded at the edge of a system, here the Ingress component, to exclude any variation that is outside our control and focus on the performance of the system that is within our control. For example, we do not want to include any variation in observed latency or observed results originating in the users’ network connections. However, we do want to focus on the performance of the system once Ingress accepts the request until Ingress returns the response.

Next, we define two service level objectives (SLOs).

SLO Latency: For this example, we define the SLO Latency to be met if a transaction’s SLI Latency is less than 250 milliseconds.

SLO Result: For this example, we define the SLO Result to be met if a transaction’s SLI Result is not in the HTTP 5xx range (500 to 599).

Finally, we set the error rate’s time interval to a 30-day sliding window and the error budget to 1%.

By definition, this system is responsive if the ratio of the number of transactions over the last 30 days that do not meet their SLOs to the total number of transactions over the last 30 days does not exceed 1%.
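The example can be written out as a short Python sketch. The `Transaction` record and the function names are hypothetical, and the sketch assumes that a transaction counts against the error budget if it misses either of the two SLOs; the text does not say whether the SLOs are tracked jointly or separately.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Sequence

LATENCY_LIMIT = timedelta(milliseconds=250)  # SLO Latency threshold
WINDOW = timedelta(days=30)                  # sliding window of the error rate
ERROR_BUDGET = 0.01                          # 1%


@dataclass(frozen=True)
class Transaction:
    """One HTTP transaction as recorded at the edge (hypothetical record type)."""
    finished_at: datetime
    latency: timedelta   # SLI Latency
    status_code: int     # SLI Result


def meets_slos(tx: Transaction) -> bool:
    """Assumption: a transaction is good only if it meets both SLOs."""
    slo_latency = tx.latency < LATENCY_LIMIT
    slo_result = not (500 <= tx.status_code <= 599)
    return slo_latency and slo_result


def is_responsive(transactions: Sequence[Transaction], now: datetime) -> bool:
    """Compare the error rate over the last 30 days against the 1% budget."""
    recent = [tx for tx in transactions if now - WINDOW <= tx.finished_at <= now]
    if not recent:
        return True  # assumption: an empty window counts as responsive
    failed = sum(1 for tx in recent if not meets_slos(tx))
    return failed / len(recent) <= ERROR_BUDGET
```

With these values, 1 failing transaction out of 100 in the window still counts as responsive, while 2 out of 100 does not. Failures older than 30 days drop out of the window and no longer count.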

Scalability and Reliability

At this point we have the necessary tools to define scalability and reliability:

Scalability is responsiveness in the presence of load.
Reliability is responsiveness in the presence of failure.

So Finally, What Is Cloud Native?

With the definition of cloud systems and the definition of responsiveness in place, we can now define cloud native systems: A cloud native system is a cloud system that is scalable and reliable by construction.

The qualifier “by construction” is in stark contrast to “by requirement”: The design and implementation of a cloud native system guarantee scalability and reliability within the limits set by the service level objectives and error budgets.

According to this definition, a cloud native system must be able to:

Autonomously detect and mitigate the presence of load.
Autonomously detect and mitigate the presence of failure.
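One common way to picture these two abilities is a control loop that repeatedly observes the system and requests or releases resources from the cloud platform. The following Python sketch shows a single step of such a loop. It is an illustration under stated assumptions, not a description of any particular platform: `Observation`, `Plan`, `reconcile`, and the `capacity_per_replica` parameter are all hypothetical names.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Observation:
    """Hypothetical snapshot of the system; all fields are illustrative."""
    requests_per_second: float  # current load
    healthy_replicas: int       # replicas passing health checks
    failed_replicas: int        # replicas failing health checks


@dataclass(frozen=True)
class Plan:
    release_failed: int   # unhealthy replicas to release
    request: int          # new replicas to request from the platform
    release_surplus: int  # healthy replicas no longer needed


def reconcile(obs: Observation, capacity_per_replica: float, min_replicas: int = 1) -> Plan:
    """One step of a control loop: detect load and failure, then plan mitigation."""
    if capacity_per_replica <= 0:
        raise ValueError("capacity_per_replica must be positive")
    # Detect load: replicas needed to serve the current traffic.
    needed = max(min_replicas, math.ceil(obs.requests_per_second / capacity_per_replica))
    return Plan(
        # Mitigate failure: failed replicas are always released.
        release_failed=obs.failed_replicas,
        # Mitigate load and failure: request whatever healthy capacity is missing.
        request=max(0, needed - obs.healthy_replicas),
        # Release capacity that the current load no longer requires.
        release_surplus=max(0, obs.healthy_replicas - needed),
    )
```

For example, at 950 requests per second with 3 healthy replicas, 1 failed replica, and an assumed capacity of 100 requests per second per replica, the plan releases the failed replica and requests 7 new ones. Run periodically, a step like this needs no human in the loop, which is the sense of "autonomously" above.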

Discussion

This post defined the term “cloud native” as a property of a system, not as a property of organizations or processes.

The defining criteria of a cloud native system are scalability and reliability by construction. The employment of microservices (an architectural choice) or containers (a technological choice) is not a defining criterion of cloud native systems. If you want to know more, head over to (or better yet, get involved with) the Cloud Native Computing Foundation, a Linux Foundation project founded in 2015 with the mission to build a sustainable ecosystem for cloud native systems.

— Dominik Tornow