Containers, Service Mesh and API Gateways: It Starts at the Edge

Anyone embracing container technology such as Docker or Kubernetes has no doubt heard about the associated next big thing: the service mesh, which promises to homogenize internal network communication between microservices and to handle cross-cutting nonfunctional concerns such as observability and fault tolerance. However, the underlying proxy technology that powers a service mesh can also provide a lot of value at the edge of your systems, particularly within an API gateway.

Although it may appear that service mesh technologies have sprung up overnight, the reality is that many organizations (including Verizon, eBay and Facebook) have been using what we now identify as a service mesh for quite some time. One such organization is Lyft, a U.S.-based ride-sharing service with annual revenue of $1 billion. Lyft is also the creator of the open source Envoy proxy, which is powering a lot of development in the service mesh space, such as the Kubernetes-native Istio control plane and the Ambassador API gateway.

The State of SOA Networking

In a talk last year, Matt Klein, one of the creators of the Envoy proxy, described the state of service-oriented architecture (SOA) and microservice networking in 2013 as “a really big and confusing mess.” Debugging was difficult or impossible: each application exposed different statistics and logging, and there was no way to trace how a request was handled across the full chain of services involved in generating a response. There was also limited visibility into infrastructure components such as hosted load balancers, caches and network topologies.

“It’s a lot of pain,” he said. “I think most companies and most organizations know that SOA [microservices] is kind of the future and that there’s a lot of agility that comes from actually doing it, but on a rubber meets the road kind of day in and day out basis, people are feeling a lot of hurt. That hurt is mostly around debugging.”

Maintaining the reliability and high availability of distributed web-based applications was a core challenge for large-scale organizations. Attempts to address it frequently involved multiple, or only partial, implementations of retry logic, timeouts, rate limiting and circuit-breaking. Many custom and open source solutions were tied to a specific language (and potentially even a specific framework), which meant engineers inadvertently locked themselves into a technology stack “essentially forever.” Klein and his team at Lyft thought there must be a better way, and the Envoy proxy project was ultimately created to be that better way.
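
To make these patterns concrete, here is a minimal Python sketch of two of them: retries with exponential backoff and a circuit breaker. It is not taken from Envoy or from Lyft's code. The class and function names, the thresholds and the delays are all assumptions chosen for illustration. It mainly shows the kind of logic that each team had to rebuild in its own language before a shared proxy could take it over.

```python
import time


class CircuitOpenError(Exception):
    """Raised when the breaker is open and calls are rejected immediately."""


class CircuitBreaker:
    """Hypothetical breaker: opens after N consecutive failures."""

    def __init__(self, failure_threshold=3, reset_timeout=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout  # seconds to wait before a trial call
        self.clock = clock                  # injectable so the logic can be tested
        self.failures = 0
        self.opened_at = None               # None means the circuit is closed

    def call(self, func, *args, **kwargs):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_timeout:
                raise CircuitOpenError("circuit open; failing fast")
            # Reset timeout has elapsed: allow one trial call through.
        try:
            result = func(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.failures >= self.failure_threshold:
                self.opened_at = self.clock()  # open (or re-open) the circuit
            raise
        # Success closes the circuit and clears the failure count.
        self.failures = 0
        self.opened_at = None
        return result


def call_with_retries(func, attempts=3, base_delay=0.1, sleep=time.sleep):
    """Retry func with exponential backoff; never retry an open circuit."""
    last_error = None
    for attempt in range(attempts):
        try:
            return func()
        except CircuitOpenError:
            raise
        except Exception as exc:
            last_error = exc
            if attempt < attempts - 1:
                sleep(base_delay * (2 ** attempt))
    raise last_error
```

A caller would wrap each outbound request, for example `call_with_retries(lambda: breaker.call(fetch_profile))`, where `fetch_profile` is a hypothetical function that calls another service. Multiply that by every language and framework in use and the lock-in Klein describes becomes easier to see.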

Working Outside-In: Edge Proxy Benefits

Although the open source release of the Envoy proxy project in September 2016 made Klein and the Lyft engineering team look like an overnight success, the reality was that the journey was filled with challenges over the four years from the initial hybrid-SOA Lyft architecture to their current microservice- and service mesh-enabled system. In a more recent talk at the 2017 Microservices Practitioner Virtual Summit, Klein talked about the essential need for, and the associated challenges of, demonstrating business value for a technology-focused migration toward a service mesh network topology.

Klein’s first piece of hard-won advice was, “Start with [an] edge proxy.” Microservice-based web applications need edge reverse proxying to avoid both the exposure of the internal business service interfaces (which would violate the principle of loose coupling) and the high operational overhead of exposing each service via an independent URI or RPC endpoint. Existing cloud offerings are “not so good” in this edge proxy or gateway space, or are presented to engineers as a potentially confusing range of differing products. Instead, Klein recommended starting an implementation of modern proxying technology at the edge, as this provides business value in the form of improved observability, load balancing and dynamic routing. Once an engineering team has understood how to operate proxy technology at the edge, the benefits can be rolled inward toward ultimately creating an internal service mesh.
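
As a rough illustration of what edge reverse proxying means in practice, the Python sketch below maps public URL paths onto internal services using longest-prefix matching, so that clients see a single entry point rather than one endpoint per service. The route table, hostnames and function name are hypothetical and are not drawn from Lyft's setup. A real edge proxy such as Envoy expresses this in configuration rather than application code.

```python
from typing import Dict, Optional

# Hypothetical route table: public path prefix -> internal service base URL.
ROUTES: Dict[str, str] = {
    "/api/rides": "http://rides.internal:8080",
    "/api/rides/history": "http://ride-history.internal:8080",
    "/api/users": "http://users.internal:8080",
}


def resolve_upstream(path: str, routes: Dict[str, str] = ROUTES) -> Optional[str]:
    """Return the internal URL for a public path, or None if nothing matches.

    The longest matching prefix wins, and a prefix only matches on a whole
    path segment, so "/api/rides" does not capture "/api/ridesharing".
    """
    matches = [
        prefix
        for prefix in routes
        if path == prefix or path.startswith(prefix.rstrip("/") + "/")
    ]
    if not matches:
        return None
    best = max(matches, key=len)
    # Strip the public prefix and forward the remainder to the upstream.
    return routes[best] + path[len(best):]
```

Because the table lives in one place at the edge, internal services can be renamed, split or moved without any change to the public interface.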

Evolution of the Edge: From Proxies to API Gateways

AppDirect, an end-to-end commerce platform for managing cloud-based products and services with an estimated $50 million in annual revenue, has undertaken a similar journey to Lyft, as highlighted in a recent blog post, “Evolution of the AppDirect Kubernetes Network Infrastructure.” The dynamic and ephemeral nature of cloud technology and container orchestrators such as Kubernetes provides many benefits in terms of scalability and resilience, but it also creates additional challenges when exposing public endpoints for business functionality that is delivered by a composite of microservices.

The AppDirect engineering team took a measured approach to solving these challenges, beginning by making core parts of the configuration static (such as the service ports exposed) and placing a load balancer in front of each application. Their second iteration embraced more dynamism, using HashiCorp’s Consul distributed key/value store and the HAProxy reverse proxy, which supported “hot reloading” of configuration as it changed at runtime. Ultimately, however, the team was keen to leverage the richer functionality provided by a more fully featured API gateway.
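
The second iteration can be pictured with a small Python sketch: take a snapshot of the service registry, render proxy configuration from it, and trigger a reload only when the rendered output changes. This is an assumption-laden illustration, not AppDirect's implementation. The registry is a plain dictionary standing in for a key/value store, the output is only HAProxy-style text, and `reload_fn` is a hypothetical callback standing in for whatever actually reloads the proxy.

```python
import hashlib


def render_backends(services):
    """Render HAProxy-style backend stanzas from {name: [(host, port), ...]}."""
    lines = []
    for name in sorted(services):
        lines.append(f"backend {name}")
        for i, (host, port) in enumerate(sorted(services[name])):
            lines.append(f"    server {name}-{i} {host}:{port} check")
    return "\n".join(lines) + "\n"


class ConfigReloader:
    """Calls reload_fn only when the rendered configuration has changed."""

    def __init__(self, reload_fn):
        self.reload_fn = reload_fn  # hypothetical hook, e.g. write file + reload
        self._last_digest = None

    def sync(self, services):
        config = render_backends(services)
        digest = hashlib.sha256(config.encode("utf-8")).hexdigest()
        if digest == self._last_digest:
            return False  # nothing changed, so skip the reload
        self.reload_fn(config)
        self._last_digest = digest
        return True
```

Sorting the services and instances before rendering keeps the output stable, so a registry that returns the same instances in a different order does not cause a needless reload.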

“The goal of our API gateway was to leave the exposed public APIs untouched and accessible even to legacy URLs and partner-customized domains, yet allow us to grow by ‘injecting’ and replacing old components one by one,” according to the blog post.

After evaluating a series of open source and commercial offerings, the AppDirect team deployed the Kubernetes-native Ambassador API gateway, which builds upon the Envoy proxy:

“Relying entirely on the Kubernetes native API—which we know and love—Ambassador is lightweight, stateless, and uses no external datastore. Ambassador exclusively makes use of Kubernetes annotations to drive the active route configuration (i.e., it is the control plane of Envoy’s data plane),” the team noted in the blog post.

Although AppDirect has not fully implemented a service mesh for internal communication, the company is already learning about the benefits of technology like the Envoy proxy and, critically, how to deploy and operate it in production.

Making Sense of It All

The adoption of service mesh technology within cloud native implementations and migrations is only just beginning, but it is already clear that this technology fills a gap in modern container-based application platforms such as Kubernetes. All of the benefits of a service mesh, such as rate limiting, circuit-breaking and observability, can also be leveraged at the edge of systems. If you want to explore and learn about this technology, starting at the edge of your systems and working inward can be an effective strategy. It also allows the technology to demonstrate value, such as improved observability and resilience, earlier than an inside-out approach would.

— Daniel Bryant