APM for All: Simplifying the Complexity of Cloud

As companies adopt hybrid and even multi-cloud strategies, their IT departments are beginning to use dynamic technologies to build new application features on top of existing back-end business systems. Most organizations now have a blended, multi-cloud architecture where public clouds are integrated with the existing on-premises, private or hybrid cloud infrastructure and applications.

For all the agility, flexibility and scalability the cloud affords us daily, it is also a source of increased complexity for the technology professionals responsible for assuring the availability and performance of the application environment. That complexity is expected to weigh heavily on IT operations management as a whole. Gartner predicts that by 2020, “75% of enterprises will experience visible business disruptions due to infrastructure and operations (I&O) skills gaps.”

It’s vital, then, first to understand the underlying technologies supporting application availability and performance throughout your entire IT environment. Next, you must understand how those technologies interact dynamically with one another in both development and production. And you need this understanding not only at every level of the application stack (across all infrastructure as well as services, function calls and code) but also for every type of application: monolithic, service-oriented architecture (SOA) and microservices-based.

That’s why, whether on-prem, hybrid or cloud-native—and whether you’re running AWS, Azure or Google Cloud (IaaS or PaaS)—you need comprehensive visibility into your complete infrastructure and application stack. Application performance management (APM) gives you this visibility into your IT environment by simplifying, and more importantly, normalizing, app stack management.

The Case For APM

APM has significant business value. In today’s environments, an IT team typically no longer runs a single application per virtual machine. Instead, it runs dozens of virtual machines per physical host (whether on-premises, private or public cloud), with hundreds or thousands of containerized microservices per virtual machine. These environments, with hundreds or many thousands of containers, are continually reconfigured and moved in and out of production underneath customer-facing applications. Because they share the same underlying resources, all of these components have to play nicely together; if something goes wrong, digital operations can collapse.

At the same time, the game has changed. The red, yellow and green glow of infrastructure dashboards now has less bearing on whether you’re really delivering high-quality user experiences. Instead, success is now measured by evolving real-world, end-user experiences. You are customer-facing, because clients interact with you through your applications and websites. If your site is slow, you lose money and damage your brand.

To fix the problem, you need specifics. Is transaction performance slow because you don’t have enough compute, or because the application code is bad? These factors are interrelated, and in today’s dynamic environments, where everything is shared and virtualized, it’s challenging to pinpoint the true causes of your problems.

When Amazon went down for an hour during Prime Day last year, experts estimated the company lost $100 million in sales. Imagine the impact of an outage during the holiday season. Of course, those are Amazon-sized numbers. But scale those Godzilla figures down to your size, and an outage still hurts, either through lost revenue or a damaged brand reputation.

With APM, you can see all the metrics and track how the application performs over time. You can watch it degrade, which gives you time to figure out why it’s slowing before it stops getting you your raw materials on time. You can pinpoint, down to the line of code or the database query, exactly what is slow, where and why.
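To make “watching it degrade” concrete, here is a minimal sketch in Python. It is not how any particular APM product works; it simply flags degradation when the median response time of the most recent samples exceeds the median of the earlier samples by a chosen factor. The function name, the window size, the 1.5× threshold and the sample latencies are all hypothetical.

```python
from statistics import median


def is_degrading(latencies_ms, window=5, threshold=1.5):
    """Return True if the median of the last `window` samples exceeds
    `threshold` times the median of all earlier samples (the baseline).

    Hypothetical helper for illustration; parameters are assumptions.
    """
    if len(latencies_ms) < 2 * window:
        return False  # not enough history to form a meaningful baseline
    baseline = median(latencies_ms[:-window])
    recent = median(latencies_ms[-window:])
    return recent > threshold * baseline


# Hypothetical response times (ms): steady around 120 ms, then creeping up.
history = [120, 118, 125, 122, 119, 121, 124, 180, 195, 210, 205, 220]
print(is_degrading(history))  # baseline median 121 ms, recent median 205 ms
```

Comparing medians rather than means keeps a single slow outlier from tripping the check; a real deployment would also need to account for daily and weekly traffic patterns.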

But what exactly do we mean when we talk about APM? Gartner has pointed out the historical confusion over the terminology. Capabilities also differ among APM tools. Some are geared toward monolithic applications; others toward Java, the web and other applications; and still others are tuned for microservices. Beyond those, full-stack APM solutions can measure it all.

Normalizing IT Management With APM

APM strategies and toolsets aren’t all the same. By my count, there are three iterations of APM implementations. First, there’s APM dating from when all tech was monolithic, of which the best example is the old CA Wily Introscope. Then came a second generation, which covered areas such as SOA, Java and the web—a good example here would be AppDynamics and the old Dynatrace. After that comes a version of APM tuned specifically for microservices and containerized application environments, such as the new Dynatrace and Instana.

In some cases, APM technologies themselves haven’t kept pace with changes in the underlying infrastructure. That’s especially true for microservices: some legacy APM solutions don’t support microservices in containerized application environments. On the flip side, some newer companies in the APM space operate only in the microservices space. Their tools are container-based and don’t support older monolithic code or n-tier/SOA application environments.

Most customers, however, have mixed environments. Almost no one is operating with 100% microservices in containers operationally managed by Kubernetes.

By my reckoning, modern APM systems should monitor it all, because everything is mixed and likely to stay that way, if history is any indication. Companies still rely upon massive databases of proprietary information—databases of business records. They’re just building all-new applications in front of them.

If you accept that it’s now more important than ever to ensure availability and performance as experienced by the user, then you need APM across all three of those application categories and their environments, including public cloud providers such as AWS and Azure. You need full-stack APM.

Getting Started

Before you do anything else, do your due diligence. Ask around and seek out first-hand knowledge of the APM vendor. Don’t rely on the vendor for an objective assessment of its own product; there’s a great deal of hype in the APM market, and it can cloud your understanding of what a given solution actually does.

So, where do you start? Typically, it’s best to think as if you were the end user coming into your application and infrastructure stack. Start on the customer-facing (external) side by monitoring, measuring and understanding the real performance of your web-based or custom applications, then work your way back in through the stack.

Next, monitor, measure and understand the performance of the application itself. Here, capabilities such as distributed transaction tracing and code profiling are essential for understanding how the application’s components relate to one another from a performance perspective.

Finally, while you also need to know how your infrastructure components are performing, it’s critical to view them in the context of application performance. This application-context infrastructure monitoring must encompass both performance instrumentation and all the logs associated with your application and its underlying infrastructure.
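The idea behind distributed transaction tracing can be illustrated with a toy, single-process sketch that uses only Python’s standard library. Each unit of work records a span that knows its parent, so the time a user waits can be attributed from the customer-facing request down to an individual database call. Real tracers, such as OpenTelemetry or a commercial APM agent, propagate this context across processes and network hops, which this sketch does not attempt. The handler and span names are hypothetical, and `time.sleep` stands in for real work.

```python
import time
from contextlib import contextmanager

spans = []   # finished spans as (name, parent, duration_ms)
_stack = []  # names of currently open spans, innermost last (not thread-safe)


@contextmanager
def span(name):
    """Time the enclosed block and record it with its parent span's name."""
    parent = _stack[-1] if _stack else None
    _stack.append(name)
    start = time.perf_counter()
    try:
        yield
    finally:
        duration_ms = (time.perf_counter() - start) * 1000
        _stack.pop()
        spans.append((name, parent, duration_ms))


def handle_checkout():
    # Hypothetical user-facing transaction, traced from the outside in.
    with span("GET /checkout"):
        with span("app: price_cart"):
            time.sleep(0.01)      # stand-in for application code
            with span("db: SELECT inventory"):
                time.sleep(0.05)  # stand-in for a slow query


handle_checkout()
for name, parent, ms in spans:
    print(f"{name:<22} parent={parent!s:<18} {ms:6.1f} ms")
```

Spans are recorded as they finish, so the database span prints first; following the parent column from the bottom up reproduces the outside-in path from the user’s request to the slow query.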

Start small to achieve quick wins and build a foundation for broader APM success across your organization. Begin with one, or at most a handful, of the applications that need the most attention. Prove value, then ramp up adoption from there.

In all cases, remember that unless you rely solely on serverless computing and microservices-based applications in containerized environments, you’ll need visibility across all three application architecture types (monolithic, SOA and microservices). Otherwise, you’re not getting all the information you need.

It’s always been the goal of IT throughout all technology revolutions—whether it was mainframe, time-sharing, client-server or distributed, and whether it’s hybrid or full-on public cloud—to get a single pane of glass for operational management. True, full-stack APM delivers that view.

— David Wagner