The Changing Role of APM in a Microservices World

Microservices usage has skyrocketed. Fully 91 percent of organizations are using or plan to use microservices, and 92 percent expect to grow their use of microservices in the coming year. It is no longer a question of if but when: 86 percent of organizations expect microservices to be the default within five years.

With the growing use of microservices and containers, applications have become orders of magnitude more dynamic. It’s not unusual for containers, or other building blocks of the application, to have a shelf life of less than a week. However, what is great for application development can create challenges when it comes to monitoring and diagnosing performance issues in ever-changing applications. How must application performance monitoring (APM) change to support microservices and dynamic cloud-native apps?

More Moving Parts

Small, specialized microservices may facilitate shorter development cycles, continuous releases and greater agility, but when deployed across dynamic cloud infrastructure, the sheer number of moving parts also makes it much more difficult to troubleshoot application performance. Highly distributed and transient containerized components need to be tracked and measured continuously, demanding a big data approach for APM tools. To be effective, monitoring containers and microservices requires high-definition visibility across all transactions, machine learning to pinpoint problems and a shift in focus from servers to users.

High-Definition Data

Because the dynamic, containerized environment is constantly changing, measurement data must be collected at high frequencies: think seconds rather than minutes. Collecting infrastructure performance metrics at intervals of one minute or longer can distort the real situation and sometimes miss events altogether. For example, at low definition, a pattern of recurring but brief spikes in CPU usage will either be averaged out or not register at all, even though such a pattern, once detected, is a telltale sign of a systemic configuration issue. High-definition infrastructure metrics, collected at one-second intervals, are crucial for a comprehensive view of resource utilization and shared resource dependency issues.
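To make the averaging effect concrete, here is a minimal Python sketch. The numbers are invented for illustration (a 20 percent CPU baseline, a three-second spike to 95 percent once a minute, and an 80 percent alert threshold); none of them come from a real system or a particular APM product.

```python
from statistics import mean

def downsample(samples, window):
    """Average consecutive fixed-size windows, as a coarse collector would."""
    return [mean(samples[i:i + window]) for i in range(0, len(samples), window)]

# Hypothetical data: five minutes of per-second CPU readings (percent).
BASELINE, SPIKE, THRESHOLD = 20.0, 95.0, 80.0
per_second = [BASELINE] * 300
for minute in range(5):
    start = minute * 60 + 30            # spike begins 30 s into each minute
    for second in range(start, start + 3):
        per_second[second] = SPIKE      # three-second burst

# The same signal as seen at one-minute resolution.
per_minute = downsample(per_second, 60)

# Count readings above the alert threshold at each resolution.
spikes_at_1s = sum(1 for v in per_second if v > THRESHOLD)
spikes_at_60s = sum(1 for v in per_minute if v > THRESHOLD)

print("one-second view:", spikes_at_1s, "readings above threshold")
print("one-minute view:", spikes_at_60s, "readings above threshold")
print("one-minute averages:", per_minute)
```

At one-second resolution the recurring burst is obvious. At one-minute resolution each window averages out to a value only slightly above the baseline, so a threshold-based alert never fires.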

However, infrastructure metrics alone are not enough. Building a complete view of the operating environment calls for data from multiple sources, including application traces, system metrics, application logs and network behavior. The resulting data set is very large, necessitating big data collection methods and processes. In-line data compression, NoSQL back-ends, non-intrusive client agents and cloud scaling are foundational technologies for modern APM tools.

Correlating Data End to End

Big data analytics and visualizations are necessary to process APM data into human-scale patterns and events. Identifying the root cause of a performance problem often involves following a transaction from end to end, through multiple containers, shared resources and downstream infrastructure dependencies. Next-generation APM tools dynamically map the flow of every single transaction through all intermediate relationships. Machine learning and artificial intelligence, while still in their early stages, are proving essential to finding patterns and anomalies in the data, suggesting probable causes and flagging them for human investigation. Combining the best of machine and human skills results in faster problem identification, diagnosis and resolution. IT operations teams benefit from improved application performance, reduced repair times and higher uptime, even in dynamic application environments.
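As a rough illustration of following one transaction end to end, the Python sketch below groups timing records ("spans") by a shared transaction ID, rebuilds the call path and finds the hop that spent the most time in its own work. The Span fields, service names and durations are hypothetical, and the self-time calculation assumes child calls run one after another; real APM tools use richer data models than this.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Span:
    """One timed unit of work within a transaction (hypothetical schema)."""
    trace_id: str
    span_id: str
    parent_id: Optional[str]   # None marks the entry point of the transaction
    service: str
    duration_ms: float

def group_by_trace(spans):
    """Collect spans that belong to the same end-to-end transaction."""
    traces = defaultdict(list)
    for span in spans:
        traces[span.trace_id].append(span)
    return dict(traces)

def children_index(trace_spans):
    """Map each span ID to the spans it called directly."""
    index = defaultdict(list)
    for span in trace_spans:
        if span.parent_id is not None:
            index[span.parent_id].append(span)
    return index

def call_path(trace_spans):
    """Return service names in depth-first call order, starting at the root."""
    index = children_index(trace_spans)
    stack = [s for s in reversed(trace_spans) if s.parent_id is None]
    path = []
    while stack:
        span = stack.pop()
        path.append(span.service)
        stack.extend(reversed(index.get(span.span_id, [])))
    return path

def slowest_hop(trace_spans):
    """Find the span with the most time not explained by its direct children.

    Assumes child calls are sequential, so their durations can be subtracted.
    """
    index = children_index(trace_spans)
    def self_time(span):
        return span.duration_ms - sum(c.duration_ms for c in index.get(span.span_id, []))
    return max(trace_spans, key=self_time)

# Invented sample data: one slow checkout transaction and one fast page view.
spans = [
    Span("t1", "a", None, "web-frontend", 480.0),
    Span("t1", "b", "a", "checkout", 430.0),
    Span("t1", "c", "b", "inventory", 40.0),
    Span("t1", "d", "b", "payments-db", 350.0),
    Span("t2", "e", None, "web-frontend", 35.0),
]

traces = group_by_trace(spans)
print(" -> ".join(call_path(traces["t1"])))
print("slowest hop:", slowest_hop(traces["t1"]).service)
```

In this invented transaction, the front end and the checkout service both look slow in total, but subtracting the time spent in downstream calls points to the database hop as the probable cause.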

Putting the Focus on the End User

Legacy APM tools tend to focus on infrastructure components and resources, monitoring the health and utilization of servers, storage and networks. Organizations today, driven by digital competition and customer satisfaction, need to concentrate on the user’s perspective. This means monitoring multiple types of user devices for health indicators and mapping each transaction to the backend system. Automatic instrumentation delivers real-time end-user metrics that trigger alerts much earlier than server-side monitoring and enable more effective prioritization. End-user insights also help DevOps teams and product owners track feature and application adoption rates, compare performance before and after upgrades and directly relate technical issues and investments to business priorities.

Monitoring Apps for Today and Tomorrow

Clouds, containers and microservices require a different approach to performance monitoring that takes into account the speed, size and life cycle of application modules. Effectively watching these distributed and transitory processes demands a big data approach that can handle the scale of end-to-end transaction data in modern enterprise environments.

Legacy APM tools that only take snapshots provide an incomplete view of application reality and lengthen problem resolution times. Modern APM systems instead collect a complete data set, enabling DevOps teams to quickly resolve intermittent and complex issues that impact business productivity and user satisfaction. These teams augment their own skills with machine learning, which quickly identifies anomalies and guides them to probable causes. The results are better application performance, enhanced competitiveness and, perhaps most important, increased user satisfaction.

— Gayle Levin