Monitoring of systems, networks and applications has been around since the inception of computing. However, in the last five years there have been dramatic changes and innovations in the computing, networking and application development landscapes.
For instance, infrastructure is shared and allocated dynamically, often by continuous automation. Server virtualization from VMware and its Distributed Resource Scheduler (DRS) allowed virtual servers to move automatically between hosts based upon CPU and memory constraints. Automation vendors including VMTurbo and Cirba have taken this one step further by incorporating constraints such as networking and storage congestion, as well as allowing for the prioritization of workloads.
These days, virtualization extends to the network and storage. In some cases, virtual networks are established in software between every pair of endpoints that need to communicate, using products such as VMware NSX. Storage is being virtualized so that there is only a faint resemblance between the storage layer seen by the operating system and the applications and the actual physical disks or flash arrays. In addition, there are ever more layers of abstraction between the application and the physical hardware. These might include Docker, the Java virtual machine (JVM), a platform-as-a-service (PaaS) stack such as Cloud Foundry and the layers of virtualization listed above.
Now, the application layer is characterized by both diversity and rapid change. It is not just a Java world anymore; other languages are just as popular, and new ones arrive constantly. The pressure on businesses to put more functionality into production more quickly has led to both agile development and DevOps, both of which result in applications being modified in production more frequently.
Private and hybrid clouds lead to layers of infrastructure, layers of management teams and layers of tools, recreating the problem that silos of management created in the past. Each of the infrastructure-as-a-service (IaaS), PaaS and software-as-a-service (SaaS) layers in the cloud architecture has its own team to manage it and its own tools to manage that layer (a simplified sketch of per-layer metric tagging follows the list below). The tools to monitor these layers are:
- For the IaaS layer – storage tools, storage area network (SAN) tools, networking tools, server tools, virtualization (hypervisor) tools and tools to monitor the wide area network.
- For the PaaS layer – tools to monitor all of the software services that support the applications, which include database servers, web servers, Java application servers, PaaS frameworks and, of course, now Docker.
- For the SaaS layer – tools to monitor the applications themselves, which include application performance management products, as well as tools to monitor actual end user experience.
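To make the consolidation problem concrete, here is a minimal Python sketch of the first step toward cross-layer visibility: tagging every measurement with the layer that produced it so data from different tools can live in one model. The Metric class, field names and sample values are illustrative assumptions, not any vendor's actual schema.

```python
from dataclasses import dataclass
import time

# Hypothetical layer tags; illustrative only.
LAYERS = ("IaaS", "PaaS", "SaaS")

@dataclass
class Metric:
    """One measurement, tagged with the cloud layer that produced it."""
    layer: str       # "IaaS", "PaaS" or "SaaS"
    source: str      # the tool or subsystem that collected it
    name: str        # e.g. "san.read_latency_ms"
    value: float
    timestamp: float

    def __post_init__(self):
        if self.layer not in LAYERS:
            raise ValueError(f"unknown layer: {self.layer}")

# Each layer's tools emit metrics in their own vocabulary; a common
# layer field lets the three silos be queried as one dataset.
samples = [
    Metric("IaaS", "san-monitor", "san.read_latency_ms",   4.2,   time.time()),
    Metric("PaaS", "jvm-agent",   "jvm.gc_pause_ms",       18.0,  time.time()),
    Metric("SaaS", "apm-agent",   "checkout.response_ms",  310.0, time.time()),
]
```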
[Figure: The Layers of the Private or Hybrid Cloud]
The Problem: Monitoring
There are several core problems with these layers of tools that must be addressed in order for private clouds to be successful:
- Lack of visibility – The data from each of these tools at each of these layers is not shared with the owners or operators of the other layers of the private cloud stack. A consolidated set of data or metrics is not available to business constituents or users of the private cloud, giving them no end-to-end visibility into the behavior of their applications and the workloads that support them.
- Communication issues – When there are performance problems (as there always are), the layering of the toolset and lack of collaboration across the teams that own the three layers make problem resolution a time-consuming and frustrating process.
- Improper data analysis – Despite the vast amount of monitoring data collected by all of these tools, in many cases the data necessary to ensure the private cloud is working correctly for its hosted applications and business constituents is not collected at all, or not in the required manner.
What’s Needed: New Types of Monitoring Data
Highly dynamic and interactive systems demand a new approach to collecting and processing monitoring data. The following concepts must be incorporated into a modern monitoring data pipeline to stay in the game (a minimal sketch follows this list):
- Streams of metrics – A modern online application system produces streams of metrics starting with the response time (user experience) data and including data about how all layers of the software and hardware that support that interaction are behaving.
- Streams of relationships – In a modern online application system, a transaction of interest is related to the JVM in which it runs, the operating system upon which the JVM runs, the virtualized hardware where the operating system runs and the entire virtualized and physical infrastructure (down to the spindle on the hard disk) that support that transaction.
- Streams of state – Most enterprises have configuration management databases (CMDBs), which are supposed to store the current state of the entire software and hardware environment of the enterprise. But modern online systems change too rapidly for CMDBs to be kept up to date, relegating them to the junkyard of worse-than-useless legacy technologies. Private clouds and their modern applications are simply too dynamic for the CMDB; it must be replaced with something that is kept up to date automatically and continuously.
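As a rough illustration of these three streams, the Python sketch below models metric, relationship and state events feeding a continuously updated topology in place of a CMDB. The event types, field names and LiveTopology class are hypothetical, a sketch of the idea rather than any product's implementation.

```python
import time
from dataclasses import dataclass

# Three event types for the three streams described above.
@dataclass
class MetricEvent:            # stream of metrics
    entity: str               # e.g. "txn:checkout"
    name: str                 # e.g. "response_ms"
    value: float
    ts: float

@dataclass
class RelationshipEvent:      # stream of relationships
    child: str                # e.g. "txn:checkout"
    parent: str               # e.g. "jvm:app-01"
    ts: float

@dataclass
class StateEvent:             # stream of state (replaces periodic CMDB loads)
    entity: str
    state: dict               # current configuration attributes
    ts: float

class LiveTopology:
    """A picture of the environment rebuilt continuously from streams,
    rather than maintained by hand in a CMDB."""
    def __init__(self):
        self.parents = {}     # child -> parent, latest event wins
        self.state = {}       # entity -> latest known configuration
        self.metrics = {}     # (entity, metric name) -> latest value

    def apply(self, event):
        if isinstance(event, MetricEvent):
            self.metrics[(event.entity, event.name)] = event.value
        elif isinstance(event, RelationshipEvent):
            self.parents[event.child] = event.parent
        elif isinstance(event, StateEvent):
            self.state[event.entity] = event.state

    def chain(self, entity):
        """Walk a transaction down to the infrastructure supporting it."""
        path = [entity]
        while entity in self.parents:
            entity = self.parents[entity]
            path.append(entity)
        return path

now = time.time()
topo = LiveTopology()
topo.apply(RelationshipEvent("txn:checkout", "jvm:app-01", now))
topo.apply(RelationshipEvent("jvm:app-01", "vm:web-07", now))
topo.apply(RelationshipEvent("vm:web-07", "host:esx-03", now))
topo.apply(MetricEvent("txn:checkout", "response_ms", 2400.0, now))
print(topo.chain("txn:checkout"))
# ['txn:checkout', 'jvm:app-01', 'vm:web-07', 'host:esx-03']
```

The point of the chain walk is that a slow transaction can be traced to the stack beneath it without consulting a stale, manually maintained record.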
The Crucial Metrics and Their Sources
The single most broken thing about monitoring is the mistaken notion that monitoring resource utilization across the devices and software stack that comprise an application or a private cloud can ensure acceptable service quality.
To fix this misconception, we have to redefine performance not as the availability of resources, but in terms of how long it takes to get work done (response time and latency) and how much work gets done per unit of time (throughput). Response time, throughput and, of course, errors need to become the crucial metrics by which service quality is judged and delivered.
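As a worked example of this definition, the sketch below derives all three crucial metrics from one small window of request records. The record format and sample values are assumptions for illustration only.

```python
from statistics import median

# Hypothetical request records: (timestamp, duration_ms, ok).
requests = [
    (0.2, 120.0, True), (0.9, 310.0, True),  (1.4, 95.0, False),
    (2.1, 180.0, True), (2.8, 2400.0, True), (3.5, 140.0, True),
]
window_seconds = 4.0
durations = sorted(d for _, d, _ in requests)

# Response time: how long it takes to get work done.
p50 = median(durations)
p95 = durations[min(len(durations) - 1, int(0.95 * len(durations)))]

# Throughput: how much work gets done per unit of time.
throughput = len(requests) / window_seconds

# Errors: the share of work that failed.
error_rate = sum(1 for _, _, ok in requests if not ok) / len(requests)

print(f"p50={p50}ms p95={p95}ms throughput={throughput:.1f} req/s "
      f"errors={error_rate:.0%}")
```

Note that a utilization gauge would miss the 2,400 ms outlier entirely, while the percentile view surfaces it immediately.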
If we accept that response time, throughput and errors are the crucial categories of metrics, the next steps are to identify what these metrics are at each layer of the private cloud stack and then identify the sources of those metrics. The most important thing to realize is that the standard interfaces that exist at each layer of the stack (SMI-S, SNMP, NetFlow, JMX, etc.) do not measure response time, throughput and errors, so we have to rely on vendors that have invested heavily in the custom instrumentation required to collect these metrics.
We also have to recognize that the investment required to collect these metrics at any particular layer is significant, and that the technical expertise involved is more than any single vendor or tool can assemble.
Finally, we have to recognize that the industry is changing too quickly for any single vendor to cover the entire cloud stack. Network virtualization, storage virtualization and containers are just the most recent examples of innovations that require advances in instrumentation, and such advances are going to continue to occur at an accelerating pace.
For the reasons listed above, the only monitoring strategy that can work for the private and hybrid cloud is to rely on best-of-breed vendors to collect the required data at each layer of the cloud stack and then feed those metrics into a common big data back end for processing and analysis.
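A minimal sketch of that strategy might look like the following, assuming an in-process queue as a stand-in for the back end's ingest interface (in practice this would be a message bus or an HTTP API). The publish helper and envelope fields are hypothetical.

```python
import json
import queue
import time

# Stand-in for the big data back end's ingest pipeline.
backend = queue.Queue()

def publish(layer, source, name, value):
    """Normalize a metric from any collector into one common envelope."""
    backend.put(json.dumps({
        "layer": layer, "source": source, "name": name,
        "value": value, "ts": time.time(),
    }))

# Best-of-breed collectors at each layer publish into the same pipeline.
publish("IaaS", "storage-tool", "array.write_latency_ms", 3.7)
publish("PaaS", "jvm-agent", "jvm.heap_used_mb", 612.0)
publish("SaaS", "apm-agent", "checkout.response_ms", 287.0)

# The analysis tier consumes one homogeneous stream instead of three silos.
while not backend.empty():
    print(backend.get())
```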
A Monitoring Architecture for the Private, Hybrid Cloud
Both private and hybrid clouds require a completely different approach to collecting data, processing data and making that data useful to users and analysts. Enterprise IT operations organizations must adopt these modern approaches to managing their private clouds, or those clouds will not be competitive with public cloud offerings on either cost or functionality, leading to ever more outsourcing of workloads to public cloud providers.
About the Author: Bernd Harzog
Bernd Harzog is the CEO and Founder of OpsDataStore, the real-time big data back end for all IT operations management data and vendors. OpsDataStore’s open big data back end consumes and relates data from multiple sources and immediately makes that data useful to decision-makers using market-leading BI and visualization tools. Learn more at www.opsdatastore.com, and by following OpsDataStore on Twitter, @OpsDataStore, and LinkedIn.