In my previous post I discussed how to take your DevOps to the next level by taking it beyond infrastructure automation, to the automation of your deployments and code pushes, through patches and updates.
And then I promised to make it interesting…so here is the next stage – actually using the extracted data to get your systems to work for you.
Monitoring Like IT’S YOUR BUSINESS
Let’s start by discussing what it means to monitor your application, and what kind of data you can extract from it.
At the most basic level, you want to know whether your application is available to your users. It sounds simple, but setting it up properly is not an easy task. It is intended to answer the most important question for your business: can my users use the application? Think of it as a big red or green light on your dashboard. The answer may seem obvious, but there are all kinds of parameters to take into account. For example, what if the system is apparently up and running but the response time is very slow? What if only parts of the system are actually running? At the end of the day, you want an alert “traffic light” that tells you whether your system is functioning: yes/no, up/down.
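One way to picture the traffic light is as a function that folds individual health checks into a single answer. The following is only a minimal sketch: the `CheckResult` type, the component names, and the 2-second threshold are illustrative assumptions, not part of any particular monitoring tool.

```python
from dataclasses import dataclass


@dataclass
class CheckResult:
    """Outcome of probing one part of the system (names are hypothetical)."""
    component: str
    is_up: bool
    response_seconds: float


def traffic_light(results, max_response_seconds=2.0):
    """Return "green" only if every component is up and responds fast enough.

    The 2-second default is an assumed example value, not a recommendation.
    """
    if not results:
        return "red"  # no data means we cannot claim the system is usable
    for result in results:
        if not result.is_up:
            return "red"  # part of the system is down
        if result.response_seconds > max_response_seconds:
            return "red"  # running, but too slow to count as available
    return "green"


checks = [
    CheckResult("web", True, 0.3),
    CheckResult("database", True, 4.5),  # up, but very slow
]
print(traffic_light(checks))
```

Here the light turns red even though every process is running, because a slow response is treated the same as an outage. A real setup might add an intermediate “yellow” state, but the yes/no version matches the question the business actually asks.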
System, Application, and Business Metrics
The green-red traffic light is an aggregation of multiple levels of monitoring. The most basic one is system related. It contains indicators like process availability, CPU and memory utilization, etc.
The next level of monitoring is application related and is composed of application level KPIs (key performance indicators). KPIs can be anything from the average response time for a user for a certain request, to the number of concurrent database connections. In other words, anything that’s specific to the application and its architecture.
And then above these comes the highest level of metrics: business metrics. Ultimately, these are what’s really important. If your business metrics are OK, you can usually assume everything is running fine. Sample business metrics include the rate of failed registrations on your website, how many users performed a certain operation, or how many users executed a specific transaction. To be really specific, take Google Hangouts as an example: one of its business metrics could be how many people joined or started a hangout in a certain timeframe.
Logs to the Rescue
Logs are often used here as a means to collect your metrics and alert on erroneous conditions. Logs usually contain a lot of useful information, but you need to be able to extract it and make sense of it. This means gathering all of the logs emitted from all of your servers, parsing them, and looking for specific patterns that will help you generate the KPIs over time. For example, you look for a log message that says “User X signed in”, and by counting these messages you can produce a business level metric for how many users sign in over time. If you add some more data to the log message, such as the user’s location, or operating system, or anything else for that matter, you can slice and dice the data and get deeper insights into how your application is behaving. When logs are emitted in a structured and consistent way (e.g. in a JSON format), it becomes much easier to analyze them and produce the relevant KPIs. There are many tools that can help with that.
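To make the idea concrete, here is a small sketch of turning structured logs into a metric. It assumes one JSON object per log line; the field names (`event`, `os`, `country`) and the event name `user_signed_in` are made up for illustration, and a real pipeline would read from a log aggregator rather than a list in memory.

```python
import json
from collections import Counter

# Hypothetical structured log lines; the field names are assumptions.
LOG_LINES = [
    '{"event": "user_signed_in", "user": "alice", "os": "linux", "country": "DE"}',
    '{"event": "user_signed_in", "user": "bob", "os": "macos", "country": "US"}',
    '{"event": "page_viewed", "user": "alice", "os": "linux", "country": "DE"}',
    'not json at all',
    '{"event": "user_signed_in", "user": "carol", "os": "linux", "country": "US"}',
]


def count_events(lines, event_name, group_by=None):
    """Count log entries with the given event name, optionally sliced by a field."""
    counts = Counter()
    for line in lines:
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip lines that are not valid JSON
        if not isinstance(entry, dict) or entry.get("event") != event_name:
            continue
        key = entry.get(group_by, "unknown") if group_by else "total"
        counts[key] += 1
    return counts


print(count_events(LOG_LINES, "user_signed_in"))
print(count_events(LOG_LINES, "user_signed_in", group_by="os"))
```

The first call gives the overall sign-in count; the second slices the same events by operating system. Because every entry carries the same fields, adding another slice (by country, say) is just a different `group_by` argument.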
From Simple Automation to Orchestration
That’s where orchestration comes into the picture. At the most basic level, orchestration is a higher form of automation, which helps you set up all the pieces that are related to your application, starting from the infrastructure (VMs, networks, block storage volumes, security groups, etc.), to the platforms your app runs on (database, web server, etc.), and all the way up to the application modules and code. This entire setup is often referred to as a topology. The role of an orchestration framework is to materialize a certain topology. More advanced orchestrators go beyond materializing the topology, and change it to meet the current workloads and needs of the application.
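A topology can be thought of as a dependency graph, and materializing it as walking that graph in an order where every piece is created after the things it depends on. The sketch below illustrates that idea only; the component names and their dependencies are hypothetical, and real orchestrators use their own, much richer, topology descriptions. It relies on `graphlib` from the Python standard library (Python 3.9+).

```python
from graphlib import TopologicalSorter

# Hypothetical topology: each component maps to the components it depends on.
TOPOLOGY = {
    "network": set(),
    "security_group": {"network"},
    "vm": {"network", "security_group"},
    "block_storage": {"vm"},
    "database": {"vm", "block_storage"},
    "web_server": {"vm"},
    "app_code": {"web_server", "database"},
}


def materialization_order(topology):
    """Return components ordered so each comes after everything it depends on."""
    return list(TopologicalSorter(topology).static_order())


def materialize(topology, create):
    """Call create(name) for every component in dependency order."""
    for name in materialization_order(topology):
        create(name)


# A stand-in for real provisioning: just report what would be created.
materialize(TOPOLOGY, lambda name: print(f"creating {name}"))
```

With this layout the network is always created first and the application code last, while components that do not depend on each other (the database and the web server, for instance) may come out in either order.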
As you have probably gathered by now, monitoring and log gathering are an essential part of any running app, so they should be an integral part of the orchestrated application topology. Since the orchestration process is topology aware, it can wire and configure monitoring for your application components very easily, which is one of its greatest benefits. Going through this process without a global view of the topology can often be time consuming and error prone. Moreover, as the topology changes, you need to reconfigure and rewire your monitoring tools, and a good orchestrator will do that for you as well.
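As a rough illustration of what “topology aware” buys you, the sketch below derives the monitoring configuration directly from a description of the topology, so a change to the topology only requires regenerating the configuration. Everything here is assumed for the example: the component types, the check names, the hosts, and the shape of the generated entries do not correspond to any specific orchestrator or monitoring product.

```python
# Hypothetical mapping from component type to the checks worth collecting.
CHECKS_BY_TYPE = {
    "vm": ["cpu_utilization", "memory_utilization"],
    "database": ["process_alive", "concurrent_connections"],
    "web_server": ["process_alive", "average_response_time"],
}


def monitoring_config(topology):
    """Build one monitoring entry per (component, check) pair in the topology."""
    config = []
    for component in topology:
        for check in CHECKS_BY_TYPE.get(component["type"], []):
            config.append({
                "target": component["name"],
                "host": component["host"],
                "check": check,
            })
    return config


topology = [
    {"name": "web-1", "type": "web_server", "host": "10.0.0.11"},
    {"name": "db-1", "type": "database", "host": "10.0.0.21"},
]
before = monitoring_config(topology)

# The topology changes (a second web server is added); regenerating the
# configuration picks it up without any manual rewiring.
topology.append({"name": "web-2", "type": "web_server", "host": "10.0.0.12"})
after = monitoring_config(topology)

print(len(before), "checks before,", len(after), "checks after")
```

The point of the design is that the topology is the single source of truth: nobody has to remember to add checks for `web-2`, because they follow from its type.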
Next Up: Reactive and Proactive Orchestration, The DevOps Holy Grail
Hopefully by now you see the value of orchestration as a higher form of automation. In the next post I’ll dive deeper into the post-setup phase, and discuss how orchestration tools can react to events and monitoring data and adjust the application’s runtime topology to best fit the current workloads.