The word monitoring has a meaning most people are familiar with. But what does monitoring actually break down into when you're monitoring servers, applications and hardware?
One way to answer this question is to distinguish between black box and white box monitoring. White box monitoring refers to application monitoring, while black box monitoring refers to server or hardware monitoring.
In this article, I’ll explain white box vs. black box monitoring in more detail.
What is White Box Monitoring?
White box monitoring is the monitoring of applications running on a server. This could be anything from the number of HTTP requests your web server is getting to the response codes generated by your application. Other types of white box monitoring include:
- Monitoring MySQL queries running on a database server.
- Looking at the number of users utilizing a web application throughout the day, and alerting if this goes above a predefined threshold.
- Breaking the HTTP requests from the example above down by response code, to see how the application is performing and whether users are being served the correct content. For example, a 403 (Forbidden) indicates that a user tried to access a part of the website they're not allowed to visit, while a 200 (OK) indicates that the request succeeded and the content was served.
- Detecting behavior we don't expect to see, such as a user skipping the normal steps you'd expect when signing in to your application or resetting a password.
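As a rough illustration of the response code idea, here is a minimal Python sketch that tallies status codes from web server access log lines and flags an unusually high share of 403s. It assumes logs in the Common Log Format; the function names and the 5 percent threshold are hypothetical choices for illustration, not taken from any particular monitoring tool.

```python
import re
from collections import Counter

# Matches the request and status fields of a Common Log Format line, e.g.
# 127.0.0.1 - - [10/Oct/2023:13:55:36 +0000] "GET /admin HTTP/1.1" 403 512
LOG_PATTERN = re.compile(r'"(?P<method>\S+) (?P<path>\S+) \S+" (?P<status>\d{3}) ')


def count_status_codes(lines):
    """Tally HTTP response codes from access log lines, skipping lines that don't parse."""
    counts = Counter()
    for line in lines:
        match = LOG_PATTERN.search(line)
        if match:
            counts[int(match.group("status"))] += 1
    return counts


def forbidden_ratio_exceeded(counts, max_ratio=0.05):
    """Return True if 403 responses make up more than max_ratio of all requests."""
    total = sum(counts.values())
    if total == 0:
        return False
    return counts[403] / total > max_ratio


sample = [
    '127.0.0.1 - - [10/Oct/2023:13:55:36 +0000] "GET / HTTP/1.1" 200 2326',
    '127.0.0.1 - - [10/Oct/2023:13:55:37 +0000] "GET /admin HTTP/1.1" 403 512',
]
counts = count_status_codes(sample)
print(counts[200], counts[403])          # 1 1
print(forbidden_ratio_exceeded(counts))  # True (1 of 2 requests was a 403)
```

In practice you would feed this from a log tailer and track the ratio over a time window rather than over a fixed list of lines.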
What is Black Box Monitoring?
Black box monitoring refers to the monitoring of servers with a focus on areas such as disk space, CPU usage, memory usage and load averages. These are what most in the industry would consider the standard system metrics to monitor. Other types of black box monitoring include:
- Monitoring network switches and other networking devices, such as load balancers, for the same system metrics described above.
- Looking at hypervisor-level resource usage for all virtual machines running on the hypervisor (such as VMware, KVM, Xen, etc.).
- Alerting on hard disk errors that may present a problem if a disk isn’t replaced soon (using SMART, for instance).
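For comparison, here is a minimal black box sketch in Python that reads disk usage and load averages using only the standard library (`shutil.disk_usage` and `os.getloadavg`, the latter available on Unix-like systems only). The thresholds, and the choice to compare the 5-minute load average per CPU, are illustrative assumptions rather than recommendations.

```python
import os
import shutil


def collect_system_metrics(path="/"):
    """Gather a few standard black box metrics using only the standard library.

    os.getloadavg() is only available on Unix-like systems.
    """
    usage = shutil.disk_usage(path)
    load1, load5, load15 = os.getloadavg()
    return {
        "disk_used_percent": 100.0 * usage.used / usage.total,
        "load_1m": load1,
        "load_5m": load5,
        "load_15m": load15,
        "cpu_count": os.cpu_count() or 1,
    }


def evaluate(metrics, disk_threshold=90.0, load_per_cpu_threshold=1.0):
    """Return a list of alert messages; the thresholds are illustrative defaults."""
    alerts = []
    if metrics["disk_used_percent"] > disk_threshold:
        alerts.append(f"disk usage at {metrics['disk_used_percent']:.1f}%")
    if metrics["load_5m"] / metrics["cpu_count"] > load_per_cpu_threshold:
        alerts.append(
            f"5-minute load average {metrics['load_5m']:.2f} exceeds "
            f"{load_per_cpu_threshold} per CPU"
        )
    return alerts


if __name__ == "__main__":
    print(evaluate(collect_system_metrics()))
```

Memory usage and SMART data are left out because the standard library has no portable way to read them; dedicated agents or tools such as smartmontools usually cover those.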
Differences Between Black Box and White Box Monitoring
The two types of monitoring also differ in who tends to own them. Traditionally, systems administrators took care of both white box and black box monitoring. With the advent of DevOps and other changes in the IT industry, however, application developers are increasingly taking responsibility for monitoring the applications they write (white box). As a result, they are building their own monitoring solutions or writing checks for monitoring systems deployed by DevOps engineers.
Systems administrators and DevOps engineers tend to take responsibility for the monitoring of black box items such as servers. There is some crossover where DevOps engineers can also take responsibility for white box monitoring, but this depends on the business or environment you’re working in.
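To make the idea of developers writing checks more concrete, here is a Python sketch of a white box check that follows the Nagios plugin convention, which many monitoring systems accept: exit code 0 means OK, 1 WARNING, 2 CRITICAL and 3 UNKNOWN, with a one-line status message that can carry performance data after a `|`. The active-user metric, `get_active_users()` and the thresholds are hypothetical placeholders for whatever your application actually exposes.

```python
# Exit codes from the Nagios plugin convention.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
LABELS = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def check_active_users(active_users, warn=500, crit=1000):
    """Classify a hypothetical application metric against warning/critical thresholds.

    Returns an (exit_code, status_line) pair.
    """
    if active_users is None:
        return UNKNOWN, "UNKNOWN - could not read active user count"
    if active_users >= crit:
        state = CRITICAL
    elif active_users >= warn:
        state = WARNING
    else:
        state = OK
    # Text after '|' is performance data in the form label=value;warn;crit
    perfdata = f"users={active_users};{warn};{crit}"
    return state, f"{LABELS[state]} - {active_users} active users | {perfdata}"


def get_active_users():
    """Hypothetical hook: replace with a real query against your application."""
    return 42


def main():
    """Print the status line and return the exit code for the monitoring system."""
    code, message = check_active_users(get_active_users())
    print(message)
    return code
```

Run as a standalone check, the script would end with `sys.exit(main())` so that the monitoring system receives the exit code. Keeping the threshold logic in its own function makes it easy for the application team to test the check without a running monitoring server.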
White Box vs. Black Box Monitoring: Which is More Important?
It's important to recognize the value of both types of monitoring. Historically, there was a gap in application monitoring, and this caused many problems: black box monitoring would pick up system issues such as high load or high network traffic, but there was no information on the application side to show why they were happening, and more often than not the root cause lay in the application rather than the server.
The importance of white box monitoring, and how the two types work together, can be shown by following a black box alert through to its cause. Suppose a black box check alerts us that our server's CPU usage is at 100 percent. We investigate and find that MySQL processes are responsible. If we also have white box monitoring in place for MySQL, tracking the queries being run, the number of connections and how long queries take to execute, then we have much more information to help diagnose the issue. This could allow us to show that an application is running a query that is too resource-intensive or badly designed, and to give the application teams feedback backed by hard evidence.
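The MySQL scenario above can be sketched as a small analysis step: given query timings collected by white box monitoring, group similar queries and rank them by total execution time to find the likely culprit. This is a simplified Python sketch; it assumes timings have already been parsed into `(query_text, duration_seconds)` pairs (for example from MySQL's slow query log, with the parsing step not shown), and the literal-stripping `normalize` function is a rough stand-in for MySQL's own statement digests, not a reimplementation of them.

```python
import re
from collections import defaultdict


def normalize(query):
    """Replace literals with '?' so queries that differ only in values group together."""
    query = re.sub(r"'(?:[^'\\]|\\.)*'", "?", query)  # quoted string literals
    query = re.sub(r"\b\d+\b", "?", query)             # standalone numeric literals
    return re.sub(r"\s+", " ", query).strip()


def top_queries(samples, limit=3):
    """Rank query shapes by total execution time.

    samples: iterable of (query_text, duration_seconds) pairs.
    Returns a list of (query_shape, run_count, total_seconds), slowest first.
    """
    totals = defaultdict(lambda: [0, 0.0])  # shape -> [count, total seconds]
    for query, seconds in samples:
        entry = totals[normalize(query)]
        entry[0] += 1
        entry[1] += seconds
    ranked = sorted(totals.items(), key=lambda item: item[1][1], reverse=True)
    return [(shape, count, total) for shape, (count, total) in ranked[:limit]]


samples = [
    ("SELECT * FROM orders WHERE customer_id = 17", 2.5),
    ("SELECT * FROM orders WHERE customer_id = 42", 3.0),
    ("SELECT name FROM users WHERE id = 1", 0.01),
]
for shape, count, total in top_queries(samples):
    print(f"{total:.2f}s over {count} run(s): {shape}")
```

Ranking by total time rather than by the slowest single run tends to surface queries that are individually cheap but run often enough to saturate the CPU.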
Conclusion
The way we implement monitoring systems has changed a great deal over the last few years. We've moved away from implementing purely black box solutions to implementing white box solutions alongside them. And there are now solutions that allow both types of monitoring to be performed from within a single application.
This matters because it fosters closer collaboration between development and systems teams, supports proper capacity planning and helps us deal with issues as they arise. As a DevOps or systems engineer in the modern IT industry, I think it's vital to take these different monitoring methodologies into account when building out any support or monitoring solution for your team.