How a robust monitoring toolchain can help improve performance
Digital service providers (DSPs) face heavy strain on their video-on-demand (VoD) ecosystems for several reasons: a multi-fold increase in the amount of video content being published, a growing number of devices (mobiles, tablets, laptops, smart TVs, etc.) with HD viewing options, and the constant need to improve customer experience. More and more DSPs are adopting a microservices architecture for their VoD services to improve the scalability, resiliency and continuous delivery/deployment of such large and complex applications.
The Complexity of Microservices-based Applications
DSPs are opting for microservices to reap these benefits. However, monitoring microservices is challenging: a single application is broken down into a collection of loosely coupled services (microservices) that run on multiple hosts in a highly dynamic environment and interact with multiple systems that are themselves disparate and dynamic.
Implementing the right toolchain is critical to effective performance monitoring of microservices-based applications. This article focuses on building a robust toolchain to monitor microservices-based VoD applications.
Building a Toolchain for Efficient Monitoring
Identifying the Key Areas to Monitor
The first step in building the best toolchain is to identify the key areas to monitor. To ensure that the application functionality is delivered and optimized, the following aspects need to be monitored.
- Time taken from request to response
- Time elapsed at each touchpoint
- Success/failure rate of requests at each touchpoint
- Success/failure rate of requests by category/genre/content
- Busy/free time by requests/category/genre
- Trending requests by user/content/category/genre
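As a rough illustration of how a few of these aspects could be derived once request data is available, the following Python sketch computes success rates and average durations. The record layout, the touchpoint names and the sample values are assumptions made for this example only; they do not come from any particular VoD platform.

```python
from collections import defaultdict

# Hypothetical request records: (touchpoint, genre, succeeded, duration_ms)
records = [
    ("catalog", "drama", True, 120),
    ("catalog", "sports", True, 80),
    ("playback", "drama", False, 900),
    ("playback", "drama", True, 300),
]

def success_rate_by(records, key_index):
    """Return {key: fraction of successful requests}, grouped by one field."""
    totals = defaultdict(int)
    successes = defaultdict(int)
    for record in records:
        key = record[key_index]
        totals[key] += 1
        if record[2]:  # index 2 holds the success flag
            successes[key] += 1
    return {key: successes[key] / totals[key] for key in totals}

def mean_duration_by_touchpoint(records):
    """Return {touchpoint: average request duration in milliseconds}."""
    durations = defaultdict(list)
    for touchpoint, _genre, _succeeded, duration_ms in records:
        durations[touchpoint].append(duration_ms)
    return {tp: sum(values) / len(values) for tp, values in durations.items()}

print(success_rate_by(records, 0))           # success rate per touchpoint
print(success_rate_by(records, 1))           # success rate per genre
print(mean_duration_by_touchpoint(records))  # average time per touchpoint
```

Grouping by a field index keeps one function usable for touchpoint, genre or any other category; a real deployment would more likely run such aggregations in the processing or storage layer.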
Because the VoD application is running on microservices that are spread across multiple hosts, it cannot be monitored through a single tool. Therefore, it is necessary to build a toolchain to monitor application performance effectively.
Key Activities Involved in VoD Application Monitoring
VoD applications have distributed touchpoints, so capturing KPIs such as success/failure rate and time elapsed requires the following set of activities.
- Scrape and ship the log files from different containers on which VoD applications are running.
- Sort and process the log files carrying request/response types, success/failure, and time of request/response for various touchpoints based on the flow (VoD architecture, application ID) in order to get meaningful values for monitoring KPIs.
- Store the sorted log files in an external database for further analysis, such as trend analysis, types of failing requests, and success/failure ratio over time.
- Visualize and monitor the KPIs using a dashboard.
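The sort-and-process activity can be sketched in a few lines of Python. The log line format used here (timestamp, request ID, touchpoint, status separated by spaces) and the touchpoint names are hypothetical; real microservice logs will differ and would normally be parsed by the toolchain rather than by hand-written code.

```python
from collections import defaultdict
from datetime import datetime, timedelta

# Hypothetical log lines collected from several containers, arriving out of order.
RAW_LOGS = [
    "2024-01-01T10:00:00.400 req-1 playback OK",
    "2024-01-01T10:00:00.000 req-1 gateway OK",
    "2024-01-01T10:00:00.150 req-1 catalog OK",
    "2024-01-01T10:00:01.000 req-2 gateway OK",
    "2024-01-01T10:00:01.700 req-2 catalog FAIL",
]

def parse_line(line):
    """Split one log line into its four assumed fields."""
    timestamp, request_id, touchpoint, status = line.split()
    return {
        "time": datetime.fromisoformat(timestamp),
        "request_id": request_id,
        "touchpoint": touchpoint,
        "status": status,
    }

def time_lapse_per_request(lines):
    """Return {request_id: [(from_touchpoint, to_touchpoint, elapsed_ms), ...]}."""
    by_request = defaultdict(list)
    for line in lines:
        event = parse_line(line)
        by_request[event["request_id"]].append(event)

    lapses = {}
    for request_id, events in by_request.items():
        events.sort(key=lambda event: event["time"])  # restore the request flow
        hops = []
        for previous, current in zip(events, events[1:]):
            elapsed_ms = (current["time"] - previous["time"]) // timedelta(milliseconds=1)
            hops.append((previous["touchpoint"], current["touchpoint"], elapsed_ms))
        lapses[request_id] = hops
    return lapses

for request_id, hops in time_lapse_per_request(RAW_LOGS).items():
    print(request_id, hops)
```

Sorting by timestamp within each request ID is what turns interleaved container logs into a per-request flow, from which the time elapsed between touchpoints can be read off.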
Tools Required to Support Monitoring Activities Efficiently
- Log Scraper: To scrape microservices log files containing request/response type, timestamp, etc. from different hosts.
- Log Shipper: To collate the application logs collected by the scrapers from individual containers and ship them to an external storage system.
- Log Aggregator: To consolidate and sort microservices logs that come from multiple containers under one application so that performance metrics can be analyzed. The aggregator helps to identify touchpoints where requests/responses are getting stuck or failing.
- Message Queuer: To relieve back pressure on host-level disk space while a large amount of data is being written to the database, or during network issues and database unavailability.
- Database/Storage: An external centralized storage system for better accessibility of processed log files for root cause analysis, trend analysis etc.
- Performance Monitoring Dashboard: A rule-based dashboard view that reports performance KPIs from the processed log files and sends notifications in case of exceptions.
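To make the idea of a rule-based dashboard more concrete, here is a minimal Python sketch of threshold rules evaluated against KPI values. The KPI names, the thresholds and the notification text are all assumptions for illustration; an actual dashboard product would define rules through its own configuration.

```python
# Hypothetical rules: each names a KPI and the highest value tolerated before alerting.
RULES = [
    {"kpi": "failure_rate", "max_value": 0.05},
    {"kpi": "avg_response_ms", "max_value": 500},
]

def evaluate_rules(kpis, rules):
    """Return one notification message per rule whose threshold is exceeded."""
    notifications = []
    for rule in rules:
        value = kpis.get(rule["kpi"])
        if value is not None and value > rule["max_value"]:
            notifications.append(
                f"{rule['kpi']}={value} exceeds {rule['max_value']}"
            )
    return notifications

# Example KPI snapshot, as it might be computed from processed log files.
kpis = {"failure_rate": 0.12, "avg_response_ms": 340}
alerts = evaluate_rules(kpis, RULES)
print(alerts)  # ['failure_rate=0.12 exceeds 0.05']
```

Keeping the rules as data rather than code means thresholds can be tuned without redeploying the monitoring logic.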
Recommended Toolchain for Efficient Monitoring of VoD Applications
- Log Scraper: Beats is a lightweight agent suited to frequent log scraping. However, for VoD application monitoring, Logstash is recommended, as its plug-ins can transform the log data into a form that supports effective monitoring.
- Log Shipper: Fluentd is suitable for monitoring at the edge level. However, for VoD application monitoring, Flume is recommended, as it works better when log files need to be queued, aggregated, processed and stored in external storage.
- Log Aggregator: Storm is a distributed real-time computation system. However, for VoD application monitoring, Spark is recommended, as it is a fast, general-purpose engine for large-scale data processing.
- Message Queuer: Kafka queues a large number of VoD log files to avoid unnecessary pressure on the database.
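The role of the queue can be illustrated with a small in-process Python simulation. This is not Kafka and does not use any Kafka API; it is a simplified stand-in, with hypothetical class and attribute names, showing only how a buffer lets log records keep arriving while the database is unavailable and drain once it recovers.

```python
from collections import deque

class BufferedLogWriter:
    """Toy stand-in for a message queue placed in front of a database."""

    def __init__(self):
        self.pending = deque()          # records waiting to be written
        self.stored = []                # stands in for the external database
        self.database_available = True  # toggled to simulate an outage

    def submit(self, record):
        """Accept a record immediately, then write whatever can be written."""
        self.pending.append(record)
        self.flush()

    def flush(self):
        """Move queued records into storage, oldest first, while the database is up."""
        while self.database_available and self.pending:
            self.stored.append(self.pending.popleft())

writer = BufferedLogWriter()
writer.submit("log-1")                 # database up: written straight away

writer.database_available = False      # simulate a database outage
writer.submit("log-2")
writer.submit("log-3")
print(len(writer.pending), writer.stored)  # 2 ['log-1']

writer.database_available = True       # database recovers
writer.flush()
print(len(writer.pending), writer.stored)  # 0 ['log-1', 'log-2', 'log-3']
```

A production queue such as Kafka adds durability, partitioning and multiple consumers on top of this basic decoupling of producers from the storage layer.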
Benefits of a Monitoring Toolchain in the VoD Space
- DSPs can achieve improved quality of service (QoS) with the help of effective application log and service flow monitoring.
- With the recommended toolchain discussed in this article, DSPs can reduce monitoring overheads/costs by up to 20 percent.
- Data collected from monitoring can help DSPs with trend analysis (user preferences, top trending content, busy/free slots, etc.).
- With microservices monitoring in place, it becomes easier for DSPs to manage and improve customer experience and resource utilization.
This article was co-authored with Praveen Chakravarthy, Technical Architect, Prodapt.
— Vishwa Nigam