Prometheus is a leading metrics-based open source monitoring solution and time series database used across the IT community to collect metrics from systems, applications and services. According to the 2021 Cloud Native Survey from the Cloud Native Computing Foundation (CNCF), the use of Prometheus in production has reached 65%, an increase of 43% year-over-year.
“As cloud-native stacks have grown and matured, observability has become crucial for visibility into application metrics, performance, alerting, and insights,” said Chris Aniszczyk, CTO of CNCF. “Prometheus is a key element of this, enabling organizations to build a foundation of observability.”
If you are using Prometheus, it is probably because of its many advantages including reliability, easy initial implementation and its open source nature. However, in addition to these benefits, Prometheus also brings several challenges if you are deploying and maintaining the solution on your own.
Facing the Four Main Prometheus Challenges
Prometheus users face four primary challenges that can be alleviated by the right service provider:
Challenge One: Support
Prometheus users face the dilemma posed by any open source tool: There is no vendor behind the solution to deliver technical support. It is easy to take tech support for granted, but when you are working with a technology that is not supported, even a small issue can become a substantial problem. A commercially backed product can provide a critical level of expertise you do not have in-house and take on responsibility for keeping the system up and running.
Challenge Two: Maintenance
The greatest challenge you’ll likely face with Prometheus is the amount of time your team must spend configuring and maintaining the system.
One example of how Prometheus drains your resources is its reliance on third-party exporters to expose metrics in the Prometheus format. These integrations are built by members of the open source community and do not come with any support; upgrades are hit or miss.
Exporters are difficult to maintain. First, there are often multiple exporter options and choosing the right one for each of your services is time-consuming. Second, you can easily have hundreds of different services running in your environment and your team can spend hours every day making sure the exporters are working properly.
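To make the exporter's job concrete, the following is a minimal Python sketch of the text format an exporter has to produce for Prometheus to scrape. It is an illustration only: the function names and the queue_depth metric are hypothetical, the sketch handles a single gauge, and it does not escape the help text. Real exporters normally rely on an official client library rather than hand-built strings.

```python
def escape_label_value(value: str) -> str:
    # In the text format, backslash, double quote and newline
    # must be escaped inside label values.
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def render_gauge(name, help_text, samples):
    """Render one gauge metric in the Prometheus text exposition format.

    samples is a list of (labels, value) pairs, where labels is a dict
    of label name to label value. All names here are hypothetical.
    """
    lines = [f"# HELP {name} {help_text}", f"# TYPE {name} gauge"]
    for labels, value in samples:
        if labels:
            pairs = ",".join(
                f'{key}="{escape_label_value(val)}"'
                for key, val in sorted(labels.items())
            )
            lines.append(f"{name}{{{pairs}}} {value}")
        else:
            lines.append(f"{name} {value}")
    # Each line in the format, including the last one, ends with a newline.
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_gauge(
        "queue_depth",
        "Jobs waiting in the queue",
        [({"queue": "email"}, 12), ({"queue": "billing"}, 3)],
    ), end="")
```

Every service you monitor needs something that produces output of this shape and keeps it accurate as the service changes, which is where the maintenance effort described above comes from.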
Another maintenance issue involves PromQL, the Prometheus query language. PromQL is a custom query language, and it is very complex and difficult to learn. Having to rely on your one PromQL expert for all Prometheus queries across the organization is simply not practical or sustainable.
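For readers who have not seen PromQL, the sketch below shows what a typical query looks like and how it could be sent to a Prometheus server over the HTTP query API. The server address, the http_requests_total metric and the helper names are assumptions made for illustration; the query asks for the per-second request rate over the last five minutes, summed per job.

```python
import json
import urllib.parse
import urllib.request

# Hypothetical example query: per-job request rate over a 5 minute window.
EXAMPLE_QUERY = "sum by (job) (rate(http_requests_total[5m]))"


def build_query_url(base_url: str, promql: str) -> str:
    """Build the URL for an instant query against /api/v1/query."""
    params = urllib.parse.urlencode({"query": promql})
    return base_url.rstrip("/") + "/api/v1/query?" + params


def run_query(base_url: str, promql: str, timeout: float = 5.0) -> dict:
    """Send the query and return the decoded JSON response.

    Requires a reachable Prometheus server at base_url.
    """
    url = build_query_url(base_url, promql)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.load(response)


if __name__ == "__main__":
    # Assumes a local server on the default port; only the URL is printed,
    # so this runs without a server.
    print(build_query_url("http://localhost:9090", EXAMPLE_QUERY))
```

Even this short query combines a range selector, a rate function and an aggregation with grouping, which gives a sense of why teams tend to funnel query writing to a small number of people.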
With all these maintenance requirements, you will have some team members spending all their time managing and maintaining a Prometheus environment, eating away at resources that could be used to focus on business needs and innovate new applications.
Challenge Three: Scalability
Users tend to deploy Prometheus in silos, with a separate server for each cluster. Over time, you may have tens or even hundreds of Prometheus servers in your environment that you are not aware of. This leads to scalability issues regarding consistency, maintainability and a global view of your metrics.
In addition, storage is a major hindrance to scaling Prometheus, because by default its local storage retains metrics for only 15 days. If you need to analyze metrics older than that, you run into a serious problem.
Challenge Four: Correlating Data
Data does not have much value if you cannot correlate it and, unfortunately, Prometheus does not correlate metrics with other elements in your environment, such as processes or cloud regions. This makes it difficult to pinpoint the source or impact of a problem.
Options for Solving Prometheus Challenges
Options are available to help you with Prometheus but most do not solve the four major issues outlined above. The following are your options:
APM: APM vendors all have some level of support for Prometheus, but this is an add-on to their platform. They offer an ingest tool that can transform Prometheus data into a format you can query natively within the APM platform. In reality, they are not helping you utilize Prometheus but instead are forcing you to work within the constraints of their system.
Cloud tools: Some cloud hyperscalers offer solutions to support Prometheus. These tools can help with deployment and infrastructure maintenance, but you are still stuck doing all the other configuration and maintenance work.
Managed service: The preferred option is a Prometheus managed service, which can alleviate all four of the main challenges posed by open source Prometheus. The right managed service provider delivers support, maintenance, long-term storage and data correlation.
Must-Have Capabilities for Your Prometheus Managed Service Provider
The following are essential capabilities you should look for when choosing a Prometheus managed service provider:
Prometheus compatibility: You need a managed service provider whose platform is designed to be natively compatible with Prometheus, consuming data in the Prometheus format and allowing you to run PromQL queries on the data.
Simplified PromQL: To eliminate the reliance on PromQL experts for every query, a managed service provider should simplify the use of PromQL, without breaking compatibility.
Exporter management: The right managed service provider will be able to streamline the selection, deployment and management of exporters, saving you a significant amount of time and effort.
Automated discovery: Tracking and optimizing all your services can be challenging. For this reason, a Prometheus managed service should be able to automatically discover the services running in your environment and help guide you with configuration.
Metric storage: One of the shortcomings of Prometheus is the lack of long-term storage. Consequently, a Prometheus managed service provider should provide a database for long-term storage of metrics, so you can perform comparative analyses of the data from month to month and even year to year.
Continuity for existing dashboards and queries: If you already use open source Prometheus, your teams are used to working with PromQL and Grafana dashboards. You should select a managed service provider that is Prometheus-compatible so your team can keep using the same dashboards and queries they have always used to view the data.
Conclusion: A Valuable Tool
Prometheus is not a standalone solution, but it is a valuable tool for collecting metrics for performance analysis, and it is appreciated by architects, developers and IT Ops pros. While Prometheus has some limitations, it is worth finding a solution to these issues, and that solution is a managed Prometheus service. The right managed service provider can give you the best of both worlds: all the benefits of Prometheus without the time, resources and expense needed to run it effectively.