A microservice in your Kubernetes cluster has been exhibiting intermittent performance degradation after a recent code change. What could be potential root causes of this issue? (Select two correct answers)
- Incorrect Prometheus alerting rules
- Elevated resource limits for the microservice pods
- Inadequate PodDisruptionBudget configurations
- CPU and memory pressure on worker nodes
- A misconfigured HorizontalPodAutoscaler (HPA)
Such an issue is challenging to debug with observability tools alone and requires a deeper understanding of the Kubernetes infrastructure. 123 people answered this question, with 24% getting it right.
Here’s a detailed explanation.
Correct Answers:
2. Elevated resource limits for the microservice pods:
When a microservice in a Kubernetes cluster shows intermittent performance degradation, elevated resource limits for its pods can be a significant root cause. Limits are defined per container in the pod spec under the `resources` field. If limits are raised well above the requests the scheduler uses for placement, pods can burst far beyond what the node was sized for, and the resulting contention with neighboring workloads shows up as sporadic CPU throttling, latency spikes, or OOM kills rather than a constant slowdown.
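As a rough illustration (the pod name, image, and values below are hypothetical), limits set much higher than requests let a pod consume far more than the scheduler actually reserved for it on the node:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: payments-api                          # hypothetical microservice pod
spec:
  containers:
    - name: payments-api
      image: example.com/payments-api:1.4.2   # placeholder image
      resources:
        requests:
          cpu: "250m"       # what the scheduler reserves on the node
          memory: "256Mi"
        limits:
          cpu: "2"          # pod may burst to 8x its CPU request
          memory: "2Gi"     # 8x its memory request; the node can become overcommitted
```

Keeping limits reasonably close to requests makes the pod's footprint predictable and reduces this kind of noisy-neighbor contention.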
4. CPU and memory pressure on worker nodes:
Worker nodes in a Kubernetes cluster can experience CPU and memory pressure when resource demands from pods exceed the available capacity. This can lead to performance degradation for all workloads running on the affected nodes. SRE and DevOps teams need to monitor node resource utilization and take corrective action.
If high resource usage is detected, it may be necessary to scale the cluster by adding more nodes, tune resource requests and limits for pods, or optimize applications to use resources more efficiently.
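A quick way to check for node pressure (assuming metrics-server is installed so that `kubectl top` works) is to look at per-node utilization and at the conditions Kubernetes itself reports:

```bash
# Show current CPU/memory usage per node (requires metrics-server)
kubectl top nodes

# Look for MemoryPressure / DiskPressure / PIDPressure conditions,
# and for how much of the node's allocatable capacity is already requested
kubectl describe node <node-name>
```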
Incorrect Answers:
1. Incorrect Prometheus alerting rules:
Prometheus is a powerful monitoring and alerting system commonly used with Kubernetes, but a faulty alerting rule cannot by itself degrade a microservice's performance. Correct alerting rules are crucial for detecting problems, yet a misconfigured rule only affects what you can see: alerts may fire late, too often, or not at all. That is a visibility and observability problem, not a root cause of the degradation itself.
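For context, a Prometheus alerting rule only evaluates an expression and fires a notification; even a badly tuned rule like the hypothetical one below (the metric name and threshold are assumptions) changes alerting behavior, not the workload it watches:

```yaml
groups:
  - name: example-latency-alerts        # hypothetical rule group
    rules:
      - alert: HighRequestLatency
        # Fires if p99 latency stays above 500ms for 10 minutes; a wrong
        # threshold here changes alert noise, not the service itself.
        expr: histogram_quantile(0.99, sum(rate(http_request_duration_seconds_bucket[5m])) by (le)) > 0.5
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "p99 request latency above 500ms"
```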
3. Inadequate PodDisruptionBudget configurations:
PodDisruptionBudgets (PDBs) limit how many pods of a workload can be evicted at once during voluntary disruptions such as node drains, upgrades, or maintenance. They are essential for high availability, but they are not directly related to performance. An inadequate PDB may allow too many pods to be evicted during updates, which shows up as unavailability rather than intermittent performance issues.
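For reference, a PDB only constrains evictions; the hypothetical budget below keeps at least two replicas of the service running during drains and says nothing about how those replicas perform:

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: payments-api-pdb    # hypothetical name
spec:
  minAvailable: 2           # at least 2 pods must stay up during voluntary evictions
  selector:
    matchLabels:
      app: payments-api     # assumed pod label
```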
5. A misconfigured HorizontalPodAutoscaler (HPA):
A misconfigured HPA could lead to over- or under-scaling of the deployment, but it is an unlikely root cause here. The HPA adjusts replica counts based on observed metrics such as CPU utilization, so a misconfiguration primarily affects how the service scales rather than the per-pod performance described in the question.
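To illustrate the scope of what an HPA controls, the sketch below (names and thresholds are assumptions) only adjusts the replica count of a Deployment based on average CPU utilization:

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: payments-api-hpa            # hypothetical name
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: payments-api              # assumed Deployment name
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70    # scale out when average CPU exceeds 70% of requests
```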
In summary, when dealing with intermittent performance degradation in a microservice within a Kubernetes cluster, SRE and DevOps teams should focus on how resources are allocated to pods (requests and limits) and on node resource usage. Elevated resource limits for pods and CPU/memory pressure on worker nodes can directly impact performance. Observability and alerting (Prometheus) are essential for detecting issues but are rarely the root cause, and PDBs and HPAs relate to availability and scaling rather than to the performance degradation itself.