Welcome to Day 3 of the Observability Series! In this installment, we’ll focus on PromQL (Prometheus Query Language), the tool that makes Prometheus a powerful monitoring solution. If you're diving into Prometheus, PromQL is your gateway to querying, analyzing, and gaining insights into your system's metrics.
🛠 What is PromQL?
PromQL is a flexible and powerful query language designed to work with time-series data stored in Prometheus. It allows you to:
Retrieve data from specific metrics.
Perform mathematical operations for analysis.
Aggregate and manipulate data based on labels or dimensions.
Build complex queries to monitor system behavior effectively.
🏗 Structure of a PromQL Query
A PromQL query typically includes:
Metric Name: The specific measurement (e.g., http_requests_total).
Labels: Filters for narrowing down results (e.g., {method="POST", status="500"}).
Range Selectors: Time ranges for fetching historical data (e.g., [10m]).
Functions: Built-in operations to process data (e.g., rate(), sum()).
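To see how the four parts fit together, here is a minimal sketch that combines the example pieces from the list into a single query. The metric name and label values are the illustrative ones used in this post and may not exist in your setup.

```promql
# function( function( metric{label matchers}[range] ) )
sum(
  rate(
    http_requests_total{method="POST", status="500"}[10m]
  )
)
```

Read it from the inside out: the selector picks the matching series, [10m] takes their last 10 minutes of samples, rate() turns those into a per-second rate, and sum() adds the series together.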
🔑 Basic PromQL Commands
Single Metric Query
http_requests_total
Returns the most recent sample of every time series whose metric name is http_requests_total, one series per label combination.
Label Filtering
http_requests_total{method="GET", status="200"}
Returns only the series for GET requests that completed with status 200.
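Equality is only one of the four label matching operators. The sketch below shows the other three as separate queries, reusing the same illustrative metric; the label values are assumptions chosen for the example.

```promql
# != excludes an exact value
http_requests_total{method!="GET"}

# =~ keeps values that match a regular expression (anchored to the whole value)
http_requests_total{status=~"4..|5.."}

# !~ drops values that match a regular expression
http_requests_total{status!~"2.."}
```

Run each query on its own; they are alternatives, not one combined expression.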
Time Range Query
http_requests_total{status="404"}[5m]
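Returns a range vector: the raw samples recorded over the last 5 minutes for every series with status="404". Because http_requests_total is a counter, these samples are cumulative totals; wrap the selector in rate() or increase() to see how many 404 responses occurred in the window.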
⚙️ Aggregation in PromQL
Aggregation combines multiple time series into meaningful summaries.
Summing Time Series
sum(rate(container_cpu_usage_seconds_total[5m]))
Calculates total CPU usage across all containers, in CPU cores (CPU-seconds consumed per second), averaged over the past 5 minutes.
Grouping by Labels
avg(node_memory_Active_bytes) by (instance)
Returns the average active memory usage grouped by instance.
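Grouping can also be written the other way around: instead of naming the labels to keep with by, you can name the labels to drop with without. A small sketch, assuming the node_exporter metric node_cpu_seconds_total with its cpu and mode labels is available:

```promql
# Keep only the instance label; every other label is aggregated away
sum by (instance) (rate(node_cpu_seconds_total{mode!="idle"}[5m]))

# Drop the cpu and mode labels; keep everything else (instance, job, ...)
sum without (cpu, mode) (rate(node_cpu_seconds_total{mode!="idle"}[5m]))
```

The by or without clause may be placed either before or after the aggregated expression; both forms are valid PromQL.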
Maximum and Minimum
max_over_time(node_memory_MemAvailable_bytes[1h])
min_over_time(node_memory_MemAvailable_bytes[1h])
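Finds the maximum and minimum available memory over the last hour. Unlike sum() and avg(), which combine multiple series at one point in time, the _over_time functions aggregate the samples of each individual series across a time range.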
🔄 Advanced PromQL Functions
These functions turn raw counters and histograms into rates, totals, percentiles, and forecasts.
Rate
rate(http_requests_total[1m])
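Computes the per-second average rate of increase of http_requests_total over the last minute. The range must cover at least two samples, so pick a window several times longer than your scrape interval.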
Increase
increase(kube_pod_container_status_restarts_total[1h])
Estimates how many times each container restarted in the past hour. increase() extrapolates to the edges of the window, so the result can be a non-integer.
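It can help to see how increase() relates to rate(): increase() is the per-second rate multiplied by the number of seconds in the window. The sketch below shows both forms, plus a filtered variant whose threshold of 3 is purely illustrative.

```promql
# Estimated restarts per container over the last hour
increase(kube_pod_container_status_restarts_total[1h])

# The same value written with rate(): per-second rate times 3600 seconds
rate(kube_pod_container_status_restarts_total[1h]) * 3600

# Keep only containers that restarted more than 3 times (illustrative threshold)
increase(kube_pod_container_status_restarts_total[1h]) > 3
```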
Histogram Quantile
histogram_quantile(0.90, sum(rate(request_duration_seconds_bucket[5m])) by (le))
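Estimates the 90th percentile of request durations from the histogram's buckets. The le (bucket upper bound) label must be kept in the aggregation, because histogram_quantile() uses it to interpolate within buckets.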
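To break the percentile down by some dimension, keep that label alongside le in the aggregation. A minimal sketch, assuming the buckets carry the standard job label:

```promql
# 90th percentile per job: keep le for the buckets and job for the breakdown
histogram_quantile(
  0.90,
  sum by (le, job) (rate(request_duration_seconds_bucket[5m]))
)
```

Any label you leave out of the by clause is merged away before the quantile is estimated.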
Predict Linear
predict_linear(node_network_receive_bytes_total[30m], 3600)
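Uses linear regression over the last 30 minutes of samples to extrapolate the value of node_network_receive_bytes_total one hour (3600 seconds) from now. Note that this is the projected counter value, not the number of bytes received during the next hour; predict_linear() is designed for gauges such as free disk space.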
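A more typical use of predict_linear() is on a gauge, for example to spot filesystems that are on track to fill up. The sketch below assumes the node_exporter gauge node_filesystem_avail_bytes; the 6-hour lookback and 4-hour horizon are illustrative choices, not recommendations.

```promql
# Projected available bytes 4 hours (4 * 3600 seconds) from now,
# based on the trend of the last 6 hours.
# The comparison keeps only filesystems projected to drop below zero.
predict_linear(node_filesystem_avail_bytes[6h], 4 * 3600) < 0
```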
🧪 Additional Commands for Real-World Use Cases
Error Analysis
rate(http_requests_total{status=~"5.."}[10m])
Tracks the per-second rate of server errors (5xx), averaged over the last 10 minutes.
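A raw error rate is hard to judge without knowing the overall traffic, so a common next step is to divide it by the total request rate. A minimal sketch using the same illustrative metric:

```promql
# Fraction of all requests that ended in a 5xx error over the last 10 minutes
  sum(rate(http_requests_total{status=~"5.."}[10m]))
/
  sum(rate(http_requests_total[10m]))
```

Multiply the result by 100 to read it as a percentage. If no 5xx series exist at all, the numerator is empty and the query returns no data rather than 0.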
Top Resource Consumers
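topk(3, avg_over_time(container_memory_usage_bytes[5m]))
Finds the 3 container series with the highest average memory usage over the last 5 minutes. container_memory_usage_bytes is a gauge, so it is averaged with avg_over_time(); rate() is only meaningful for counters.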
Disk Usage Trends
delta(node_filesystem_free_bytes[1h])
Calculates the change in free disk space over the last hour for each filesystem; a negative result means free space is shrinking.
📈 PromQL in Action: Monitoring and Alerting
Kubernetes Pod Metrics
sum(rate(container_cpu_usage_seconds_total{namespace="prod"}[1m])) by (pod)
Sums the CPU usage rate of all containers in each pod in the prod namespace, giving one result per pod.
Service Latency Analysis
avg_over_time(http_request_duration_seconds{job="web"}[10m])
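Calculates the average response time for a web service over 10 minutes. This assumes http_request_duration_seconds is exposed as a gauge; histograms and summaries expose _sum and _count series instead of a single value.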
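If the duration metric is a histogram or summary, the average has to be derived from its _sum and _count series. A sketch under that assumption (the _sum and _count names follow the standard naming convention, but check what your service actually exposes):

```promql
# Average request duration over the last 10 minutes:
# seconds spent serving requests divided by the number of requests
  rate(http_request_duration_seconds_sum{job="web"}[10m])
/
  rate(http_request_duration_seconds_count{job="web"}[10m])
```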
Alert for High Memory Usage
container_memory_usage_bytes > 1e+09
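Returns every container whose memory usage exceeds 1 GB (10^9 bytes). Used as the expression of an alerting rule, it fires an alert for each matching container.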
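A fixed byte threshold treats small and large containers alike, so it is often more useful to compare usage against each container's own limit. The sketch below assumes cAdvisor also exposes container_spec_memory_limit_bytes and that it reports 0 when no limit is set; the 90% threshold is illustrative.

```promql
# Memory usage as a fraction of the configured limit, above 90%.
# The inner "> 0" drops containers that report a limit of 0,
# which would otherwise cause a division by zero.
container_memory_usage_bytes
  / (container_spec_memory_limit_bytes > 0)
> 0.9
```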
💡 Tips for Writing Effective PromQL Queries
Start Simple: Begin with basic queries to understand the metrics.
Layer Functions: Combine functions like rate() and sum() for deeper insights.
Test and Iterate: Use the Prometheus UI or Grafana to validate your queries.
Optimize Filters: Leverage labels to fine-tune queries and reduce unnecessary data retrieval.
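The tips above can be sketched as one query built up in four steps. Run each step on its own and check the output before adding the next layer; the prod namespace is the illustrative value used earlier in this post.

```promql
# Step 1: look at the raw metric
container_cpu_usage_seconds_total

# Step 2: narrow it down with a label filter
container_cpu_usage_seconds_total{namespace="prod"}

# Step 3: turn the counter into a per-second rate
rate(container_cpu_usage_seconds_total{namespace="prod"}[5m])

# Step 4: aggregate per pod
sum by (pod) (rate(container_cpu_usage_seconds_total{namespace="prod"}[5m]))
```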
🌟 Conclusion
PromQL is a game-changer for monitoring and observability, transforming raw metrics into actionable insights. By mastering its commands and functions, you can monitor complex systems effectively, analyze trends, and set up meaningful alerts.
As part of this Observability Series, we’ve explored PromQL fundamentals and advanced queries. Stay tuned for Day 4, where we’ll dive into setting up Grafana dashboards for Prometheus metrics!
What’s your favorite PromQL query? Share it in the comments below!