BizFirst Observe
PromQL Basics
PromQL (Prometheus Query Language) is a functional language for querying time series data. It supports instant queries (return a single value per series), range queries (return a matrix of values over time), and aggregation across label dimensions.
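To make the instant/range distinction concrete, the following minimal Python sketch builds request URLs for the two Prometheus HTTP API endpoints, `/api/v1/query` (instant) and `/api/v1/query_range` (range). The base URL `http://localhost:9090`, the helper names, and the timestamps are assumptions for illustration; nothing is sent over the network.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Assumed address of a Prometheus server; adjust for your deployment.
BASE_URL = "http://localhost:9090"


def instant_query_url(expr: str, at: float) -> str:
    """URL for an instant query: evaluate expr at a single timestamp."""
    return f"{BASE_URL}/api/v1/query?" + urlencode({"query": expr, "time": at})


def range_query_url(expr: str, start: float, end: float, step: str) -> str:
    """URL for a range query: evaluate expr at every step from start to end."""
    params = {"query": expr, "start": start, "end": end, "step": step}
    return f"{BASE_URL}/api/v1/query_range?" + urlencode(params)


expr = 'bizfirst_workflow_executions_total{status="failed"}'
instant_url = instant_query_url(expr, at=1700000000)
range_url = range_query_url(expr, start=1700000000, end=1700003600, step="60s")
print(instant_url)
print(range_url)
```

The instant form yields one sample per matching series, while the range form yields one sample per series per step, which is what a Grafana time series panel plots.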
Selectors
# Select all series for a metric
bizfirst_workflow_executions_total
# Filter by label
bizfirst_workflow_executions_total{tenant_id="t123"}
bizfirst_workflow_executions_total{status="failed"}
bizfirst_workflow_executions_total{tenant_id="t123", status="failed"}
# Regex match on label
bizfirst_node_execution_duration_seconds_count{node_type=~"DataFetchNode|ApiCallNode"}
# Negative match
bizfirst_workflow_executions_total{status!="success"}
# Range vector — values over a time window (for rate/increase functions)
bizfirst_workflow_executions_total[5m]
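The matcher operators can be modelled in a few lines of Python. This is an illustrative sketch, not Prometheus code: the `matches` helper and the sample series are hypothetical, and Python's `re` module stands in for the RE2 engine that Prometheus uses (the two agree on simple patterns like the ones shown here). The key detail is that regex matchers are fully anchored, so the pattern must match the entire label value.

```python
import re

# One hypothetical series, identified by its label set.
Series = dict[str, str]


def matches(labels: Series, name: str, op: str, value: str) -> bool:
    """Apply one PromQL-style label matcher to a series' labels."""
    actual = labels.get(name, "")  # a missing label behaves like an empty string
    if op == "=":
        return actual == value
    if op == "!=":
        return actual != value
    if op == "=~":
        return re.fullmatch(value, actual) is not None  # regex is fully anchored
    if op == "!~":
        return re.fullmatch(value, actual) is None
    raise ValueError(f"unknown matcher operator: {op}")


series = [
    {"tenant_id": "t123", "status": "failed"},
    {"tenant_id": "t123", "status": "success"},
    {"tenant_id": "t456", "status": "failed"},
]

# Equivalent in spirit to {tenant_id="t123", status!="success"}
selected = [
    s for s in series
    if matches(s, "tenant_id", "=", "t123") and matches(s, "status", "!=", "success")
]
print(selected)
```

Because of the anchoring, `node_type=~"DataFetchNode|ApiCallNode"` does not select a value such as `ApiCallNodeV2`; a pattern like `ApiCallNode.*` would be needed for that.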
Key Functions
| Function | Input | Output | Use Case |
|---|---|---|---|
| rate() | Counter range vector | Per-second rate | Error rate, throughput per second |
| increase() | Counter range vector | Total increase | Count of events in window |
| irate() | Counter range vector | Instantaneous rate | High-resolution rate (volatile) |
| histogram_quantile() | Bucket histogram | Quantile value | P50, P95, P99 latency |
| avg_over_time() | Gauge range vector | Average over window | Smoothed gauge values |
| delta() | Gauge range vector | Change in value | Rate of change for gauges |
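The difference between the three counter functions is easier to see on concrete samples. The Python sketch below is a simplified model with hypothetical data: it handles counter resets the way Prometheus does (a drop in value is treated as a restart from zero), but it omits the extrapolation to the window boundaries that the real `rate()` and `increase()` perform, so real results can differ slightly.

```python
def increase(samples):
    """Total counter increase over (timestamp, value) samples, reset-aware."""
    total = 0.0
    for (_, prev), (_, cur) in zip(samples, samples[1:]):
        # A drop means the counter restarted from zero, so the whole
        # current value counts as new increase.
        total += cur - prev if cur >= prev else cur
    return total


def rate(samples):
    """Per-second average rate between the first and last sample."""
    elapsed = samples[-1][0] - samples[0][0]
    return increase(samples) / elapsed


def irate(samples):
    """Per-second rate using only the last two samples in the window."""
    (t0, v0), (t1, v1) = samples[-2:]
    return (v1 - v0 if v1 >= v0 else v1) / (t1 - t0)


# Hypothetical counter scraped every 60 s, with a process restart after t=120.
samples = [(0, 100.0), (60, 130.0), (120, 160.0), (180, 20.0), (240, 50.0)]
print(increase(samples))  # 110.0
print(rate(samples))      # 110 / 240 seconds
print(irate(samples))     # 0.5
```

Here `rate()` averages over the whole window while `irate()` reacts only to the most recent pair of samples, which is why the latter is more volatile on graphs.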
Aggregation Operators
# Sum across all tenants
sum(bizfirst_hil_pending_count)
# Sum and keep tenant_id label
sum(bizfirst_hil_pending_count) by (tenant_id)
# Average node latency: unweighted mean of each series' average duration
avg(rate(bizfirst_node_execution_duration_seconds_sum[5m])
/ rate(bizfirst_node_execution_duration_seconds_count[5m]))
# Maximum HIL backlog per tenant
max(bizfirst_hil_pending_count) by (tenant_id)
# Count of tenants with HIL backlog > 10 (sum per tenant first, then filter)
count(sum(bizfirst_hil_pending_count) by (tenant_id) > 10)
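Aggregation with `by` can be thought of as a group-by over label sets. The Python sketch below models that with hypothetical data; the `queue` label is an assumption, added only so that each tenant has more than one series, and the `aggregate` helper is illustrative rather than anything Prometheus exposes.

```python
from collections import defaultdict

# Hypothetical instant vector: (labels, value) pairs for bizfirst_hil_pending_count.
# The "queue" label is an assumption used to give a tenant more than one series.
vector = [
    ({"tenant_id": "t123", "queue": "approvals"}, 8.0),
    ({"tenant_id": "t123", "queue": "reviews"}, 7.0),
    ({"tenant_id": "t456", "queue": "approvals"}, 3.0),
]


def aggregate(vector, by, fn):
    """Group series on the `by` labels, drop all other labels, reduce with fn."""
    groups = defaultdict(list)
    for labels, value in vector:
        key = tuple((name, labels.get(name, "")) for name in by)
        groups[key].append(value)
    return {key: fn(values) for key, values in groups.items()}


total = aggregate(vector, by=[], fn=sum)                  # like sum(...)
per_tenant = aggregate(vector, by=["tenant_id"], fn=sum)  # like sum(...) by (tenant_id)
over_ten = {k: v for k, v in per_tenant.items() if v > 10}
print(total)
print(per_tenant)
print(len(over_ten))
```

In this data no single series exceeds 10, yet tenant `t123` does once its series are summed, so the order of filtering and aggregating changes the answer.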
histogram_quantile — Latency Percentiles
# P99 node execution latency by node type
histogram_quantile(0.99,
sum(rate(bizfirst_node_execution_duration_seconds_bucket[5m])) by (node_type, le)
)
# P50 and P95 for all node types combined
histogram_quantile(0.50, sum(rate(bizfirst_node_execution_duration_seconds_bucket[5m])) by (le))
histogram_quantile(0.95, sum(rate(bizfirst_node_execution_duration_seconds_bucket[5m])) by (le))
# P99 for a specific tenant
histogram_quantile(0.99,
sum(rate(bizfirst_node_execution_duration_seconds_bucket{tenant_id="t123"}[5m])) by (le)
)
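`histogram_quantile()` does not see individual observations; it estimates the quantile from cumulative bucket counts by interpolating linearly inside the bucket that contains the target rank. The Python sketch below models that estimate for classic (`le`-labelled) histograms. The bucket bounds and counts are hypothetical, and the function assumes `0 < q <= 1` and at least one finite bucket plus the `+Inf` bucket.

```python
import math


def histogram_quantile(q, buckets):
    """Estimate quantile q from cumulative (upper_bound, count) buckets.

    Buckets must include the +Inf bucket; counts are cumulative, as in the
    le-labelled series of a classic Prometheus histogram.
    """
    buckets = sorted(buckets)
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    # Index of the first bucket whose cumulative count reaches the rank.
    i = next(idx for idx, (_, count) in enumerate(buckets) if count >= rank)
    upper, count = buckets[i]
    if math.isinf(upper):
        # Quantile falls in the +Inf bucket: report the highest finite bound.
        return buckets[-2][0]
    lower, prev_count = (0.0, 0.0) if i == 0 else buckets[i - 1]
    # Assume observations are spread evenly inside the bucket.
    return lower + (upper - lower) * (rank - prev_count) / (count - prev_count)


# Hypothetical per-bucket values, e.g. the result of sum(rate(..._bucket[5m])) by (le).
buckets = [(0.1, 50.0), (0.5, 90.0), (1.0, 98.0), (math.inf, 100.0)]
p50 = histogram_quantile(0.50, buckets)
p95 = histogram_quantile(0.95, buckets)
p99 = histogram_quantile(0.99, buckets)
print(p50, p95, p99)
```

With these counts the P99 rank lands in the `+Inf` bucket, so the estimate is capped at the highest finite bound (1.0). This is why bucket boundaries should extend past the latencies you care about.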
Arithmetic and Comparisons
# Error rate ratio (failed / total); aggregate both sides so their label sets match
sum(rate(bizfirst_workflow_executions_total{status="failed"}[5m]))
/
sum(rate(bizfirst_workflow_executions_total[5m]))
# Percentage of HIL tasks that are overdue
(bizfirst_hil_overdue_count / bizfirst_hil_pending_count) * 100
# Alert condition: error rate > 5%
(
sum(rate(bizfirst_workflow_executions_total{status="failed"}[5m]))
/
sum(rate(bizfirst_workflow_executions_total[5m]))
) > 0.05
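Binary operators between two instant vectors pair up samples whose label sets are identical (one-to-one matching). The Python sketch below models this with hypothetical per-second rates to show why a ratio of a filtered selector over an unfiltered one usually needs both sides aggregated first. The helper names are illustrative, and the sample values are chosen to avoid division by zero.

```python
def key(**labels):
    """Hashable label set for one series."""
    return frozenset(labels.items())


def divide(lhs, rhs):
    """One-to-one matching: only samples with identical label sets pair up."""
    return {labels: value / rhs[labels] for labels, value in lhs.items() if labels in rhs}


def sum_without(vector, label):
    """Drop one label and sum the series that collapse together."""
    out = {}
    for labels, value in vector.items():
        reduced = frozenset((n, v) for n, v in labels if n != label)
        out[reduced] = out.get(reduced, 0.0) + value
    return out


# Hypothetical rates for one tenant, split by status.
rates = {
    key(tenant_id="t123", status="failed"): 0.5,
    key(tenant_id="t123", status="success"): 9.5,
}
failed = {k: v for k, v in rates.items() if ("status", "failed") in k}

naive = divide(failed, rates)  # failed series only pairs with itself
ratio = divide(sum_without(failed, "status"), sum_without(rates, "status"))
print(naive)
print(ratio)
```

The unaggregated division yields 1.0 for the failed series, because it is divided by itself, whereas removing the `status` label from both sides produces the intended ratio of 0.05 per tenant.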
PromQL in Grafana
Grafana's Prometheus query editor provides PromQL autocomplete, a metric explorer, label value suggestions, and real-time query validation. Press Shift+Enter to run the query. Use the "Explain" button (Grafana 9+) to get a step-by-step explanation of what a complex PromQL query does, which is especially helpful when learning.