Prometheus Overview
Prometheus is the metrics collection and time-series storage engine for BizFirst Observe. It uses a pull model — scraping exposed /metrics endpoints at regular intervals — and stores the data in its embedded TSDB. PromQL enables powerful aggregations, rate calculations, and latency percentile queries.
The Pull Model
Prometheus is fundamentally different from push-based metrics systems (like StatsD or InfluxDB with line protocol). Instead of services sending metrics to Prometheus, Prometheus reaches out to each service's /metrics HTTP endpoint and scrapes the current values.
Pull Model Benefits
- Prometheus controls scrape rate — no overload from misbehaving services
- Immediate detection of down services (scrape fails → up{job="..."} = 0)
- Simple for services: just expose /metrics, no connection to maintain
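The scrape-and-record-`up` behavior described above can be sketched in a few lines. This is a minimal illustration rather than how Prometheus itself is implemented: `ScrapeOnceAsync` is a hypothetical helper, and the target address and job label are assumptions.

```csharp
using System;
using System.Net.Http;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper (not part of Prometheus or BizFirst Observe): runs one
// scrape and returns the value a scraper would record for the `up` series.
static async Task<int> ScrapeOnceAsync(
    Func<CancellationToken, Task<string>> fetchMetrics, TimeSpan timeout)
{
    using var cts = new CancellationTokenSource(timeout);
    try
    {
        await fetchMetrics(cts.Token); // a real scraper would parse and store the body
        return 1;                      // target answered: up = 1
    }
    catch (Exception ex) when (ex is HttpRequestException or OperationCanceledException)
    {
        return 0;                      // refused, non-2xx status, or timed out: up = 0
    }
}

using var http = new HttpClient();
var target = new Uri("http://localhost:8080/metrics"); // assumed service address

// GetStringAsync throws HttpRequestException on connection errors and non-success codes.
int up = await ScrapeOnceAsync(ct => http.GetStringAsync(target, ct), TimeSpan.FromSeconds(10));
Console.WriteLine($"up{{job=\"demo\"}} {up}");
```

Because the scraper owns the schedule and the timeout, the service needs no client library configuration beyond serving the endpoint.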
Pull Model Trade-offs
- Services behind a firewall need a Pushgateway or a reverse proxy before they can be scraped
- Short-lived jobs (batch workers) need Pushgateway to expose metrics
- Network must allow Prometheus to reach all target services
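For the short-lived job case, a batch worker can push its final values to a Pushgateway, which Prometheus then scrapes like any other target. The sketch below assumes a Pushgateway at `localhost:9091`; the job name and metric name are hypothetical and not taken from BizFirstGO.

```csharp
using System;
using System.Globalization;
using System.Net.Http;
using System.Text;

// Pushgateway groups pushed metrics by job name: /metrics/job/<job>.
static Uri BuildPushUri(Uri gateway, string job) =>
    new Uri(gateway, "/metrics/job/" + Uri.EscapeDataString(job));

// Renders one gauge sample in the Prometheus text format (trailing newline required).
static string BuildGaugeBody(string name, string help, double value) =>
    $"# HELP {name} {help}\n# TYPE {name} gauge\n{name} {value.ToString(CultureInfo.InvariantCulture)}\n";

var gateway = new Uri("http://localhost:9091");        // assumed Pushgateway address
var uri = BuildPushUri(gateway, "nightly_export");     // hypothetical job name
var body = BuildGaugeBody(
    "bizfirst_batch_last_success_timestamp_seconds",   // hypothetical metric name
    "Unix time of the last successful batch run",
    DateTimeOffset.UtcNow.ToUnixTimeSeconds());

using var http = new HttpClient();
try
{
    // PUT replaces every metric previously pushed under this job's grouping key.
    using var response = await http.PutAsync(uri, new StringContent(body, Encoding.UTF8, "text/plain"));
    Console.WriteLine($"Pushgateway responded {(int)response.StatusCode}");
}
catch (HttpRequestException ex)
{
    Console.WriteLine($"Push failed: {ex.Message}");
}
```

Pushing a "last success" timestamp rather than a counter is a common choice for batch jobs, since the value stays meaningful after the process exits.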
Metric Types
| Type | Always Increases? | BizFirstGO Example | Primary Operation |
|---|---|---|---|
| Counter | Yes (resets on restart) | bizfirst_workflow_executions_total | rate(bizfirst_workflow_executions_total[5m]): per-second rate |
| Gauge | No (can go up or down) | bizfirst_hil_pending_count | Direct value: current backlog |
| Histogram | Yes (bucket counts) | bizfirst_node_execution_duration_seconds | histogram_quantile(0.99, ...): P99 latency |
| Summary | Yes (_sum and _count) | None (rarely used) | Pre-computed quantiles (less flexible) |
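To make the Counter row concrete, the sketch below shows why a counter that "resets on restart" can still yield a sensible per-second rate. It is a simplified model: PromQL's `rate()` also extrapolates to the window boundaries, which is omitted here, and the sample values are invented for illustration.

```csharp
using System;
using System.Collections.Generic;

// Simplified per-second rate over a window of (timestamp, value) counter samples.
static double SimpleRate(IReadOnlyList<(double TimestampSeconds, double Value)> samples)
{
    if (samples.Count < 2) return double.NaN;

    double increase = 0;
    for (int i = 1; i < samples.Count; i++)
    {
        double delta = samples[i].Value - samples[i - 1].Value;
        // A drop means the counter restarted from zero, so the new value
        // counts in full as growth since the reset.
        increase += delta >= 0 ? delta : samples[i].Value;
    }

    double elapsed = samples[samples.Count - 1].TimestampSeconds - samples[0].TimestampSeconds;
    return elapsed > 0 ? increase / elapsed : double.NaN;
}

// Invented samples at a 15 s scrape interval; the service restarts before t = 45.
var samples = new List<(double TimestampSeconds, double Value)>
{
    (0.0, 4700.0), (15.0, 4730.0), (30.0, 4760.0), (45.0, 20.0), (60.0, 50.0),
};

// Increase is 30 + 30 + 20 + 30 = 110 over 60 seconds.
Console.WriteLine($"rate = {SimpleRate(samples):F2} executions/s");
```

Taking the raw difference between the first and last sample would give a negative number here, which is why counters are read through `rate()` rather than directly.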
Prometheus /metrics Format
BizFirstGO services expose metrics in Prometheus text format at GET /metrics:
# HELP bizfirst_workflow_executions_total Total workflow executions
# TYPE bizfirst_workflow_executions_total counter
bizfirst_workflow_executions_total{tenant_id="t123",status="success"} 4821
bizfirst_workflow_executions_total{tenant_id="t123",status="failed"} 47
bizfirst_workflow_executions_total{tenant_id="t456",status="success"} 1205
# HELP bizfirst_node_execution_duration_seconds Node execution duration
# TYPE bizfirst_node_execution_duration_seconds histogram
bizfirst_node_execution_duration_seconds_bucket{node_type="DataFetchNode",le="0.1"} 812
bizfirst_node_execution_duration_seconds_bucket{node_type="DataFetchNode",le="0.5"} 1843
bizfirst_node_execution_duration_seconds_bucket{node_type="DataFetchNode",le="1.0"} 2104
bizfirst_node_execution_duration_seconds_bucket{node_type="DataFetchNode",le="+Inf"} 2211
bizfirst_node_execution_duration_seconds_sum{node_type="DataFetchNode"} 847.23
bizfirst_node_execution_duration_seconds_count{node_type="DataFetchNode"} 2211
# HELP bizfirst_hil_pending_count Current HIL tasks awaiting action
# TYPE bizfirst_hil_pending_count gauge
bizfirst_hil_pending_count{tenant_id="t123"} 12
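The histogram lines in this sample are enough to estimate quantiles by hand. The following sketch applies the linear interpolation that `histogram_quantile` uses to the DataFetchNode buckets shown above. It is an illustration under assumptions: `HistogramQuantile` is a hypothetical helper, and in PromQL the function is normally applied to `rate()` of the buckets rather than to raw cumulative counts.

```csharp
using System;
using System.Collections.Generic;

// Estimates quantile q from cumulative buckets sorted by upper bound,
// with the last bucket being le="+Inf".
static double HistogramQuantile(
    double q, IReadOnlyList<(double UpperBound, double CumulativeCount)> buckets)
{
    if (buckets.Count < 2) return double.NaN;
    double total = buckets[buckets.Count - 1].CumulativeCount;
    if (total == 0) return double.NaN;

    double rank = q * total;
    for (int i = 0; i < buckets.Count; i++)
    {
        if (buckets[i].CumulativeCount < rank) continue;

        // The rank falls in the +Inf bucket: report the highest finite bound.
        if (double.IsPositiveInfinity(buckets[i].UpperBound))
            return buckets[i - 1].UpperBound;

        double lowerBound = i == 0 ? 0 : buckets[i - 1].UpperBound;
        double countBelow = i == 0 ? 0 : buckets[i - 1].CumulativeCount;
        double inBucket = buckets[i].CumulativeCount - countBelow;
        // Assume observations are spread evenly inside the bucket.
        return lowerBound + (buckets[i].UpperBound - lowerBound) * (rank - countBelow) / inBucket;
    }
    return double.NaN;
}

// Bucket values copied from the /metrics sample above.
var buckets = new List<(double UpperBound, double CumulativeCount)>
{
    (0.1, 812.0), (0.5, 1843.0), (1.0, 2104.0), (double.PositiveInfinity, 2211.0),
};

double p50 = HistogramQuantile(0.50, buckets);
double p99 = HistogramQuantile(0.99, buckets);
double mean = 847.23 / 2211; // _sum divided by _count

Console.WriteLine($"p50 = {p50:F3}s, p99 = {p99:F1}s, mean = {mean:F3}s");
```

With these buckets p50 lands near 0.214 s, while p99 reports 1.0 s only because 107 observations exceed the largest finite bucket. That cap suggests adding buckets above 1.0 if tail latency matters.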
How BizFirstGO Services Register Metrics
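All BizFirstGO metrics are registered via the OTel Metrics API in MetricsRegistry.cs. The OTel SDK translates these to Prometheus format, so a dotted instrument name such as bizfirst.workflow.executions appears as bizfirst_workflow_executions_total, and exposes them via the /metrics endpoint: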
// MetricsRegistry.cs — BizFirstGO metric definitions
using System.Diagnostics.Metrics;

public static class MetricsRegistry
{
    // One Meter per component; exporters subscribe to it by name.
    private static readonly Meter Meter = new Meter("BizFirst.ProcessEngine", "1.0");

    // Counter: only ever incremented via Add().
    public static readonly Counter<long> WorkflowExecutions =
        Meter.CreateCounter<long>(
            "bizfirst.workflow.executions",
            unit: "{execution}",
            description: "Total workflow executions");

    // Histogram: one Record() call per observed duration.
    public static readonly Histogram<double> NodeExecutionDuration =
        Meter.CreateHistogram<double>(
            "bizfirst.node.execution.duration",
            unit: "s",
            description: "Node execution duration in seconds");

    // Observable gauge: the callback is invoked each time metrics are collected.
    public static readonly ObservableGauge<int> HilPendingCount =
        Meter.CreateObservableGauge<int>(
            "bizfirst.hil.pending.count",
            () => HilService.GetPendingCount(),
            unit: "{task}",
            description: "HIL tasks awaiting action");
}
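The registry only defines instruments; measurements are recorded at the call sites. The sketch below shows one way that might look, using a local counter that stands in for `MetricsRegistry.WorkflowExecutions` so the example runs on its own. The `MeterListener` is used here only to make the recorded measurements visible without an exporter; the tag names mirror the labels in the /metrics sample, and the rest is assumed.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics.Metrics;

// Stand-in for MetricsRegistry.WorkflowExecutions, declared locally for the demo.
using var meter = new Meter("BizFirst.ProcessEngine", "1.0");
var workflowExecutions = meter.CreateCounter<long>(
    "bizfirst.workflow.executions",
    unit: "{execution}",
    description: "Total workflow executions");

// Collects every measurement from the meter so the effect of Add() is visible.
var observed = new List<string>();
using var listener = new MeterListener();
listener.InstrumentPublished = (instrument, l) =>
{
    if (instrument.Meter.Name == "BizFirst.ProcessEngine")
        l.EnableMeasurementEvents(instrument);
};
listener.SetMeasurementEventCallback<long>((instrument, value, tags, state) =>
{
    var labels = new List<string>();
    foreach (var tag in tags)
        labels.Add(tag.Key + "=\"" + tag.Value + "\"");
    observed.Add(instrument.Name + "{" + string.Join(",", labels) + "} +" + value);
});
listener.Start();

// Call sites attach dimensions as tags; an exporter turns them into labels.
workflowExecutions.Add(1,
    new KeyValuePair<string, object?>("tenant_id", "t123"),
    new KeyValuePair<string, object?>("status", "success"));
workflowExecutions.Add(1,
    new KeyValuePair<string, object?>("tenant_id", "t123"),
    new KeyValuePair<string, object?>("status", "failed"));

foreach (var line in observed)
    Console.WriteLine(line);
```

Each distinct tag combination becomes its own time series, so tags such as `tenant_id` and `status` are best kept to values with a small, bounded set.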