The Pull Model

Unlike Loki and Tempo, which receive telemetry pushed from the OTel Collector, Prometheus actively pulls metrics from its targets. Each BizFirstGO service exposes a /metrics endpoint in the Prometheus text format (or the OpenMetrics format), and Prometheus scrapes it at a configurable interval.
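
To make the pull model concrete, the following is a minimal sketch of the service side of a scrape, using only the Python standard library. The names `SAMPLES`, `render_metrics`, `MetricsHandler`, and `serve`, the port, and the sample values are all hypothetical; a real service would normally rely on an official Prometheus client library rather than hand-rendering the text format.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical in-memory samples: (metric name, sorted label pairs) -> value.
SAMPLES = {
    ("bizfirst_workflow_executions_total",
     (("status", "success"), ("tenant_id", "t123"))): 42.0,
    ("bizfirst_hil_pending_count", (("tenant_id", "t123"),)): 3.0,
}


def render_metrics(samples):
    """Render samples as lines of the Prometheus text exposition format."""
    lines = []
    for (name, labels), value in sorted(samples.items()):
        label_str = ",".join(f'{k}="{v}"' for k, v in labels)
        if label_str:
            lines.append(f"{name}{{{label_str}}} {value}")
        else:
            lines.append(f"{name} {value}")
    return "\n".join(lines) + "\n"


class MetricsHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # The service never pushes; it only answers when Prometheus asks.
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = render_metrics(SAMPLES).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=8000):
    """Block forever, answering scrapes on the given (assumed) port."""
    HTTPServer(("127.0.0.1", port), MetricsHandler).serve_forever()
```

Calling `serve()` and fetching `http://127.0.0.1:8000/metrics` returns the current values each time, which is essentially what Prometheus does on every scrape interval.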

This pull model has important properties:

BizFirstGO Metric Types

| Type | Description | BizFirstGO Example | Use Case |
|------|-------------|--------------------|----------|
| Counter | Only increases; resets to zero on restart | bizfirst_workflow_executions_total | Rate calculations: rate(...[5m]) |
| Gauge | Can go up or down | bizfirst_hil_pending_count | Current state: backlogs, connection counts |
| Histogram | Distribution with configurable buckets | bizfirst_node_execution_duration_seconds | Latency percentiles: histogram_quantile(0.99, ...) |
| Summary | Pre-calculated percentiles (client-side) | Rarely used in BizFirstGO | When percentiles must be exact |
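
Because a counter resets on restart, a rate calculation has to treat any drop in value as a reset rather than as negative growth. The sketch below illustrates that idea under simplifying assumptions: the function names are hypothetical, samples are assumed to be evenly ordered in time, and the extrapolation to window boundaries that Prometheus's own `rate()` performs is deliberately left out.

```python
def counter_increase(samples):
    """Total increase across ordered samples, treating any drop as a reset."""
    total = 0.0
    for prev, curr in zip(samples, samples[1:]):
        if curr >= prev:
            total += curr - prev
        else:
            # The counter restarted from zero, so the whole current
            # value counts as new increase since the reset.
            total += curr
    return total


def per_second_rate(samples, window_seconds):
    """Simplified per-second rate over a window (no extrapolation)."""
    return counter_increase(samples) / window_seconds
```

For example, the samples `[100, 110, 5, 20]` contain one restart between 110 and 5, and the increase is counted as 10 + 5 + 15 rather than as a negative jump.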

Key BizFirstGO Metrics

# Workflow executions (counter, labeled by tenant and outcome)
bizfirst_workflow_executions_total{tenant_id="t123", status="success"}
bizfirst_workflow_executions_total{tenant_id="t123", status="failed"}
bizfirst_workflow_executions_total{tenant_id="t123", status="timeout"}

# Node execution latency histogram
bizfirst_node_execution_duration_seconds_bucket{node_type="ApprovalNode", tenant_id="t123", le="0.1"}
bizfirst_node_execution_duration_seconds_bucket{node_type="ApprovalNode", tenant_id="t123", le="1.0"}
bizfirst_node_execution_duration_seconds_bucket{node_type="ApprovalNode", tenant_id="t123", le="5.0"}
bizfirst_node_execution_duration_seconds_count{node_type="ApprovalNode", tenant_id="t123"}
bizfirst_node_execution_duration_seconds_sum{node_type="ApprovalNode", tenant_id="t123"}

# HIL metrics
bizfirst_hil_pending_count{tenant_id="t123"}          # Current backlog
bizfirst_hil_suspension_duration_seconds_bucket{...}  # Wait time distribution

# EdgeStream throughput
bizfirst_edgestream_messages_total{topic="workflow.events", tenant_id="t123"}

# Active connections
bizfirst_active_connections{service="flow-studio-signalr"}
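
The `_bucket`, `_count`, and `_sum` series of a histogram fit together as follows: buckets are cumulative, so an observation increments every bucket whose `le` bound is greater than or equal to the value. This is a minimal sketch of that bookkeeping; the `Histogram` class is hypothetical and is not the BizFirstGO instrumentation code, and the bucket bounds simply mirror the `le` values shown above.

```python
import math


class Histogram:
    """Toy cumulative histogram mirroring the _bucket/_count/_sum series."""

    def __init__(self, bounds):
        # Prometheus histograms always include an implicit +Inf bucket.
        self.bounds = sorted(bounds) + [math.inf]
        self.bucket_counts = [0] * len(self.bounds)
        self.count = 0
        self.sum = 0.0

    def observe(self, value):
        self.count += 1
        self.sum += value
        for i, le in enumerate(self.bounds):
            if value <= le:
                # Cumulative: every bucket at or above the value is incremented.
                self.bucket_counts[i] += 1


h = Histogram([0.1, 1.0, 5.0])
for seconds in (0.05, 0.5, 3.0, 10.0):
    h.observe(seconds)
```

After these four observations the `+Inf` bucket equals `count`, which is why the `_count` series always matches the `le="+Inf"` bucket.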

PromQL Quick Reference

# Error rate over 5 minutes (ratio of failed to total executions)
# Both sides are aggregated with sum() so their label sets match
sum(rate(bizfirst_workflow_executions_total{status="failed"}[5m]))
  /
sum(rate(bizfirst_workflow_executions_total[5m]))

# P99 node execution latency by node type
histogram_quantile(0.99,
  sum(rate(bizfirst_node_execution_duration_seconds_bucket[5m])) by (node_type, le)
)

# HIL backlog across all tenants
sum(bizfirst_hil_pending_count)

# EdgeStream throughput (messages per second)
rate(bizfirst_edgestream_messages_total[1m])
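
The P99 query relies on `histogram_quantile`, which estimates a quantile by finding the bucket that contains the target rank and interpolating linearly inside it. The sketch below shows that interpolation in simplified form; it is not Prometheus's implementation, the bucket counts are made-up illustration data, and it assumes `0 < q <= 1` with buckets sorted by bound and ending in `+Inf`.

```python
import math


def histogram_quantile(q, buckets):
    """Estimate a quantile from (upper_bound, cumulative_count) pairs."""
    if not 0 < q <= 1:
        raise ValueError("this sketch only handles 0 < q <= 1")
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    lower_bound, lower_count = 0.0, 0.0
    for upper_bound, cum_count in buckets:
        if cum_count >= rank:
            if math.isinf(upper_bound):
                # The rank falls in the +Inf bucket: report the highest
                # finite bound, since there is nothing to interpolate to.
                return lower_bound
            fraction = (rank - lower_count) / (cum_count - lower_count)
            return lower_bound + (upper_bound - lower_bound) * fraction
        lower_bound, lower_count = upper_bound, cum_count
    return lower_bound


# Made-up cumulative counts for the le="0.1", "1.0", "5.0", "+Inf" buckets.
example = [(0.1, 10), (1.0, 90), (5.0, 100), (math.inf, 100)]
p99 = histogram_quantile(0.99, example)
```

The estimate can never be more precise than the bucket layout allows, which is one reason bucket boundaries are worth choosing around the latencies that matter.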

Deep Dive Available

For the complete Prometheus reference — scrape configuration, recording rules, Alertmanager setup, and the full BizFirstGO metrics catalog — see Guide4: Prometheus.