Grafana Tempo
Grafana Tempo is the distributed tracing component of the default stack. It ingests traces from the OTel Collector via OTLP, stores them in object storage, and supports trace search and correlation with logs and metrics. Because object storage is its only storage dependency, its resource requirements stay low.
What Distributed Tracing Provides
A distributed trace is a complete record of a single request as it travels through multiple services. For BizFirstGO, one workflow execution produces one trace — with child spans for each node executed, each external service call, and each HIL suspension/resume cycle.
Traces answer questions that logs and metrics cannot easily answer:
- Which specific service call is causing a workflow to be slow?
- In a multi-service execution, which node spent the most time waiting on external I/O?
- What was the exact sequence of events in a failed workflow execution?
- How much time was spent in each executor node vs. framework overhead?
BizFirstGO Trace Structure
Every workflow execution produces a structured distributed trace with a predictable span hierarchy:
workflow.execute [root span, entire execution duration]
├── node.execute [StartNode]
│     duration: 5ms
│     node.type: StartNode
├── node.execute [DataFetchNode]
│     duration: 320ms
│     node.type: DataFetchNode
│     └── http.client [GET https://api.example.com/data]
│           duration: 315ms ← the slow call
├── node.execute [ApprovalNode]
│     duration: 4h 23m ← HIL suspension
│     ├── hil.suspend [event]
│     └── hil.resume [event]
└── node.execute [EndNode]
      duration: 2ms
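The hierarchy above lends itself to two simple calculations: the time a span spends outside its child spans (a rough proxy for node or framework overhead) and the slowest leaf span in a branch. The following is a minimal sketch, not BizFirstGO or Tempo code. The `Span` class, `self_time_ms`, and `slowest_leaf` are hypothetical names introduced for illustration, and the self-time calculation assumes child spans run sequentially without overlapping.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Span:
    """Hypothetical in-memory span; not a Tempo or OpenTelemetry type."""
    name: str
    duration_ms: float
    children: List["Span"] = field(default_factory=list)

    def self_time_ms(self) -> float:
        # Time not covered by direct children (assumes children do not overlap).
        return self.duration_ms - sum(c.duration_ms for c in self.children)


def slowest_leaf(span: Span) -> Span:
    """Return the longest-running span that has no children."""
    if not span.children:
        return span
    return max((slowest_leaf(c) for c in span.children), key=lambda s: s.duration_ms)


# Durations copied from the DataFetchNode branch of the example trace.
http_call = Span("http.client [GET https://api.example.com/data]", 315)
data_fetch = Span("node.execute [DataFetchNode]", 320, [http_call])

overhead_ms = data_fetch.self_time_ms()  # 320 - 315
bottleneck = slowest_leaf(data_fetch)
print(f"{data_fetch.name}: {overhead_ms}ms outside child spans")
print(f"slowest leaf: {bottleneck.name}")
```

In this branch almost all of the node's time sits in the outbound HTTP call, which is the kind of conclusion a trace view makes visible at a glance.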
Span Attributes on BizFirstGO Spans
| Attribute | Present On | Value Example |
|---|---|---|
| workflow.id | Root span | wf-8a4c2f91 |
| execution.id | All BizFirstGO spans | exec-d1e2f3a4 |
| tenant.id | All BizFirstGO spans | tenant-abc-123 |
| node.key | node.execute spans | approval-node-01 |
| node.type | node.execute spans | ApprovalNode |
| hil.actor | hil.suspend spans | user-xyz |
| hil.outcome | hil.resume spans | approved |
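One way to keep instrumentation aligned with this table is a small check that reports which expected attributes a span is missing. The sketch below is illustrative only: `REQUIRED_BY_SPAN` and `missing_attributes` are hypothetical names, the mapping is transcribed from the table, and it assumes the root span is named `workflow.execute` as in the trace structure shown earlier. Span names not listed in the table (for example `http.client`) are not checked.

```python
from typing import Dict, List, Mapping

# Attributes the table lists for all BizFirstGO spans.
COMMON = ["execution.id", "tenant.id"]

# Required attributes per span name, transcribed from the table.
REQUIRED_BY_SPAN: Dict[str, List[str]] = {
    "workflow.execute": COMMON + ["workflow.id"],
    "node.execute": COMMON + ["node.key", "node.type"],
    "hil.suspend": COMMON + ["hil.actor"],
    "hil.resume": COMMON + ["hil.outcome"],
}


def missing_attributes(span_name: str, attributes: Mapping[str, str]) -> List[str]:
    """Return the expected attribute keys that are absent from `attributes`."""
    # Unknown span names have no requirements in this sketch.
    required = REQUIRED_BY_SPAN.get(span_name, [])
    return [key for key in required if key not in attributes]


# Example: a node.execute span that was emitted without node.type.
missing = missing_attributes(
    "node.execute",
    {
        "execution.id": "exec-d1e2f3a4",
        "tenant.id": "tenant-abc-123",
        "node.key": "approval-node-01",
    },
)
print(missing)
```

A check like this fits naturally in a unit test for span-producing code, so attribute drift is caught before it affects queries.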
TraceQL Quick Reference
# Find all spans for a specific execution
{ span.execution.id = "exec-d1e2f3a4" }
# Find slow node executions (over 5 seconds)
{ name = "node.execute" && duration > 5s }
# Find error spans in traces whose root span is workflow.execute
{ rootName = "workflow.execute" && status = error }
# Find HIL suspensions over 24 hours
{ name = "node.execute" && span.node.type = "ApprovalNode" && duration > 24h }
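These queries can also be run outside Grafana, since Tempo serves TraceQL search over HTTP at `/api/search` with the query passed in the `q` parameter. The sketch below only builds the request URL and sends nothing. The `tempo_search_url` helper is a hypothetical name, and the base URL `http://localhost:3200` is an assumption about where Tempo listens in a given deployment.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def tempo_search_url(base_url: str, traceql: str, limit: int = 20) -> str:
    """Build a Tempo search URL for a TraceQL query (hypothetical helper)."""
    # urlencode percent-encodes braces, quotes, and ampersands in the query.
    params = urlencode({"q": traceql, "limit": limit})
    return f"{base_url.rstrip('/')}/api/search?{params}"


# Assumed base URL; adjust to the actual Tempo endpoint.
url = tempo_search_url(
    "http://localhost:3200",
    '{ name = "node.execute" && duration > 5s }',
)
print(url)

# Round trip: decoding the URL yields the original TraceQL text.
decoded = parse_qs(urlsplit(url).query)["q"][0]
print(decoded)
```

Encoding the query through `urlencode` matters because TraceQL uses `&&`, which would otherwise be read as a parameter separator.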
For the complete Tempo reference — deployment, storage configuration, advanced TraceQL, and exemplar setup — see Guide5: Tempo.