
The Three Correlation Links

1. Log-to-Trace (Loki → Tempo)

Every BizFirstGO log line includes a traceId field. The Derived Fields feature of Grafana's Loki data source matches this field and renders it as a clickable link. Clicking the traceId opens the corresponding trace in Tempo's trace detail view, without leaving Grafana.

# Log line in Loki (JSON format):
{
  "timestamp": "2026-05-25T14:32:01Z",
  "level": "error",
  "message": "Node execution failed: timeout waiting for external API",
  "execution_id": "exec-d1e2f3a4",
  "node_type": "DataFetchNode",
  "traceId": "4bf92f3577b34da6a3ce929d0e0e4736",  ← clickable link
  "spanId": "00f067aa0ba902b7"
}

# Grafana Derived Fields config (in Loki data source):
# \s* tolerates the space after the colon in pretty-printed JSON such as the sample above
matcherRegex: '"traceId":\s*"(\w+)"'
url: '${__value.raw}'
datasourceUid: <UID of the Tempo data source>
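
Before saving a Derived Fields pattern, it can help to exercise the regex against sample log lines outside Grafana. The following is a minimal Python sketch, not part of BizFirstGO: the function name `extract_trace_id` is hypothetical, and the pattern is an assumed variant that tolerates optional whitespace around the colon, since the sample log line is pretty-printed. The first capture group is the value Grafana substitutes for `${__value.raw}`.

```python
import re
from typing import Optional

# Assumed pattern for illustration: optional whitespace around the colon.
TRACE_ID_PATTERN = re.compile(r'"traceId"\s*:\s*"(\w+)"')


def extract_trace_id(log_line: str) -> Optional[str]:
    """Return the first capture group, or None when the line has no traceId."""
    match = TRACE_ID_PATTERN.search(log_line)
    return match.group(1) if match else None


# Compact JSON, as many loggers emit it.
compact = '{"level":"error","traceId":"4bf92f3577b34da6a3ce929d0e0e4736","spanId":"00f067aa0ba902b7"}'
# The same field with a space after the colon.
spaced = '{"level": "error", "traceId": "4bf92f3577b34da6a3ce929d0e0e4736"}'

print(extract_trace_id(compact))
print(extract_trace_id(spaced))
```

A pattern that requires `"traceId":"` with no whitespace would match only the compact form, so checking both shapes is a cheap way to avoid a link that silently never renders.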

2. Metric-to-Trace (Prometheus → Tempo via Exemplars)

Prometheus histograms can carry exemplars: metadata attached to individual samples, including a traceId. When Grafana renders a histogram panel, each data point that has an exemplar attached shows a small dot. Clicking the dot opens the linked trace in Tempo.

# The OTel SDK automatically attaches exemplars to histogram observations
# when a trace is active. In OpenMetrics exposition format (the classic
# Prometheus text format has no exemplar syntax):
bizfirst_node_execution_duration_seconds_bucket{...,le="5.0"} 142 # {traceID="abc123"} 4.7 1716648000
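
To make the exemplar syntax concrete, here is a rough Python sketch that splits the trailing `# {labels} value timestamp` part off a sample line. It is illustrative only: `parse_exemplar` and the `Exemplar` tuple are hypothetical names, the parser handles only simple label values without escaped quotes, and the elided bucket labels from the line above are left out.

```python
import re
from typing import NamedTuple, Optional


class Exemplar(NamedTuple):
    labels: dict
    value: float
    timestamp: Optional[float]


# Matches the trailing "# {labels} value [timestamp]" part of a sample line.
EXEMPLAR_RE = re.compile(
    r'#\s*\{(?P<labels>[^}]*)\}\s+(?P<value>\S+)(?:\s+(?P<ts>\S+))?\s*$'
)
LABEL_RE = re.compile(r'(\w+)="([^"]*)"')


def parse_exemplar(line: str) -> Optional[Exemplar]:
    """Extract the exemplar from one exposition line, or None if absent."""
    match = EXEMPLAR_RE.search(line)
    if match is None:
        return None
    labels = dict(LABEL_RE.findall(match.group("labels")))
    ts = float(match.group("ts")) if match.group("ts") else None
    return Exemplar(labels, float(match.group("value")), ts)


line = (
    'bizfirst_node_execution_duration_seconds_bucket{le="5.0"} 142 '
    '# {traceID="abc123"} 4.7 1716648000'
)
exemplar = parse_exemplar(line)
print(exemplar)
```

The exemplar value (4.7) is the individual observation that landed in the `le="5.0"` bucket, and its label is what Grafana uses to build the link to Tempo.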

3. Trace-to-Log (Tempo → Loki)

From Tempo's trace detail view, you can jump to the logs that were emitted during the same time window by any service involved in the trace. This is configured in the Tempo data source settings:

# In Grafana, Tempo data source config:
jsonData:
  tracesToLogs:
    datasourceUid: <UID of the Loki data source>
    tags: ['service.name', 'tenant_id']
    # Queries Loki for logs from the same service during trace duration ±5 minutes
    spanStartTimeShift: -5m
    spanEndTimeShift: 5m
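
The effect of the tags and the two time shifts can be sketched in a few lines of Python. This is an approximation of the idea, not Grafana's implementation: `logs_query_for_span` is a hypothetical helper, the tag values are made up, and replacing dots with underscores in label names is an assumption of this sketch (Loki label names cannot contain dots).

```python
from datetime import datetime, timedelta, timezone


def logs_query_for_span(tags, span_start, span_end,
                        start_shift=timedelta(minutes=-5),
                        end_shift=timedelta(minutes=5)):
    """Build a LogQL stream selector and the shifted query time range."""
    # Assumption of this sketch: dots in tag names become underscores.
    selector = ", ".join(
        f'{name.replace(".", "_")}="{value}"'
        for name, value in sorted(tags.items())
    )
    return "{" + selector + "}", span_start + start_shift, span_end + end_shift


# Hypothetical span attributes and timing.
tags = {"service.name": "workflow-engine", "tenant_id": "acme"}
span_start = datetime(2026, 5, 25, 14, 31, 57, tzinfo=timezone.utc)
span_end = datetime(2026, 5, 25, 14, 32, 1, tzinfo=timezone.utc)

query, range_from, range_to = logs_query_for_span(tags, span_start, span_end)
print(query)
print(range_from.isoformat(), range_to.isoformat())
```

Widening the window on both sides catches log lines written slightly before the span started or after it ended, at the cost of some unrelated lines from the same service.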

The Incident Investigation Workflow

Here is how cross-signal correlation is used in practice when an engineer is investigating a slow workflow:

Step 1: Alert fires or dashboard shows P99 spike

The Node Performance dashboard shows a spike in ApprovalNode P99 latency. An exemplar dot is visible on the spike data point.

Step 2: Click exemplar → open trace

Clicking the exemplar dot opens Tempo's trace detail view. The trace waterfall shows all spans for the slow workflow execution and reveals that DataFetchNode spent 4.2 seconds on an external API call.

Step 3: Click the Logs link on the trace → open Loki

From the trace view, clicking the Logs link opens Grafana Explore with a pre-built LogQL query scoped to the same service and time window. The logs reveal the specific API endpoint that timed out.

Step 4: Resolution

The engineer identifies the external API URL and escalates to the integration team. Total investigation time: under 2 minutes, without switching tools.

TraceId is the Universal Key

The traceId appears in logs, as an exemplar in metrics, and as the primary identifier in Tempo traces. It is the single value that ties all three signal types together for a single request. BizFirstGO's ObservabilityServiceExtensions ensures that every log line emitted during a traced request automatically includes the active traceId — no manual logging code required.
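
The general pattern behind automatic traceId injection can be illustrated with Python's standard `logging` module. This sketch is not BizFirstGO's ObservabilityServiceExtensions, whose internals are not shown here: the `current_trace_id` context variable, `TraceIdFilter`, `JsonFormatter`, and the logger name are all hypothetical stand-ins for what a tracing SDK and logging integration would provide.

```python
import contextvars
import io
import json
import logging

# Hypothetical holder for the active trace ID; a real tracing SDK
# keeps this in its own context object.
current_trace_id = contextvars.ContextVar("current_trace_id", default=None)


class TraceIdFilter(logging.Filter):
    """Copy the active trace ID onto every record passing through the handler."""

    def filter(self, record):
        record.traceId = current_trace_id.get()
        return True


class JsonFormatter(logging.Formatter):
    """Render records as JSON, adding traceId only when one is active."""

    def format(self, record):
        payload = {"level": record.levelname.lower(), "message": record.getMessage()}
        trace_id = getattr(record, "traceId", None)
        if trace_id is not None:
            payload["traceId"] = trace_id
        return json.dumps(payload)


def build_logger(stream):
    logger = logging.getLogger("bizfirst.sketch")
    logger.handlers.clear()
    logger.propagate = False
    logger.setLevel(logging.INFO)
    handler = logging.StreamHandler(stream)
    handler.addFilter(TraceIdFilter())
    handler.setFormatter(JsonFormatter())
    logger.addHandler(handler)
    return logger


buffer = io.StringIO()
log = build_logger(buffer)

token = current_trace_id.set("4bf92f3577b34da6a3ce929d0e0e4736")
log.error("Node execution failed")  # the call site never mentions the trace ID
current_trace_id.reset(token)
log.info("Outside any trace")

lines = [json.loads(raw) for raw in buffer.getvalue().splitlines()]
print(lines)
```

Because the filter sits on the handler, application code logs as usual and the correlation key is added in one place.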