
Complete Data Flow — Write Path

When a workflow executes, telemetry is produced and flows through the system as follows:

1. Service emits telemetry

The BizFirstGO ProcessEngine service begins executing a workflow. The OTel SDK immediately starts a root workflow.execute Activity span. As the workflow progresses, the SDK emits log records (via Serilog), metric samples (counter increments, histogram observations), and span events — all tagged with the same traceId.

2. SDK batches and exports

The OTel SDK's built-in batch export processor accumulates telemetry and flushes it every 5 seconds or as soon as the batch reaches 512 items, whichever comes first. It sends each batch to the OTel Collector via OTLP/gRPC on port 4317. Metrics exposed on the service's /metrics endpoint are instead scraped by Prometheus every 15 seconds.

3. Collector receives and processes

The OTel Collector's OTLP receiver accepts the batch. Processors run in order: memory limiter check → resource attribute enrichment → PII redaction (for logs/traces) → batching. Processing is synchronous within the pipeline.

4. Fan-out to storage backends

The Collector exports to three backends simultaneously: logs go to Loki via the Loki exporter, metrics go to Prometheus via remote write, and traces go to Tempo via OTLP/gRPC. These exports happen concurrently — a slow Loki write does not delay trace export.

5. Storage backends index and store

Loki indexes the label set and appends the log line to the appropriate stream chunk. Prometheus stores the metric sample in its WAL (Write-Ahead Log), which is compacted into TSDB blocks. Tempo writes the trace spans to its WAL, then flushes to object storage within minutes.
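The Collector stages in steps 3 and 4 can be sketched in a few lines of Python. This is only an illustrative model under stated assumptions, not the Collector's actual implementation: the record shape, the email-based redaction rule, the `max_records` limit, and the `service.name` value are all hypothetical, and the final batching processor is omitted for brevity.

```python
import re
from concurrent.futures import ThreadPoolExecutor

# Hypothetical record shape: {"signal": "logs" | "metrics" | "traces",
#                             "body": str, "attrs": dict}
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")  # assumed PII pattern


def memory_limiter(records, max_records=10_000):
    # Reject the whole batch when it exceeds the (assumed) limit.
    if len(records) > max_records:
        raise MemoryError("batch rejected by memory limiter")
    return records


def enrich_resource(records):
    # Attach an illustrative resource attribute to every record.
    return [
        {**r, "attrs": {**r["attrs"], "service.name": "process-engine"}}
        for r in records
    ]


def redact_pii(records):
    # Redaction applies to logs and traces only; metrics pass through.
    redacted = []
    for r in records:
        if r["signal"] in ("logs", "traces"):
            r = {**r, "body": EMAIL.sub("[REDACTED]", r["body"])}
        redacted.append(r)
    return redacted


# Processors run strictly in this order, one after another.
PROCESSORS = [memory_limiter, enrich_resource, redact_pii]


def run_pipeline(records):
    for processor in PROCESSORS:
        records = processor(records)
    return records


def fan_out(records, exporters):
    # exporters maps a signal name to a callable that receives that
    # signal's records. Each export runs on its own thread, so a slow
    # backend does not hold up the others.
    by_signal = {s: [r for r in records if r["signal"] == s] for s in exporters}
    with ThreadPoolExecutor(max_workers=len(exporters)) as pool:
        futures = {s: pool.submit(exporters[s], by_signal[s]) for s in exporters}
        return {s: f.result() for s, f in futures.items()}
```

Calling `fan_out(run_pipeline(batch), {"logs": send_to_loki, "metrics": send_to_prometheus, "traces": send_to_tempo})` with three exporter callables of your own mirrors the flow described above; those three function names are placeholders.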

Complete Data Flow — Read Path

When an engineer opens Grafana to investigate an incident:

1. Engineer opens Grafana dashboard or Explore

Grafana renders the dashboard or Explore view. For each panel, it constructs a query (LogQL, PromQL, or TraceQL) with the selected time range and variable values.

2. Grafana queries the data source

Grafana sends the query to the appropriate backend: LogQL to Loki's /loki/api/v1/query_range, PromQL to Prometheus's /api/v1/query_range, or TraceQL to Tempo's /api/search.

3. Backend executes the query

The storage backend scans its index to find matching series/streams, fetches the relevant data blocks, applies any filters, and returns the result set to Grafana.

4. Grafana renders the result

Grafana transforms the raw query result into the appropriate panel visualization — time-series graph, log list, trace timeline — and displays it to the engineer.

5. Cross-signal correlation

If a log line contains a traceId, Grafana renders a link button next to it. Clicking the button opens Tempo's trace detail for that traceId in a split pane. Similarly, clicking a Prometheus exemplar point on a histogram opens the linked trace in Tempo.
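To make step 2 of the read path concrete, the sketch below builds the request URLs for the three endpoints named above. The base URLs and the example queries are assumptions for illustration; the parameter names (`query`, `q`, `start`, `end`, `step`) follow the public HTTP APIs of Loki, Prometheus, and Tempo.

```python
from urllib.parse import urlencode

# Hypothetical base URLs; replace with your deployment's addresses.
BACKENDS = {
    "loki": "http://loki:3100",
    "prometheus": "http://prometheus:9090",
    "tempo": "http://tempo:3200",
}


def loki_query_url(logql: str, start_s: int, end_s: int) -> str:
    # Loki accepts start/end as Unix epoch nanoseconds.
    params = {"query": logql, "start": start_s * 10**9, "end": end_s * 10**9}
    return f"{BACKENDS['loki']}/loki/api/v1/query_range?{urlencode(params)}"


def prometheus_query_url(promql: str, start_s: int, end_s: int, step_s: int = 15) -> str:
    # Prometheus takes Unix seconds plus a resolution step in seconds.
    params = {"query": promql, "start": start_s, "end": end_s, "step": step_s}
    return f"{BACKENDS['prometheus']}/api/v1/query_range?{urlencode(params)}"


def tempo_search_url(traceql: str, start_s: int, end_s: int) -> str:
    # Tempo's search endpoint takes the TraceQL expression in "q".
    params = {"q": traceql, "start": start_s, "end": end_s}
    return f"{BACKENDS['tempo']}/api/search?{urlencode(params)}"
```

For example, `tempo_search_url('{ name = "workflow.execute" }', 1700000000, 1700000300)` yields a URL that searches for the root span name mentioned in the write path over a five-minute window.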

Write Latency Expectations

| Signal Type | From Emission to Queryable | Controlling Factor |
| --- | --- | --- |
| Logs (Loki) | 5–30 seconds | OTel SDK batch interval (5s) + Loki ingester flush (up to 25s) |
| Metrics (Prometheus) | 15–30 seconds | Prometheus scrape interval (15s) + scrape duration |
| Traces (Tempo) | 5–60 seconds | OTel SDK batch interval (5s) + Tempo WAL flush (up to 55s) |

Not Real-Time

BizFirst Observe is a near-real-time observability system, not a real-time alerting system. Do not rely on it for monitoring that needs sub-second freshness. Under normal load, expect a delay of approximately 5–15 seconds at minimum between a log line being emitted and it appearing in Grafana Explore.
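The upper bounds in the latency table are simply the sum of the stage delays listed in its Controlling Factor column. A minimal sketch of that arithmetic follows; the dictionary keys are illustrative names, and the metrics row is left out because the table does not quantify scrape duration.

```python
# Stage delays in seconds, copied from the latency table.
# Only logs and traces are modelled; the metrics row depends on scrape
# duration, which the table does not quantify.
STAGE_DELAYS = {
    "logs": {"sdk_batch_interval": 5, "loki_ingester_flush_max": 25},
    "traces": {"sdk_batch_interval": 5, "tempo_wal_flush_max": 55},
}


def worst_case_seconds(signal: str) -> int:
    """Sum every stage delay to get the upper bound for one signal type."""
    return sum(STAGE_DELAYS[signal].values())


for signal in STAGE_DELAYS:
    print(f"{signal}: up to {worst_case_seconds(signal)}s from emission to queryable")
```

This reproduces the 30-second bound for logs and the 60-second bound for traces.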

Failure Modes and Resilience

| Component Failure | Impact | Recovery |
| --- | --- | --- |
| OTel Collector down | All telemetry from services is queued in SDK buffer (default 2048 items), then dropped on overflow | SDK automatically reconnects when Collector returns; no data replay for dropped items |
| Loki down | Collector retries log export with backoff; other signals (metrics, traces) continue normally | Loki restarts, Collector resumes; WAL on Loki prevents data loss for recent logs |
| Prometheus down | Metrics scrapes fail; existing TSDB data intact; Collector buffers remote write attempts | Prometheus restarts, resumes scraping; some metric samples lost during downtime |
| Tempo down | Trace export fails; Collector retries with backoff; logs and metrics continue | Tempo restarts; traces in flight may be lost; WAL protects recent data |
| Grafana down | No user access to dashboards or alerts; all write paths continue unaffected | Grafana restarts; stateless, reconnects to backends immediately |
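The "OTel Collector down" row describes a bounded buffer that drops on overflow. The toy model below illustrates that behaviour using the 2048-item queue and 512-item batch figures from this page. It is a sketch, not the SDK's code: the class and method names are hypothetical, and it assumes that newly arriving items (rather than the oldest queued ones) are the ones dropped when the queue is full.

```python
from collections import deque


class BoundedTelemetryQueue:
    """Toy model of an SDK export queue: bounded, drops new items when full."""

    def __init__(self, max_queue_size: int = 2048, max_batch_size: int = 512):
        self.items = deque()
        self.max_queue_size = max_queue_size
        self.max_batch_size = max_batch_size
        self.dropped = 0  # dropped items are counted, never replayed

    def enqueue(self, item) -> bool:
        # Assumption: when the queue is full, the incoming item is dropped.
        if len(self.items) >= self.max_queue_size:
            self.dropped += 1
            return False
        self.items.append(item)
        return True

    def drain_batch(self) -> list:
        # Remove and return up to max_batch_size items, oldest first.
        n = min(self.max_batch_size, len(self.items))
        return [self.items.popleft() for _ in range(n)]


# Simulate a Collector outage during which 3000 items are emitted.
queue = BoundedTelemetryQueue()
for i in range(3000):
    queue.enqueue(i)

# Collector returns: the first export carries the oldest 512 items.
first_batch = queue.drain_batch()
print(len(first_batch), len(queue.items), queue.dropped)
```

In this run the queue holds the first 2048 items, the remaining 952 are dropped, and the first batch after recovery contains 512 items, leaving 1536 queued.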