BizFirst Observe
Verify Ingestion
After configuring BizFirstGO services, trigger one workflow execution and verify that all three signals arrive in their backends: logs in Loki, metrics in Prometheus, and traces in Tempo. This is the most important validation step before going to production.
Step 1: Trigger a Test Execution
# Trigger a minimal workflow execution via the ProcessEngine API:
curl -X POST http://processengine:8080/api/workflow/execute \
-H "Content-Type: application/json" \
-H "X-Tenant-ID: test-tenant" \
-d '{
"workflowId": "wf-test-observability",
"input": { "testMode": true }
}'
# Note the executionId from the response:
# {"executionId": "exec-a1b2c3d4", "status": "started"}
# Save it for the queries below:
export EXEC_ID="exec-a1b2c3d4"
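Copying the ID by hand is error-prone when you repeat the test. The sketch below captures it with jq instead. It assumes the endpoint returns the JSON shape shown above; `parse_execution_id` and `start_test_execution` are hypothetical helper names, not part of the ProcessEngine API.

```bash
# Hypothetical helper: read the ProcessEngine response on stdin and print its
# executionId. `// empty` drops a missing or null field, and `-e` then makes
# jq exit non-zero (status 4) because it produced no output at all.
parse_execution_id() {
  jq -er '.executionId // empty'
}

# Hypothetical wrapper around the Step 1 request (same endpoint, headers,
# and payload as above), printing only the execution ID.
start_test_execution() {
  curl -s -X POST http://processengine:8080/api/workflow/execute \
    -H "Content-Type: application/json" \
    -H "X-Tenant-ID: test-tenant" \
    -d '{"workflowId": "wf-test-observability", "input": {"testMode": true}}' \
    | parse_execution_id
}
```

Usage: `EXEC_ID="$(start_test_execution)" && export EXEC_ID`. Because the assignment takes jq's exit status, the `&&` stops you from running the later queries with an empty ID when the response has no `executionId`.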
Step 2: Verify Logs in Loki
# Query Loki via API to verify log ingestion (date -d requires GNU date; on macOS, use gdate from coreutils):
curl -G "http://localhost:3100/loki/api/v1/query_range" \
--data-urlencode 'query={job="processengine"} |= "'"$EXEC_ID"'"' \
--data-urlencode 'start='"$(date -d '5 minutes ago' +%s)"'000000000' \
--data-urlencode 'end='"$(date +%s)"'000000000' \
| jq '.data.result | length'
# Expected output: a number > 0 (the number of log streams with results)
# If output is 0: no logs were found; see Common Ingestion Problems and Fixes below
# In Grafana Explore:
# 1. Select data source: Loki
# 2. Enter query: {job="processengine"} |= "exec-a1b2c3d4"
# 3. Time range: Last 5 minutes
# Expected: Log lines showing execution start, node executions, completion
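If you run these checks from macOS or another system without GNU `date`, the `date -d '5 minutes ago'` calls fail. One portable alternative is plain arithmetic on `date +%s`, which BSD and GNU `date` both support. This is a sketch assuming Loki on localhost:3100 and `EXEC_ID` exported in Step 1. The names `epoch_seconds_ago`, `epoch_nanos_ago`, and `count_loki_streams` are hypothetical helpers.

```bash
# Seconds since the Unix epoch, minus an offset. Unlike `date -d`, this works
# with both GNU and BSD/macOS date.
epoch_seconds_ago() {
  local seconds_back="$1"
  echo $(( $(date +%s) - seconds_back ))
}

# Loki's query_range accepts nanosecond Unix timestamps for start/end,
# so append nine zeros to the seconds value.
epoch_nanos_ago() {
  echo "$(epoch_seconds_ago "$1")000000000"
}

# Hypothetical: the Step 2 Loki query rewritten with the portable helpers.
count_loki_streams() {
  curl -sG "http://localhost:3100/loki/api/v1/query_range" \
    --data-urlencode "query={job=\"processengine\"} |= \"${EXEC_ID}\"" \
    --data-urlencode "start=$(epoch_nanos_ago 300)" \
    --data-urlencode "end=$(epoch_nanos_ago 0)" \
    | jq '.data.result | length'
}
```

The Tempo search in Step 4 takes plain seconds, so `epoch_seconds_ago 600` can replace `date -d '10 minutes ago' +%s` there.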
Step 3: Verify Metrics in Prometheus
# Query Prometheus via API to verify metrics scraping:
curl -G "http://localhost:9090/api/v1/query" \
--data-urlencode 'query=bizfirst_workflow_executions_total' \
| jq '.data.result | length'
# Expected output: a number > 0 (metrics series exist)
# Check that the processengine scrape target is healthy:
curl -s http://localhost:9090/api/v1/targets | \
jq '.data.activeTargets[] | select(.labels.job == "processengine") | .health'
# Expected output: "up"
# If "down": the processengine /metrics endpoint is not reachable from Prometheus
# In Grafana Explore:
# 1. Select data source: Prometheus
# 2. Enter query: rate(bizfirst_workflow_executions_total[5m])
# Expected: A non-zero value after the test execution
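The targets one-liner prints one line per matching target, which you have to read by eye. In a script, an exit status is often more useful, and it also handles several ProcessEngine replicas. The following is a sketch; `all_targets_up` is a hypothetical helper, and the Prometheus address is assumed to be localhost:9090 as above.

```bash
# Hypothetical helper: read the JSON from Prometheus /api/v1/targets on stdin.
# Succeeds only if at least one active target carries the given job label
# and every such target reports health "up".
all_targets_up() {
  local job="$1"
  jq -e --arg job "$job" '
    [.data.activeTargets[] | select(.labels.job == $job) | .health]
    | length > 0 and all(. == "up")
  ' >/dev/null
}

# Usage (not executed here):
#   curl -s http://localhost:9090/api/v1/targets | all_targets_up processengine \
#     && echo "processengine: up" || echo "processengine: NOT up"
```

Requiring `length > 0` matters: if the scrape job is misnamed, there are no matching targets, and an unguarded `all(...)` over an empty list would report success.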
Step 4: Verify Traces in Tempo
# Search for traces from the test execution in Tempo via API (date -d requires GNU date; on macOS, use gdate from coreutils):
curl -G "http://localhost:3200/api/search" \
--data-urlencode 'tags=service.name=processengine' \
--data-urlencode 'start='"$(date -d '10 minutes ago' +%s)" \
--data-urlencode 'end='"$(date +%s)" \
| jq '.traces | length'
# Expected output: a number > 0
# Get a specific trace by ID (if you have the trace ID from logs):
# Look for the "traceId" field in the Loki log output
TRACE_ID="your-trace-id-from-logs"
curl "http://localhost:3200/api/traces/$TRACE_ID" | jq '.batches | length'
# In Grafana Explore:
# 1. Select data source: Tempo
# 2. Query mode: Search
# 3. Service Name: processengine
# Expected: Recent traces listed with duration and span counts
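Instead of copying the trace ID from Loki by hand, you can pull it out of the query_range response. This sketch assumes the ProcessEngine log lines are JSON objects with a top-level `traceId` field, as the step above suggests. `first_trace_id` is a hypothetical helper; plain-text lines are skipped.

```bash
# Hypothetical helper: read a Loki query_range response on stdin and print the
# first traceId found. Each entry in .values is [timestamp, line]; lines that
# are not JSON objects, or that lack traceId, are silently skipped.
first_trace_id() {
  jq -r 'first(.data.result[].values[][1] | fromjson? | .traceId? // empty)'
}

# Usage (not executed here; query_range defaults to the last hour):
#   TRACE_ID="$(curl -sG "http://localhost:3100/loki/api/v1/query_range" \
#     --data-urlencode "query={job=\"processengine\"} |= \"${EXEC_ID}\"" \
#     | first_trace_id)"
#   curl -s "http://localhost:3200/api/traces/$TRACE_ID" | jq '.batches | length'
```

An empty `TRACE_ID` usually points to the "TraceId missing from log lines" case in the table below.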
Common Ingestion Problems and Fixes
| Symptom | Likely Cause | Fix |
|---|---|---|
| No logs in Loki | OTEL_EXPORTER_OTLP_ENDPOINT unreachable | Check network connectivity from BizFirstGO container to otel-collector:4317. Check firewall rules. |
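| Loki returns data but wrong job label | Inconsistent OTEL_SERVICE_NAME | Verify OTEL_SERVICE_NAME is set to the same value (for example, processengine) on every instance of the service; restart the service after changing environment variables. |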
| Prometheus target shows "down" | /metrics endpoint blocked or wrong port | Check ProcessEngine exposes /metrics on the configured port. Verify Prometheus scrape_config job. |
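| No traces in Tempo | Sampling too aggressive (0% rate) | Check OTEL_TRACES_SAMPLER and OTEL_TRACES_SAMPLER_ARG: with a ratio-based sampler, set the argument to 1.0 (sample everything) while testing, then reduce it for production. |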
| OTel Collector logs show "dropped" spans | Tempo write path overloaded | Check Tempo container resources. Increase memory limit. |
| TraceId missing from log lines | OTel logging SDK not bridged to Serilog | Verify ObservabilityServiceExtensions.cs registers the OTel log bridge. |
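For the first row, a quick way to test whether otel-collector:4317 is reachable is bash's built-in `/dev/tcp` redirection, which needs no extra tools. This is a sketch only: it requires bash and coreutils `timeout` in the environment you test from, and `port_open` is a hypothetical helper name.

```bash
# Hypothetical helper: succeed if a TCP connection to host:port opens within
# the given number of seconds (default 3). Relies on bash's /dev/tcp
# redirection, so it must run under bash, not plain sh.
port_open() {
  local host="$1" port="$2" secs="${3:-3}"
  timeout "$secs" bash -c "exec 3<>/dev/tcp/${host}/${port}" 2>/dev/null
}
```

To run the same check from inside the ProcessEngine container, something like `docker compose exec processengine bash -c 'timeout 3 bash -c "exec 3<>/dev/tcp/otel-collector/4317" && echo reachable || echo unreachable'` should work, assuming the Compose service is named `processengine` and its image ships bash and `timeout`. A successful TCP connect only proves the port is open; it does not prove the OTLP exporter is configured correctly.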
Checking the OTel Collector Pipeline
# The OTel Collector exposes its own metrics on port 8888:
curl -s http://localhost:8888/metrics | grep -E 'otelcol_(receiver|exporter)_'
# Key metrics to check:
# otelcol_receiver_accepted_spans_total{receiver="otlp"} — spans received
# otelcol_receiver_accepted_metric_points_total — metrics received
# otelcol_receiver_accepted_log_records_total — log records received
# otelcol_exporter_sent_spans_total{exporter="otlp/tempo"} — spans forwarded to Tempo
# otelcol_exporter_send_failed_spans_total — failed exports (check for errors)
# Check OTel Collector logs for pipeline errors:
docker compose logs otel-collector --tail=50 | grep -iE "error|warn|drop"
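Comparing the receiver and exporter counters is easier with totals than with raw lines, because each counter may appear once per label set. The sketch below sums one metric across all label sets in Prometheus text format. It assumes the samples carry no trailing timestamps, which is typical for the Collector's own endpoint. `sum_metric` is a hypothetical helper. Exact metric names vary by Collector version (some omit the `_total` suffix), so use the names your `/metrics` output actually shows.

```bash
# Hypothetical helper: read Prometheus text exposition on stdin and print the
# sum of every sample of the named metric across all label sets (0 if absent).
# "# HELP"/"# TYPE" lines and other metrics do not match and are ignored.
sum_metric() {
  awk -v name="$1" '
    $1 == name || index($1, name "{") == 1 { total += $NF; found = 1 }
    END { if (found) printf "%.0f\n", total; else print 0 }
  '
}

# Usage (not executed here):
#   metrics="$(curl -s http://localhost:8888/metrics)"
#   for m in otelcol_receiver_accepted_spans_total \
#            otelcol_exporter_sent_spans_total \
#            otelcol_exporter_send_failed_spans_total; do
#     printf '%s=%s\n' "$m" "$(printf '%s\n' "$metrics" | sum_metric "$m")"
#   done
```

A growing failed-spans total alongside a flat sent-spans total points at the export path to Tempo rather than at BizFirstGO. Note that the sent total includes every exporter in the pipeline, not only `otlp/tempo`.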
Allow 30-60 Seconds After Restart
After changing environment variables and restarting BizFirstGO services, allow 30-60 seconds before running verification queries. The OTel SDK buffers spans and exports them in batches, so the first batch may take up to 10 seconds to flush. Prometheus scrapes on a 15-second interval, so the first metric data point appears within 15-30 seconds of service startup.
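In scripts, polling until a check passes is usually more reliable than a fixed sleep: it finishes as soon as data arrives and still gives up after a bounded time. This is a minimal sketch; `wait_until` is a hypothetical helper, and the usage line assumes the Prometheus address and metric name from Step 3.

```bash
# Hypothetical helper: re-run a check command until it succeeds or the
# timeout expires. Returns 0 on success, 1 on timeout.
#   $1 = timeout in seconds, $2 = delay between attempts, rest = command
wait_until() {
  local timeout_s="$1" interval_s="$2"
  shift 2
  local deadline=$(( $(date +%s) + timeout_s ))
  until "$@"; do
    if [ "$(date +%s)" -ge "$deadline" ]; then
      return 1
    fi
    sleep "$interval_s"
  done
}

# Usage (not executed here): wait up to 60 s, checking every 5 s, for the
# workflow metric from Step 3 to appear.
#   wait_until 60 5 sh -c 'curl -s -G http://localhost:9090/api/v1/query \
#     --data-urlencode "query=bizfirst_workflow_executions_total" \
#     | jq -e ".data.result | length > 0" >/dev/null'
```

The same pattern works for the Loki and Tempo checks: wrap each query in a command whose exit status reflects "data found".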