Architecture Overview
BizFirst Observe is the enterprise observability platform for BizFirstGO. It captures logs, metrics, and distributed traces from every service — giving platform engineers and operators complete visibility into workflow executions, node performance, and system health.
The Four-Layer Architecture
BizFirst Observe is organized into four distinct layers. Each layer has a clear responsibility and communicates only with the layers adjacent to it. This separation ensures that instrumentation code in your services never needs to know about storage backends, and storage backends never need to know about visualization requirements.
Layer 1 — Instrumentation
The OpenTelemetry SDK is embedded in every BizFirstGO service. In most cases it captures logs, metrics, and traces automatically, with no manual instrumentation code. Services emit telemetry to the Collector over OTLP.
Layer 2 — Collection
The OpenTelemetry Collector acts as the central aggregation and routing point. It receives telemetry from all services, applies processors (sampling, enrichment, redaction), and exports to the appropriate storage backends.
Layer 3 — Storage
Three purpose-built backends store the telemetry: Loki for log streams, Prometheus TSDB for metric time series, and Tempo for distributed trace objects. Each is optimized for its signal type, so they are not interchangeable.
Layer 4 — Visualization
Grafana connects to all three storage backends as data sources. It provides dashboards, Explore for ad-hoc queries, and alert rule management. Users interact with Grafana — not directly with the storage backends.
Core Components at a Glance
| Component | Role | Signal Type | Protocol In | Port |
|---|---|---|---|---|
| OTel Collector | Aggregation & routing hub | All three | OTLP/gRPC + HTTP | 4317, 4318 |
| Grafana Loki | Log aggregation & query | Logs | HTTP (from Collector) | 3100 |
| Prometheus | Metrics scraping & TSDB | Metrics | Pull (scrape /metrics) | 9090 |
| Grafana Tempo | Distributed trace storage | Traces | OTLP/gRPC | 4317 (internal) |
| Grafana | Unified visualization UI | All three | HTTP queries to backends | 3000 |
| Alertmanager | Alert routing & deduplication | Alerts | HTTP (from Prometheus) | 9093 |
| Node Exporter | Host metrics (CPU/mem/disk) | Metrics | Scraped by Prometheus | 9100 |
| cAdvisor | Container metrics | Metrics | Scraped by Prometheus | 8080 |
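The table can also be treated as data when sanity-checking a deployment. The Python sketch below encodes the listed ports and reports any port number claimed by more than one component. The component keys and the `shared_ports` helper are illustrative names chosen for this example, not part of BizFirst Observe.

```python
from collections import defaultdict
from typing import Dict, List

# Ports copied from the component table; the keys are illustrative names.
DEFAULT_PORTS: Dict[str, List[int]] = {
    "otel-collector": [4317, 4318],
    "loki": [3100],
    "prometheus": [9090],
    "tempo": [4317],
    "grafana": [3000],
    "alertmanager": [9093],
    "node-exporter": [9100],
    "cadvisor": [8080],
}


def shared_ports(ports: Dict[str, List[int]]) -> Dict[int, List[str]]:
    """Return every port number that more than one component listens on."""
    owners: Dict[int, List[str]] = defaultdict(list)
    for component, port_list in ports.items():
        for port in port_list:
            owners[port].append(component)
    return {port: names for port, names in owners.items() if len(names) > 1}


if __name__ == "__main__":
    print(shared_ports(DEFAULT_PORTS))
```

With the values from the table, the only shared number is 4317, used by both the Collector and Tempo. The table marks Tempo's listener as internal. In a container deployment the two listeners sit in separate network namespaces, so they collide only if both are published on the same host interface.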
Signal Routing Map
Each of the three telemetry signal types follows a distinct routing path through the architecture. The OTel Collector is the fan-out point for logs and traces; metrics are scraped by Prometheus directly from each service:
Logs
BizFirstGO services → OTel Collector (OTLP receiver) → Loki exporter → Grafana Loki (port 3100) → queryable via LogQL in Grafana
Metrics
BizFirstGO services expose a /metrics endpoint → Prometheus scrapes it every 15s → samples are stored in the TSDB → queryable via PromQL in Grafana
Traces
BizFirstGO services → OTel Collector (OTLP receiver) → Tempo exporter → Grafana Tempo → queryable via TraceQL and trace ID lookup in Grafana
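As a rough illustration, the three paths above can be written down as plain data and queried. This is only a sketch of the routing map in Python; the hop names and the `route` and `passes_through_collector` helpers are hypothetical and do not correspond to any BizFirst Observe API.

```python
from typing import Dict, List

# Hop-by-hop paths transcribed from the routing map (hop names are illustrative).
ROUTES: Dict[str, List[str]] = {
    "logs": ["service", "otel-collector", "loki", "grafana"],
    "metrics": ["service", "prometheus", "grafana"],
    "traces": ["service", "otel-collector", "tempo", "grafana"],
}

# Query language used in Grafana for each signal type.
QUERY_LANGUAGE: Dict[str, str] = {
    "logs": "LogQL",
    "metrics": "PromQL",
    "traces": "TraceQL",
}


def route(signal: str) -> List[str]:
    """Return the ordered hops for a signal type, or raise on an unknown one."""
    try:
        return ROUTES[signal]
    except KeyError:
        raise ValueError(f"unknown signal type: {signal!r}") from None


def passes_through_collector(signal: str) -> bool:
    """True if the signal is pushed through the OTel Collector."""
    return "otel-collector" in route(signal)


if __name__ == "__main__":
    for name in ROUTES:
        print(name, " -> ".join(route(name)), f"({QUERY_LANGUAGE[name]})")
```

Writing the map this way makes the asymmetry explicit: logs and traces are pushed through the Collector, while metrics are pulled by Prometheus.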
The Correlation Model
The three signal types are not isolated silos — they are designed to be correlated. The primary correlation key is the TraceId, a 128-bit identifier that is:
- Included in every structured log line emitted by a BizFirstGO service during a request
- The root identifier for a distributed trace in Tempo
- Embedded as an exemplar in Prometheus histogram metrics (linking a specific high-latency observation to its trace)
Grafana's Derived Fields feature makes the TraceId in a Loki log line a clickable link that opens the corresponding trace in Tempo — enabling seamless cross-signal navigation during incident investigation.
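The mechanism behind that link is a regular expression applied to each log line. The Python sketch below imitates it outside Grafana: it generates a 128-bit trace ID, writes it into a JSON log line, and extracts it again. The `TraceId` field name, the log layout, and the helper names are assumptions made for this example; the actual BizFirstGO log format may differ.

```python
import json
import re
import secrets
from typing import Optional

# A 128-bit trace ID is conventionally written as 32 lowercase hex characters.
# The "TraceId" JSON field name is an assumption for this sketch.
TRACE_ID_PATTERN = re.compile(r'"TraceId"\s*:\s*"([0-9a-f]{32})"')


def new_trace_id() -> str:
    """Return a random 128-bit trace ID as 32 hex characters."""
    return secrets.token_hex(16)


def log_line(message: str, trace_id: str) -> str:
    """Build a structured (JSON) log line that carries the trace ID."""
    return json.dumps({"level": "info", "message": message, "TraceId": trace_id})


def extract_trace_id(line: str) -> Optional[str]:
    """Pull the trace ID out of a log line, as a derived-field regex would."""
    match = TRACE_ID_PATTERN.search(line)
    return match.group(1) if match else None


if __name__ == "__main__":
    trace_id = new_trace_id()
    line = log_line("workflow node finished", trace_id)
    print(line)
    print(extract_trace_id(line) == trace_id)
```

In a Grafana derived field, the captured group would be substituted into a query against the Tempo data source. Here it is simply returned to the caller.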
The signals are correlated but stored separately, because logs, metrics, and traces have fundamentally different query patterns, cardinality characteristics, and retention needs. Loki is optimized for label-filtered stream queries; Prometheus is optimized for time-series range queries with mathematical functions; Tempo is optimized for trace-ID lookup and span attribute search. A single store cannot efficiently serve all three workloads.
Infrastructure Prerequisites
Before deploying BizFirst Observe, ensure the following infrastructure is available:
- Docker or Kubernetes: single-node Docker Compose for development and small deployments; Kubernetes for production
- Object storage: an S3-compatible bucket (AWS S3, MinIO) or Azure Blob Storage for Loki and Tempo long-term storage
- Network access: BizFirstGO services must reach the OTel Collector on port 4317 (gRPC) or 4318 (HTTP)
- Disk: minimum 50 GB for a development setup; for production sizing, see Sizing Guidelines
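The network prerequisite can be checked from a service host before deployment. The following Python sketch attempts a plain TCP connection to the Collector's two OTLP ports. The hostname `otel-collector` and the helper names are placeholders for this example. A successful connection shows only that the port is reachable, not that the Collector accepts OTLP on it.

```python
import socket
from typing import Dict

# OTLP ports from the prerequisites: 4317 (gRPC) and 4318 (HTTP).
OTLP_PORTS = (4317, 4318)


def can_reach(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS failures.
        return False


def check_collector(host: str, timeout: float = 2.0) -> Dict[int, bool]:
    """Check both OTLP ports on the given Collector host."""
    return {port: can_reach(host, port, timeout) for port in OTLP_PORTS}


if __name__ == "__main__":
    # "otel-collector" is a placeholder hostname; substitute your own.
    for port, ok in check_collector("otel-collector").items():
        print(f"port {port}: {'reachable' if ok else 'unreachable'}")
```

Only one of the two ports needs to be reachable, depending on whether the service exports over gRPC or HTTP.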