Storage Layer
BizFirst Observe uses three purpose-built storage backends — Loki for logs, Prometheus for metrics, and Tempo for traces. Each backend is engineered for the unique query patterns and cardinality characteristics of its signal type.
Storage Backend Comparison
| Attribute | Loki (Logs) | Prometheus (Metrics) | Tempo (Traces) |
|---|---|---|---|
| Data model | Log streams (label sets + log lines) | Time series (label sets + float64 samples) | Trace objects (spans with attributes) |
| Index strategy | Label index only — log content not indexed | Labels fully indexed | No full index — lookup by TraceId or attribute search |
| Query language | LogQL | PromQL | TraceQL |
| Storage format | Compressed chunks (Snappy) | TSDB blocks | Parquet + object storage |
| Default retention | 30 days | 15 days (local); unlimited with Thanos | 7 days |
| Default port | 3100 | 9090 | 3200 (HTTP), 4317 (OTLP gRPC) |
Grafana Loki — Log Storage
Loki stores logs as compressed streams. A stream is identified by its label set — all log lines sharing the same labels form one stream. Loki only indexes the label set, not the log content itself.
This design choice has two important consequences:
- Low storage cost: because log content is compressed and stored but never indexed, Loki is typically cheaper to run than a full-text-indexing system such as Elasticsearch at the same log volume.
- Label cardinality matters: labels must hold low-cardinality values (service names, environments, tenant tiers), not high-cardinality values (execution IDs, user IDs). Every distinct combination of label values creates a separate stream, so high-cardinality labels produce too many streams and degrade Loki performance.
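The effect of label cardinality on stream count can be illustrated with a small sketch. This is not Loki code; the label names, counts, and helper functions below are hypothetical and only model the rule that one distinct label set equals one stream.

```python
from collections import defaultdict
from math import prod


def stream_key(labels: dict[str, str]) -> tuple:
    """A stream is identified by its full label set; sort for a stable key."""
    return tuple(sorted(labels.items()))


def group_into_streams(entries):
    """Group (labels, line) pairs into streams keyed by their label set."""
    streams = defaultdict(list)
    for labels, line in entries:
        streams[stream_key(labels)].append(line)
    return dict(streams)


def max_streams(cardinalities: dict[str, int]) -> int:
    """Upper bound on stream count: product of distinct values per label."""
    return prod(cardinalities.values())


# Hypothetical log entries: two share a label set, so they form one stream.
entries = [
    ({"service": "orders", "env": "prod"}, "order 1 accepted"),
    ({"service": "orders", "env": "prod"}, "order 2 accepted"),
    ({"service": "billing", "env": "prod"}, "invoice sent"),
]
streams = group_into_streams(entries)
print(len(streams))  # 2

# Adding a single high-cardinality label multiplies the upper bound.
low = max_streams({"service": 20, "env": 3, "tier": 4})
high = max_streams({"service": 20, "env": 3, "tier": 4, "user_id": 100_000})
print(low, high)  # 240 24000000
```

In this model, a value such as a user ID belongs in the log line itself, where it can be matched at query time, rather than in the label set.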
Storage Backends for Loki
- Local filesystem — development only; not replicated
- Amazon S3 — recommended for production; supports lifecycle rules for Glacier transition
- Azure Blob Storage — supported; recommended for Azure deployments
- Google Cloud Storage — supported; recommended for GCP deployments
- MinIO — S3-compatible self-hosted; good for on-premises deployments
Prometheus — Metrics Storage
Prometheus uses its own embedded time-series database (TSDB). Each time series (a metric name plus a unique label set) is stored as a sequence of (timestamp, float64_value) samples. The TSDB persists data as immutable blocks and compacts them into larger blocks over time, which keeps range queries efficient.
Prometheus TSDB Block Structure
# Prometheus data directory layout
/prometheus/data/
  01BKGTZQ1SYQJTR4YNT1SDQPRH/    # Block (2-hour window)
    chunks/
      000001                     # Compressed sample data
    index                        # Series + label index
    meta.json                    # Block metadata
    tombstones                   # Deleted series records
  01BKGTZQ1SYQJTR4YNT1SDQPRI/    # Another block
  wal/                           # Write-Ahead Log (recent data)
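As a rough illustration of the 2-hour block window in the layout above, the sketch below maps a sample timestamp to the window that would contain it. It assumes windows are aligned to multiples of the block range since the Unix epoch; the function names are illustrative and not part of any Prometheus API.

```python
from datetime import datetime, timezone

BLOCK_RANGE_MS = 2 * 60 * 60 * 1000  # 2-hour block window, in milliseconds


def block_window(ts_ms: int, block_range_ms: int = BLOCK_RANGE_MS) -> tuple[int, int]:
    """Return the [start, end) window, in Unix ms, that contains ts_ms."""
    start = ts_ms - (ts_ms % block_range_ms)
    return start, start + block_range_ms


def fmt(ts_ms: int) -> str:
    """Format Unix milliseconds as a UTC timestamp string."""
    return datetime.fromtimestamp(ts_ms / 1000, tz=timezone.utc).strftime("%Y-%m-%d %H:%M")


# A sample scraped at 13:37 UTC falls into the 12:00-14:00 window.
sample_ms = int(datetime(2024, 1, 1, 13, 37, tzinfo=timezone.utc).timestamp() * 1000)
start_ms, end_ms = block_window(sample_ms)
print(fmt(start_ms), "->", fmt(end_ms))  # 2024-01-01 12:00 -> 2024-01-01 14:00
```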
Retention Configuration
# Prometheus retention settings
# These are command-line flags, not prometheus.yml options:
#   --storage.tsdb.retention.time=90d
#   --storage.tsdb.retention.size=50GB
# Whichever limit is reached first triggers deletion of the oldest blocks
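The "whichever limit is hit first" rule can be modeled with a short sketch. This is a simplified model rather than Prometheus source code: the `Block` class, the `blocks_to_delete` helper, and the sample sizes are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Block:
    name: str
    max_time_s: int   # newest sample in the block, Unix seconds
    size_bytes: int


def blocks_to_delete(blocks, retention_s, max_bytes):
    """Return names of blocks removed by the time limit or the size limit."""
    ordered = sorted(blocks, key=lambda b: b.max_time_s, reverse=True)  # newest first
    if not ordered:
        return []
    newest = ordered[0].max_time_s
    doomed, total = [], 0
    for block in ordered:
        total += block.size_bytes
        too_old = newest - block.max_time_s >= retention_s
        too_big = total > max_bytes  # stays true for every older block
        if too_old or too_big:
            doomed.append(block.name)
    return doomed


DAY = 24 * 60 * 60
GIB = 1024 ** 3

# Time limit reached first: small blocks, the oldest is 95 days behind the newest.
small = [Block("new", 100 * DAY, 10 * GIB), Block("mid", 60 * DAY, 10 * GIB),
         Block("old", 5 * DAY, 10 * GIB)]
by_time = blocks_to_delete(small, retention_s=90 * DAY, max_bytes=50 * GIB)

# Size limit reached first: all blocks are recent, but together they exceed 50 GiB.
large = [Block("new", 100 * DAY, 20 * GIB), Block("mid", 90 * DAY, 20 * GIB),
         Block("old", 80 * DAY, 20 * GIB)]
by_size = blocks_to_delete(large, retention_s=90 * DAY, max_bytes=50 * GIB)

print(by_time, by_size)  # ['old'] ['old']
```

In both cases the oldest block is the one removed, but for different reasons: age in the first case, cumulative size in the second.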
Grafana Tempo — Trace Storage
Tempo is designed for high-volume trace ingest at low cost. It stores traces in object storage (S3, GCS, or Azure Blob) in Parquet format. Unlike Jaeger or Zipkin deployments that rely on a separate index database, Tempo needs only object storage: lookup by trace ID uses per-block bloom filters, and attribute-based search reads the relevant columns of the Parquet blocks through TraceQL instead of consulting a full index.
Tempo Storage Tiers
- WAL (Write-Ahead Log): newest traces, held in memory and on local disk. Millisecond write latency. Not yet searchable by attribute.
- Block Storage: flushed every 5 minutes to object storage. Searchable. Most trace queries hit this tier.
- Compacted Blocks: older blocks are compacted for storage efficiency. Bloom filters enable fast TraceId lookup without a full scan.
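To show why bloom filters avoid a full scan, here is a minimal, self-contained sketch of the idea. It is not Tempo's implementation: the `BloomFilter` class, its sizing, the block names, and the trace IDs are all hypothetical. A filter can report false positives but never false negatives, so a block whose filter says "no" can be skipped safely.

```python
import hashlib


class BloomFilter:
    """Tiny bloom filter: k hash positions per key over a fixed bit array."""

    def __init__(self, num_bits: int = 1024, num_hashes: int = 3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits // 8 + 1)

    def _positions(self, key: str):
        # Derive k positions by salting the key with the hash index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key: str) -> None:
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key: str) -> bool:
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(key))


def candidate_blocks(filters: dict, trace_id: str) -> list:
    """Return the blocks that must be read; all others are skipped."""
    return [name for name, bf in filters.items() if bf.might_contain(trace_id)]


# Two hypothetical blocks, each with its own filter.
filters = {"block-a": BloomFilter(), "block-b": BloomFilter()}
filters["block-a"].add("4bf92f3577b34da6a3ce929d0e0e4736")
filters["block-b"].add("00f067aa0ba902b7a3ce929d0e0e4736")

hits = candidate_blocks(filters, "4bf92f3577b34da6a3ce929d0e0e4736")
print(hits)  # always includes "block-a"; other blocks appear only on a false positive
```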
Tempo's local storage mode is suitable for development only. For production, configure an S3-compatible bucket. The tempo-config.yaml storage.trace.backend setting controls this. Tempo can handle thousands of traces per second with object storage backends on commodity hardware.
Data Persistence and Backup
Each storage component has a different backup approach:
| Component | Backup Method | Restore Approach |
|---|---|---|
| Loki | S3 bucket versioning + cross-region replication | Point Loki at restored S3 bucket |
| Prometheus | POST /api/v1/admin/tsdb/snapshot API (requires the --web.enable-admin-api flag) | Copy snapshot contents into the data directory, restart |
| Tempo | S3 bucket versioning (traces already in object storage) | Point Tempo at restored S3 bucket |
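The Prometheus snapshot call in the table can be scripted. The following is a minimal sketch, assuming a Prometheus server started with `--web.enable-admin-api`; the base URL is a placeholder and the helper names are illustrative. Building the request and parsing the response are kept separate so each can be checked without a running server.

```python
import json
import urllib.request


def build_snapshot_request(base_url: str, skip_head: bool = False) -> urllib.request.Request:
    """Build the POST request for the TSDB snapshot admin endpoint."""
    url = f"{base_url.rstrip('/')}/api/v1/admin/tsdb/snapshot"
    if skip_head:
        # Skip data that is still in the head block (not yet compacted to disk).
        url += "?skip_head=true"
    return urllib.request.Request(url, method="POST")


def snapshot_name(response_body: bytes) -> str:
    """Extract the snapshot directory name from the JSON response."""
    payload = json.loads(response_body)
    if payload.get("status") != "success":
        raise RuntimeError(f"snapshot failed: {payload}")
    return payload["data"]["name"]


def take_snapshot(base_url: str) -> str:
    """Trigger a snapshot on a live server and return its directory name."""
    with urllib.request.urlopen(build_snapshot_request(base_url), timeout=30) as resp:
        return snapshot_name(resp.read())


# Offline check with a canned response in the documented response shape.
req = build_snapshot_request("http://localhost:9090/")
canned = b'{"status": "success", "data": {"name": "20171210T211224Z-2be650b6d019eb54"}}'
print(req.get_method(), req.full_url)
print(snapshot_name(canned))
```

The returned name identifies a directory under `snapshots/` in the Prometheus data directory; that directory is what gets copied off the host for backup.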