Tempo HA — Enterprise Options

Tempo Distributed Components

Component	Role	Replicas for HA
Distributor	Receives OTLP traces; routes to ingesters via consistent hash ring	2+
Ingester	Buffers spans in WAL; flushes blocks to S3	3 (replication factor 3)
Querier	Executes TraceQL queries against blocks in S3 and recent data still held by ingesters	2+
Query Frontend	Caches and shards queries	2
Compactor	Merges blocks; enforces retention	1 (off the ingest and query paths; a short outage only delays compaction)

Tempo HA Helm Configuration

# tempo-distributed-values.yaml
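# These keys sit at the top level because the chart is installed directly below.
# Nest them under `tempo-distributed:` only if the chart is a subchart dependency.
ingester:
  replicas: 3
  config:
    replication_factor: 3     # Each span is written to 3 ingesters
  persistence:
    enabled: true             # Keep the WAL on a persistent volume
    size: 10Gi

distributor:
  replicas: 2

querier:
  replicas: 2

queryFrontend:
  replicas: 2                 # Matches the component table above

compactor:
  replicas: 1

storage:
  trace:
    backend: s3
    s3:
      bucket: bizfirst-tempo-traces
      endpoint: s3.amazonaws.com
      region: us-east-1
      # Helm does not substitute ${...}; Tempo expands these only when it
      # runs with -config.expand-env=true and the variables are set in the pod.
      access_key: ${AWS_ACCESS_KEY}
      secret_key: ${AWS_SECRET_KEY}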

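# Install (add the Grafana chart repository first if it is not configured yet):
helm repo add grafana https://grafana.github.io/helm-charts
helm repo update
helm install tempo grafana/tempo-distributed \
  --namespace observe \
  --values tempo-distributed-values.yaml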
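
Helm passes the `${AWS_ACCESS_KEY}` and `${AWS_SECRET_KEY}` strings through to the Tempo configuration unchanged, so something has to resolve them at runtime. The fragment below is a minimal sketch of one way to do that. It assumes the chart's per-component `extraArgs` and `extraEnvFrom` values and a hypothetical Kubernetes Secret named `tempo-s3-credentials` that holds both variables. Check the key names against the chart version you deploy.

```yaml
# Additions to tempo-distributed-values.yaml (sketch, not a verified config).
# Assumption: a Secret "tempo-s3-credentials" exists in the "observe" namespace
# with the keys AWS_ACCESS_KEY and AWS_SECRET_KEY.
# Merge each block into the existing key of the same name.
ingester:
  extraArgs:
    - "-config.expand-env=true"      # Tempo expands ${VAR} references at startup
  extraEnvFrom:
    - secretRef:
        name: tempo-s3-credentials   # hypothetical Secret name

querier:
  extraArgs:
    - "-config.expand-env=true"
  extraEnvFrom:
    - secretRef:
        name: tempo-s3-credentials

compactor:
  extraArgs:
    - "-config.expand-env=true"
  extraEnvFrom:
    - secretRef:
        name: tempo-s3-credentials
```

Any other component that reads or writes the bucket needs the same two settings.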

Is Tempo HA Worth the Complexity?

Tempo HA adds significant operational complexity. Consider the trade-offs before upgrading:

Factor	Single-Node Tempo	Tempo HA
Write HA	No: single point of failure	Yes: 3 ingesters at replication factor 3, so writes tolerate the loss of one ingester
Data durability	S3 backend: 11-nines durability	Same — S3 backend
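Trace loss on ingester failure	Traces still in memory are lost; traces already in the WAL are replayed on restart only if the WAL volume survives	Other ingesters hold replicas of the data (replication factor 3)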
Operational complexity	Low — single process	High — 5 component types, ring coordination
Resource cost	~4 CPU, ~8 GB RAM	~16 CPU, ~32 GB RAM

Trace Loss During Single-Node Downtime Is Acceptable

For most BizFirst deployments, losing a few minutes of traces during a Tempo restart is acceptable, because traces are a debugging aid, not a system of record. The WAL writes buffered traces to local disk, so traces already in it survive an orderly restart as long as the WAL directory is on a persistent volume. Consider Tempo HA only if you have a hard requirement for zero trace loss and cannot tolerate brief ingestion gaps during Kubernetes rolling updates.
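
The WAL only carries buffered traces across a restart if its directory outlives the pod. The fragment below is a minimal sketch for the single-node case. It assumes the single-binary `grafana/tempo` chart and its top-level `persistence` values. The file name and the 10Gi size are illustrative, not taken from a BizFirst deployment.

```yaml
# tempo-single-values.yaml (hypothetical file name)
persistence:
  enabled: true        # Back Tempo's data directory, including the WAL, with a PersistentVolumeClaim
  size: 10Gi           # Assumption: same size as the ingester volumes in the HA example
```

Without a persistent volume, the WAL lives on the pod's ephemeral storage and is discarded when the pod is rescheduled.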
