BizFirst Observe
Backup Strategies
When Loki and Tempo use S3 as the primary storage backend, the log and trace data is already protected against hardware loss by S3's 11-nines (99.999999999%) durability, so no separate backup of that data is needed. What does need backing up is Grafana's configuration database (dashboards, alert rules, users) and the configuration files for Prometheus, Loki, and the OTel Collector.
What to Back Up
| Component | What to Back Up | How Often | Method |
|---|---|---|---|
| Grafana | SQLite/PostgreSQL database (dashboards, users, alert rules, API keys) | Daily | Database snapshot or filesystem backup |
| Prometheus Config | prometheus.yml, alert rules YAML, recording rules YAML | On every change | Git version control |
| Loki Config | loki-config.yaml, runtime config | On every change | Git version control |
| OTel Collector Config | otel-collector-config.yaml | On every change | Git version control |
| Log Data (Loki) | Already in S3 — no additional backup needed | - | S3 versioning (optional) |
| Trace Data (Tempo) | Already in S3 — no additional backup needed | - | - |
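A daily cadence only helps if a missed run gets noticed. The following is a minimal sketch of a freshness check, assuming Grafana backups are written as `grafana-YYYYMMDD.db` files in a `backups` directory. The function name `latest_backup_is_fresh` and the 26-hour default window are illustrative choices, not part of BizFirst Observe.

```shell
#!/usr/bin/env bash
# Sketch: succeed only if a recent Grafana backup file exists.
# latest_backup_is_fresh is a hypothetical helper, not a Grafana tool.

latest_backup_is_fresh() {
  local dir="${1:-backups}"     # directory holding grafana-YYYYMMDD.db files
  local max_hours="${2:-26}"    # one day plus slack for a slow job
  # -mmin -N matches files modified less than N minutes ago
  find "$dir" -maxdepth 1 -type f -name 'grafana-*.db' \
    -mmin "-$((max_hours * 60))" 2>/dev/null | grep -q .
}
```

Calling `latest_backup_is_fresh backups || echo "Grafana backup is stale"` from a scheduled job is one way to surface a missed backup.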
Grafana Database Backup
# Grafana uses SQLite by default (a single grafana.db file).
# Back it up daily, either by copying the file while Grafana is stopped or by exporting through the Grafana API while it runs.
# Method 1: Filesystem backup (requires a brief Grafana stop)
mkdir -p backups
docker compose stop grafana
cp data/grafana/grafana.db backups/grafana-$(date +%Y%m%d).db
docker compose start grafana
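# Method 2: Grafana API export (no downtime; covers dashboards and alert rules, not users or API keys)
# admin:admin is the default login; substitute your own credentials.
# Export all dashboards (type=dash-db keeps folders out of the search results):
mkdir -p backups/dashboards
for DASH_UID in $(curl -s 'http://admin:admin@localhost:3000/api/search?type=dash-db' | jq -r '.[].uid'); do
  curl -s "http://admin:admin@localhost:3000/api/dashboards/uid/$DASH_UID" \
    > "backups/dashboards/$DASH_UID.json"
done
# Export alert rules (this endpoint returns JSON):
curl -s http://admin:admin@localhost:3000/api/ruler/grafana/api/v1/rules \
  > backups/alert-rules-$(date +%Y%m%d).json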
# Method 3: Use PostgreSQL as Grafana database (recommended for production)
# PostgreSQL has native backup tools (pg_dump) and supports replication.
# Configure in grafana.ini:
[database]
type = postgres
host = postgres:5432
name = grafana
user = grafana
password = ${GRAFANA_DB_PASSWORD}
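Method 3 mentions `pg_dump` without showing a dump. The following is a minimal sketch of a dated dump, reusing the host, database, and user from the `grafana.ini` example and reading the password from `GRAFANA_DB_PASSWORD`. The function name `grafana_pg_backup` and the `PG_DUMP` and `BACKUP_DIR` variables are hypothetical and exist only to keep the example adjustable.

```shell
#!/usr/bin/env bash
# Sketch: dump the Grafana PostgreSQL database to a dated file.
# Host, database, and user mirror the grafana.ini example; the function
# name and the PG_DUMP / BACKUP_DIR variables are hypothetical.

grafana_pg_backup() {
  local pg_dump_bin="${PG_DUMP:-pg_dump}"
  local backup_dir="${BACKUP_DIR:-backups}"
  local target
  target="$backup_dir/grafana-$(date +%Y%m%d).dump"

  mkdir -p "$backup_dir" || return 1
  # pg_dump reads the password from PGPASSWORD; the custom format is
  # compressed and can be restored with pg_restore.
  PGPASSWORD="${GRAFANA_DB_PASSWORD:-}" "$pg_dump_bin" \
    --host=postgres --port=5432 \
    --username=grafana --dbname=grafana \
    --format=custom --file="$target" || return 1
  echo "$target"
}
```

The function prints the path of the dump it wrote, so a caller can copy that file to off-host storage. A dump in custom format is restored with `pg_restore` rather than `psql`.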
Prometheus Data Backup (TSDB Snapshot)
# Prometheus supports creating a TSDB snapshot via its API:
# (Requires the --web.enable-admin-api flag; --web.enable-lifecycle only enables reload and shutdown.)
# Create a snapshot:
curl -X POST http://localhost:9090/api/v1/admin/tsdb/snapshot
# Response: {"status":"success","data":{"name":"20250525T120000Z-abc123"}}
# The snapshot is written to the snapshots/ folder inside the TSDB data directory (--storage.tsdb.path):
ls /var/lib/prometheus/data/snapshots/20250525T120000Z-abc123/
# Copy the snapshot to backup storage:
aws s3 sync /var/lib/prometheus/data/snapshots/20250525T120000Z-abc123/ \
  s3://bizfirst-backups/prometheus/$(date +%Y%m%d)/
# Restore from snapshot:
# 1. Stop Prometheus
# 2. Replace /var/lib/prometheus/data/ with the snapshot contents
# 3. Start Prometheus
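The snapshot and upload steps can be chained so the snapshot name never has to be copied by hand. The following is a minimal sketch, assuming the data directory is `/var/lib/prometheus/data` and the admin API is enabled. `PROM_URL`, `PROM_DATA_DIR`, `BACKUP_BUCKET`, and both function names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: create a TSDB snapshot and upload it to S3.
# PROM_URL, PROM_DATA_DIR, BACKUP_BUCKET and both function names are
# hypothetical; adjust them to your deployment.

# Extract the snapshot name from the admin API's JSON response.
snapshot_name() {
  printf '%s' "$1" | sed -n 's/.*"name":"\([^"]*\)".*/\1/p'
}

backup_prometheus_snapshot() {
  local prom_url="${PROM_URL:-http://localhost:9090}"
  local data_dir="${PROM_DATA_DIR:-/var/lib/prometheus/data}"
  local bucket="${BACKUP_BUCKET:-s3://bizfirst-backups/prometheus}"
  local response name

  # -f makes curl fail on HTTP errors (for example, admin API disabled)
  response=$(curl -fsS -X POST "$prom_url/api/v1/admin/tsdb/snapshot") || return 1
  name=$(snapshot_name "$response")
  [ -n "$name" ] || { echo "no snapshot name in: $response" >&2; return 1; }

  # Snapshots are written to <data dir>/snapshots/<name>
  aws s3 sync "$data_dir/snapshots/$name/" "$bucket/$(date +%Y%m%d)/"
}
```

Parsing with `sed` keeps the sketch free of extra dependencies; `jq -r '.data.name'` would do the same job where `jq` is available.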
Configuration Backup via Git
# All configuration files should be in your BizFirstGO infrastructure repository:
bizfirstgo/infrastructure/observe/
├── docker-compose.yml
├── otel-collector-config.yaml
├── loki-config.yaml
├── prometheus.yml
├── tempo-config.yaml
├── alertmanager.yml
├── grafana-provisioning/
│ ├── datasources/
│ ├── dashboards/
│ └── alerting/
└── dashboards/
├── flow-studio-overview.json
└── ... (all 10 dashboard JSON files)
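# This git repository IS the backup for all configuration.
# Restoring from scratch: git clone + docker compose up brings the stack back with its configuration.
# Stored state (the Grafana database, any Prometheus snapshot) is restored separately from its own backup.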
# Never manage configuration outside of git.
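The rule that configuration lives only in git can be checked mechanically. The following is a minimal sketch that reports configuration files edited on a host but never committed. The helper name `config_is_committed` is hypothetical, and the check assumes the deployed directory is itself a git checkout.

```shell
#!/usr/bin/env bash
# Sketch: detect configuration that was edited but never committed.
# config_is_committed is a hypothetical helper name.

config_is_committed() {
  local repo_dir="${1:-.}"
  local changes
  # --porcelain prints one line per modified or untracked file
  changes=$(git -C "$repo_dir" status --porcelain) || return 2
  if [ -n "$changes" ]; then
    echo "Uncommitted configuration changes:" >&2
    echo "$changes" >&2
    return 1
  fi
}
```

Running `config_is_committed bizfirstgo/infrastructure/observe` before a deploy or from a scheduled job is one way to catch drift early.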
S3 Versioning for Loki (Optional)
Enable S3 versioning on the Loki bucket to protect against accidental deletion of chunks. With versioning enabled, deleting an object only adds a delete marker, and the original object stays recoverable as a noncurrent version. The extra cost is the storage used by those noncurrent versions, which is small compared with the protection gained. Enable via:
aws s3api put-bucket-versioning --bucket bizfirst-loki-logs --versioning-configuration Status=Enabled
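With versioning on, deleted objects stay in the bucket as noncurrent versions and are billed until something removes them. A lifecycle rule can cap that. The following is a minimal sketch that generates such a rule, assuming a 30-day recovery window is enough. The rule ID and the function name `lifecycle_json` are illustrative.

```shell
#!/usr/bin/env bash
# Sketch: expire noncurrent object versions after a retention window.
# The 30-day default and the rule ID are assumptions for illustration.

lifecycle_json() {
  local days="${1:-30}"
  cat <<EOF
{
  "Rules": [
    {
      "ID": "expire-noncurrent-loki-chunks",
      "Status": "Enabled",
      "Filter": { "Prefix": "" },
      "NoncurrentVersionExpiration": { "NoncurrentDays": $days }
    }
  ]
}
EOF
}

# Apply it (requires AWS credentials; not run here):
# aws s3api put-bucket-lifecycle-configuration \
#   --bucket bizfirst-loki-logs \
#   --lifecycle-configuration "$(lifecycle_json 30)"
```

The window sets how long an accidental deletion can still be undone, so it should be at least as long as it would take to notice one.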