What to Back Up

| Component | What to Back Up | How Often | Method |
|---|---|---|---|
| Grafana | SQLite/PostgreSQL database (dashboards, users, alert rules, API keys) | Daily | Database snapshot or filesystem backup |
| Prometheus Config | prometheus.yml, alert rules YAML, recording rules YAML | On every change | Git version control |
| Loki Config | loki-config.yaml, runtime config | On every change | Git version control |
| OTel Collector Config | otel-collector-config.yaml | On every change | Git version control |
| Log Data (Loki) | Already in S3; no additional backup needed | - | S3 versioning (optional) |
| Trace Data (Tempo) | Already in S3; no additional backup needed | - | - |

Grafana Database Backup

# Grafana uses SQLite by default (grafana.db file).
# Back it up daily while Grafana is stopped, or use the Grafana API for live export.

# Method 1: Filesystem backup (requires brief Grafana stop)
docker compose stop grafana
mkdir -p backups
cp data/grafana/grafana.db "backups/grafana-$(date +%Y%m%d).db"
docker compose start grafana

# Method 2: Grafana API export (no downtime)
# Export all dashboards:
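mkdir -p backups/dashboards
# type=dash-db limits the search to dashboards, so folder UIDs are skipped
for DASH_UID in $(curl -s 'http://admin:admin@localhost:3000/api/search?type=dash-db' | jq -r '.[].uid'); do
  curl -s "http://admin:admin@localhost:3000/api/dashboards/uid/$DASH_UID" \
    > "backups/dashboards/$DASH_UID.json"
done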

# Export alert rules (the ruler API responds with JSON):
curl -s http://admin:admin@localhost:3000/api/ruler/grafana/api/v1/rules \
  > "backups/alert-rules-$(date +%Y%m%d).json"

# Method 3: Use PostgreSQL as Grafana database (recommended for production)
# PostgreSQL has native backup tools (pg_dump) and supports replication.
# Configure in grafana.ini:
[database]
type = postgres
host = postgres:5432
name = grafana
user = grafana
password = ${GRAFANA_DB_PASSWORD}
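
With Method 3 in place, the daily Grafana backup can be a pg_dump of the grafana database. The following is a minimal sketch rather than a tested procedure. The function name backup_grafana_pg, the backups/ default directory, and the PG_DUMP override are hypothetical; the host, port, database, and user are taken from the grafana.ini fragment above, and the password is passed through the standard PGPASSWORD environment variable.

```shell
#!/usr/bin/env bash
# Sketch: dump the Grafana PostgreSQL database to a dated file.
# Assumptions (not from the original): the function name, the backups/
# default directory, and the PG_DUMP override for the pg_dump binary.
backup_grafana_pg() {
  local out_dir="${1:-backups}"
  local out_file="${out_dir}/grafana-$(date +%Y%m%d).dump"

  if [ -z "${GRAFANA_DB_PASSWORD:-}" ]; then
    echo "GRAFANA_DB_PASSWORD is not set" >&2
    return 1
  fi
  mkdir -p "$out_dir" || return 1

  # -Fc writes pg_dump's custom archive format, which pg_restore can read.
  PGPASSWORD="$GRAFANA_DB_PASSWORD" \
    "${PG_DUMP:-pg_dump}" -h postgres -p 5432 -U grafana -d grafana \
      -Fc -f "$out_file" || return 1

  # Print the path so callers can copy the file to backup storage.
  echo "$out_file"
}
```

The function prints the path of the dump it wrote, so the result can be handed to whatever copies backups off the host.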

Prometheus Data Backup (TSDB Snapshot)

# Prometheus can create a TSDB snapshot through its admin API.
# (Requires the --web.enable-admin-api flag; --web.enable-lifecycle only enables the reload and quit endpoints.)

# Create a snapshot:
curl -X POST http://localhost:9090/api/v1/admin/tsdb/snapshot
# Response: {"status":"success","data":{"name":"20250525T120000Z-abc123"}}

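# The snapshot is written to snapshots/ under the TSDB data directory
# (--storage.tsdb.path), assumed here to be /var/lib/prometheus:
ls /var/lib/prometheus/snapshots/20250525T120000Z-abc123/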

# Copy the snapshot to backup storage:
aws s3 sync /var/lib/prometheus/snapshots/20250525T120000Z-abc123/ \
  s3://bizfirst-backups/prometheus/$(date +%Y%m%d)/

# Restore from snapshot:
# 1. Stop Prometheus
# 2. Replace the contents of the TSDB data directory (--storage.tsdb.path) with the snapshot contents
# 3. Start Prometheus
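
The restore steps above can be sketched as a small shell function. This is an illustration under assumptions: the function name restore_prometheus_snapshot is hypothetical, and moving the old data aside instead of deleting it is a cautious choice of this sketch, not part of the original procedure. Stopping and starting Prometheus is left to the caller.

```shell
#!/usr/bin/env bash
# Sketch: replace a Prometheus data directory with a snapshot's contents.
# Hypothetical helper; stop Prometheus before calling it, start it after.
restore_prometheus_snapshot() {
  local snapshot_dir="$1"
  local data_dir="${2%/}"   # drop a trailing slash so the rename below works

  if [ ! -d "$snapshot_dir" ]; then
    echo "snapshot directory not found: $snapshot_dir" >&2
    return 1
  fi

  # Keep the current data under a timestamped name instead of deleting it.
  if [ -d "$data_dir" ]; then
    mv "$data_dir" "${data_dir}.pre-restore-$(date +%Y%m%dT%H%M%S)" || return 1
  fi

  mkdir -p "$data_dir" || return 1
  # "/." copies the directory's contents, including hidden files.
  cp -a "$snapshot_dir/." "$data_dir/"
}
```

Call it between the stop and start steps, passing the snapshot directory first and the data directory second. Once the restored instance is confirmed healthy, the .pre-restore-* copy can be removed.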

Configuration Backup via Git

# All configuration files should be in your BizFirstGO infrastructure repository:
bizfirstgo/infrastructure/observe/
  ├── docker-compose.yml
  ├── otel-collector-config.yaml
  ├── loki-config.yaml
  ├── prometheus.yml
  ├── tempo-config.yaml
  ├── alertmanager.yml
  ├── grafana-provisioning/
  │   ├── datasources/
  │   ├── dashboards/
  │   └── alerting/
  └── dashboards/
      ├── flow-studio-overview.json
      └── ... (all 10 dashboard JSON files)

# This git repository IS the backup for all configuration.
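# Restoring from scratch: git clone + docker compose up brings the stack back with its full configuration.
# Grafana's database and Prometheus data are then restored from the backups described above.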
# Never manage configuration outside of git.
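
Because a restore from git depends on every file actually being committed, it can help to check a fresh clone against the expected layout. The sketch below is hypothetical (the function name check_observe_config is not from the original). It only tests that the paths shown in the tree above exist and does not validate their contents.

```shell
#!/usr/bin/env bash
# Sketch: report which expected config paths are missing from a checkout.
# The path list is copied from the tree above; the function name is hypothetical.
check_observe_config() {
  local root="${1:-.}"
  local missing=0
  local path

  for path in \
    docker-compose.yml \
    otel-collector-config.yaml \
    loki-config.yaml \
    prometheus.yml \
    tempo-config.yaml \
    alertmanager.yml \
    grafana-provisioning/datasources \
    grafana-provisioning/dashboards \
    grafana-provisioning/alerting \
    dashboards/flow-studio-overview.json
  do
    if [ ! -e "$root/$path" ]; then
      echo "missing: $path"
      missing=1
    fi
  done

  # Exit status 0 means every listed path exists, 1 means at least one is missing.
  return "$missing"
}
```

Run it against the observe/ directory of a clean clone. Note that git does not track empty directories, so a provisioning directory with no files in it will be reported as missing.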

S3 Versioning for Loki (Optional)

Enable S3 versioning on the Loki bucket to protect against accidental deletion of chunks. With versioning enabled, deleting an object only adds a delete marker, and the original object remains recoverable as a noncurrent version. The cost overhead is usually small: only deleted or overwritten objects leave noncurrent versions behind, and those are billed as regular storage until they are removed. Enable it with:

aws s3api put-bucket-versioning --bucket bizfirst-loki-logs --versioning-configuration Status=Enabled
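
On a versioned bucket, deleted chunks stay in storage as noncurrent versions until something removes them. One way to bound that cost is a lifecycle rule that expires noncurrent versions after a retention window. The sketch below generates such a rule. The function name, the rule ID, and the 30-day default are assumptions chosen for illustration, not values from the original setup.

```shell
#!/usr/bin/env bash
# Sketch: print an S3 lifecycle configuration that expires noncurrent
# object versions after N days. The function name, rule ID, and 30-day
# default are illustrative assumptions.
loki_noncurrent_expiry_rule() {
  local days="${1:-30}"
  cat <<EOF
{
  "Rules": [
    {
      "ID": "expire-noncurrent-loki-objects",
      "Status": "Enabled",
      "Filter": { "Prefix": "" },
      "NoncurrentVersionExpiration": { "NoncurrentDays": ${days} }
    }
  ]
}
EOF
}

# Usage (left commented out because it changes the bucket):
#   loki_noncurrent_expiry_rule 30 > lifecycle.json
#   aws s3api put-bucket-lifecycle-configuration \
#     --bucket bizfirst-loki-logs \
#     --lifecycle-configuration file://lifecycle.json
```

Be aware that put-bucket-lifecycle-configuration replaces the bucket's existing lifecycle configuration, so any rules already on the bucket need to be included in the same file.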