Data Retention Overview — Data Retention & Archive

Recommended Default Retention

Signal	Storage Backend	Hot (Local) Retention	Cold (Object Store) Retention	Primary Use After Hot Period
Logs	Loki	30 days	1 year (S3 Glacier)	Compliance audit, post-incident investigation
Metrics	Prometheus / Thanos	90 days	Indefinite (Thanos Object Store)	Capacity planning, trend analysis, SLA reporting
Traces	Tempo	7 days	Not recommended (high volume)	Traces older than 7 days rarely needed — sample and discard

Why Different Retention Periods?

Traces: Short Retention

Traces are used mainly for active incident debugging, usually within hours of an issue. Storing every span for more than 7 days incurs large storage costs for very little value. Use tail sampling to keep 100% of error traces and 5-10% of successful traces.
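The tail-sampling rule can be sketched in a few lines. This is a minimal illustration, not how Tempo or any collector implements it: the function name keep_trace, the 10% default rate, and the choice of hashing the trace ID are all assumptions made for the example. In a real deployment the decision would normally be made by a collector component after the trace has completed.

```python
import hashlib

SUCCESS_SAMPLE_RATE = 0.10  # assumed: upper end of the 5-10% range


def keep_trace(trace_id: str, has_error: bool, rate: float = SUCCESS_SAMPLE_RATE) -> bool:
    """Decide, once a trace has completed, whether to store it."""
    if has_error:
        return True  # keep 100% of error traces
    # Hash the trace ID to a stable 64-bit integer so that every sampler
    # instance reaches the same decision for the same trace.
    digest = hashlib.sha256(trace_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big")
    # Keep the trace if its bucket falls in the lowest `rate` fraction.
    return bucket < int(rate * 2**64)


# Hypothetical trace IDs, only to show the call shape.
print(keep_trace("trace-with-error", has_error=True))
print(keep_trace("trace-ok-0001", has_error=False))
```

Hashing the ID instead of drawing a random number keeps the decision repeatable, which matters when several sampler replicas see spans from the same trace.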

Logs: Medium Retention

Logs are the primary audit trail for what happened in a workflow. A 30-day hot window covers most post-incident investigations. Moving older logs to cold storage (S3 Glacier) for 1 year covers typical compliance requirements without hot-storage costs.

Metrics: Long Retention

Metrics are compact (a few KB per time series per day). Keeping them for months or years enables capacity planning, for example answering "at the current growth rate, when will we need more servers?" Short metric retention makes that kind of trend analysis impossible.
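To make the capacity-planning question concrete, here is a rough sketch that fits a straight line to daily usage samples and extrapolates to a capacity limit. The function name days_until_capacity and the sample data are hypothetical, and linear growth is an assumption; real workloads may need a seasonal or non-linear model.

```python
from typing import Optional, Sequence


def days_until_capacity(daily_usage: Sequence[float], capacity: float) -> Optional[float]:
    """Fit a least-squares line to daily usage and return the number of days
    after the last sample at which it reaches `capacity`.
    Returns None if usage is flat or shrinking."""
    n = len(daily_usage)
    if n < 2:
        raise ValueError("need at least two daily samples")
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(daily_usage) / n
    # Ordinary least-squares slope: average growth per day.
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, daily_usage)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    if slope <= 0:
        return None
    intercept = mean_y - slope * mean_x
    # Day index where the fitted line crosses capacity, relative to the last sample.
    return max(0.0, (capacity - intercept) / slope - (n - 1))


# Hypothetical data: CPU cores in use, one sample per day for 90 days,
# growing by 0.5 cores per day from a base of 40.
usage = [40 + 0.5 * day for day in range(90)]
print(days_until_capacity(usage, capacity=100.0))  # roughly 31 days
```

The longer the history fed into the fit, the less a single noisy week skews the slope, which is the practical argument for 90 days of hot metrics plus long-term history.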

Compliance Drives Minimums

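Audit requirements may mandate minimum retention periods. For financial workflows, SOX requires 7 years of retention for audit logs, which is longer than the 1-year cold retention recommended above, so extend the cold tier for those logs. GDPR requires that personal data can be erased on request, generally within one month. Configure retention to satisfy both: keep audit logs for the mandated minimum, and make sure personal data in them can still be deleted or anonymized within the erasure deadline.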

Storage Cost Estimates

Signal	Data Volume (10 tenants, moderate load)	Monthly Storage Cost (us-east-1)
Logs (30-day hot, Loki)	~50 GB/month	~$1.15/month (S3 Standard)
Logs (cold, S3 Glacier)	~600 GB/year	~$2.40/month once a full year (600 GB) is stored (S3 Glacier, at ~$0.004/GB-month)
Metrics (Prometheus TSDB, 90 days)	~10 GB	Negligible (local disk)
Metrics (Thanos, 2-year history)	~80 GB	~$1.84/month (S3 Standard; ~$1.00/month on S3 Standard-IA)
Traces (Tempo, 7-day, 10% sampled)	~15 GB	~$0.35/month (S3 Standard)
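
The costs in the table are simple products of stored volume and a per-GB monthly price. The sketch below reproduces that arithmetic; the prices in PRICE_PER_GB_MONTH are assumed us-east-1 list prices used for illustration, not values quoted from AWS, and request, retrieval, and transfer fees are ignored.

```python
# Assumed us-east-1 list prices in USD per GB-month. Illustrative only;
# check the current AWS price list before relying on them.
PRICE_PER_GB_MONTH = {
    "standard": 0.023,
    "standard_ia": 0.0125,
    "glacier": 0.004,
}


def monthly_cost(stored_gb: float, storage_class: str) -> float:
    """Monthly storage cost in USD for a steady-state volume."""
    return stored_gb * PRICE_PER_GB_MONTH[storage_class]


# Volumes taken from the table above.
rows = [
    ("Logs, 30-day hot", 50, "standard"),
    ("Logs, 1 year cold", 600, "glacier"),
    ("Metrics, 2-year history", 80, "standard"),
    ("Metrics, 2-year history", 80, "standard_ia"),
]
for label, gb, storage_class in rows:
    print(f"{label} ({storage_class}): ${monthly_cost(gb, storage_class):.2f}/month")
```

Under these assumed prices, 600 GB in Glacier costs about $2.40 per month, and 80 GB of metrics costs about $1.84 per month in S3 Standard or about $1.00 in Standard-IA.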

Adjust Retention Based on Your Load

The estimates above assume ~100 workflow executions/hour across 10 tenants. High-volume deployments (10,000+ executions/hour) generate roughly 100x more telemetry data. Always measure your actual log ingest rate before setting retention periods: the PromQL query sum(rate(loki_distributor_bytes_received_total[1h])) returns the average number of bytes per second that Loki received over the last hour.
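
Once you have a measured byte rate, converting it into a retained volume is straightforward. The helper below is a sketch: the name retained_gb and the compression_ratio parameter are assumptions for illustration. The distributor counter measures bytes as received, so leaving the ratio at 1.0 gives a conservative upper bound on what ends up in storage.

```python
SECONDS_PER_DAY = 86_400


def retained_gb(bytes_per_second: float, retention_days: int,
                compression_ratio: float = 1.0) -> float:
    """Steady-state volume held for the retention window, in GB (10^9 bytes).

    compression_ratio is an assumed knob: 1.0 means no compression,
    5.0 means stored data is one fifth of the ingested size.
    """
    if compression_ratio <= 0:
        raise ValueError("compression_ratio must be positive")
    ingested_bytes = bytes_per_second * SECONDS_PER_DAY * retention_days
    return ingested_bytes / compression_ratio / 1e9


# About 19,290 bytes/s corresponds to the ~50 GB/month baseline above.
print(f"{retained_gb(19_290, retention_days=30):.1f} GB")
# A deployment with 100x the ingest rate holds 100x the volume.
print(f"{retained_gb(1_929_000, retention_days=30):.1f} GB")
```

Multiplying the result by your storage price per GB-month gives a first-order cost for any retention period you are considering.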
