Recommended Default Retention

| Signal | Storage Backend | Hot (Local) Retention | Cold (Object Store) Retention | Primary Use After Hot Period |
|---|---|---|---|---|
| Logs | Loki | 30 days | 1 year (S3 Glacier) | Compliance audit, post-incident investigation |
| Metrics | Prometheus / Thanos | 90 days | Indefinite (Thanos Object Store) | Capacity planning, trend analysis, SLA reporting |
| Traces | Tempo | 7 days | Not recommended (high volume) | Traces older than 7 days rarely needed; sample and discard |

Why Different Retention Periods?

Traces: Short Retention

Traces are used for active incident debugging — usually within hours of an issue. Storing all spans for more than 7 days generates enormous storage costs with very little value. Use tail sampling to keep 100% of error traces and 5-10% of success traces.
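The sampling rule above can be sketched as a small decision function. This is a minimal illustration, not a Tempo or collector API: the function name `keep_trace`, the 10% default rate (picked from the 5-10% range), and the hash-based bucketing are all assumptions made for the example. In practice this decision is usually made by a tail-sampling component in the trace pipeline once all spans of a trace have arrived.

```python
import hashlib

# Assumed value, chosen from the 5-10% range suggested above.
SUCCESS_SAMPLE_RATE = 0.10


def keep_trace(trace_id: str, has_error: bool,
               sample_rate: float = SUCCESS_SAMPLE_RATE) -> bool:
    """Decide whether a completed trace should be stored.

    Hypothetical helper: error traces are always kept, successful
    traces are kept with probability `sample_rate`.
    """
    if has_error:
        return True  # keep 100% of error traces

    # Hash the trace ID so the same trace always gets the same decision,
    # regardless of which process evaluates it.
    digest = hashlib.sha256(trace_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") / 2**64  # value in [0, 1)
    return bucket < sample_rate
```

Hashing the trace ID instead of drawing a random number keeps the decision deterministic, which helps when several collector instances might evaluate the same trace.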

Logs: Medium Retention

Logs are the primary audit trail for what happened in a workflow. 30 days covers most post-incident investigations. Cold storage (S3 Glacier) for 1 year covers compliance requirements without hot storage costs.

Metrics: Long Retention

Metrics are compact (a few KB per time series per day). Keeping metrics for months or years enables capacity planning — "at current growth rate, when will we need more servers?" This is not possible with short metric retention.
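The "few KB per series per day" figure can be checked with simple arithmetic. The sketch below assumes a 15-second scrape interval and about 1.5 bytes per sample after TSDB compression (Prometheus documents an average of 1-2 bytes per sample); both values and the function names are assumptions for illustration, so substitute your own numbers.

```python
SECONDS_PER_DAY = 86_400


def series_bytes_per_day(scrape_interval_s: float = 15.0,
                         bytes_per_sample: float = 1.5) -> float:
    """Approximate on-disk bytes one time series produces per day.

    Both defaults are assumptions: 15 s scrape interval and
    ~1.5 bytes per compressed sample.
    """
    samples_per_day = SECONDS_PER_DAY / scrape_interval_s
    return samples_per_day * bytes_per_sample


def tsdb_size_gb(active_series: int, retention_days: int,
                 scrape_interval_s: float = 15.0,
                 bytes_per_sample: float = 1.5) -> float:
    """Approximate TSDB size in GB (1 GB = 1e9 bytes) for a retention window."""
    per_series = series_bytes_per_day(scrape_interval_s, bytes_per_sample)
    return active_series * per_series * retention_days / 1e9


print(series_bytes_per_day())    # 8640.0 bytes, i.e. roughly 8.6 KB per series per day
print(tsdb_size_gb(13_000, 90))  # about 10.1 GB for 13,000 series over 90 days
```

Under these assumptions, roughly 13,000 active series kept for 90 days would land near the ~10 GB Prometheus figure listed under Storage Cost Estimates.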

Compliance Drives Minimums

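Audit requirements may mandate minimum retention periods, while privacy rules can limit how long personal data may be kept. For financial workflows, SOX is commonly read as requiring 7 years of retention for audit logs, and GDPR requires that personal data can be erased on request, generally within one month. Configure retention to satisfy both requirements simultaneously, for example by keeping personal data out of long-retention audit logs or pseudonymizing it before it is written.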

Storage Cost Estimates

| Signal | Volume (10 tenants, moderate load) | Monthly S3 cost (us-east-1) |
|---|---|---|
| Logs (30-day hot, Loki) | ~50 GB/month | ~$1.15/month (S3 Standard) |
| Logs (cold, S3 Glacier) | ~600 GB/year | ~$2.40/month once the full 600 GB has accumulated (S3 Glacier) |
| Metrics (Prometheus TSDB, 90 days) | ~10 GB | Negligible (local disk) |
| Metrics (Thanos, 2-year history) | ~80 GB | ~$1.84/month at the S3 Standard rate (less on S3 Standard-IA) |
| Traces (Tempo, 7-day, 10% sampled) | ~15 GB | ~$0.35/month (S3 Standard) |

Adjust Retention Based on Your Load

The estimates above assume ~100 workflow executions/hour across 10 tenants. High-volume deployments (10,000+ executions/hour) generate roughly 100x more telemetry data. Always measure your actual log byte rate before setting retention periods. The PromQL query sum(rate(loki_distributor_bytes_received_total[1h])) returns Loki's ingest rate in bytes per second, averaged over the last hour.
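A measured byte rate can be turned into a storage and cost estimate with a few lines of arithmetic. The sketch below is illustrative only: the function names, the $0.023 per GB-month S3 Standard price, and the `compression_ratio` parameter are assumptions, so check current pricing and your own compression figures. The distributor metric counts uncompressed bytes received, which means stored chunks are usually smaller than the raw figure.

```python
SECONDS_PER_DAY = 86_400

# Assumed S3 Standard list price in us-east-1; verify against current pricing.
S3_STANDARD_USD_PER_GB_MONTH = 0.023


def stored_gb(bytes_per_second: float, retention_days: int,
              compression_ratio: float = 1.0) -> float:
    """GB (1e9 bytes) held at steady state for an ingest rate and retention window.

    compression_ratio is an assumption: 1.0 means no compression,
    5.0 means stored data is one fifth of the ingested size.
    """
    raw_bytes = bytes_per_second * SECONDS_PER_DAY * retention_days
    return raw_bytes / compression_ratio / 1e9


def monthly_cost_usd(gb: float,
                     usd_per_gb_month: float = S3_STANDARD_USD_PER_GB_MONTH) -> float:
    """Monthly object-store cost for a given stored volume."""
    return gb * usd_per_gb_month


# About 19,290 bytes/s corresponds to the ~50 GB/month logs estimate above.
baseline = stored_gb(19_290, retention_days=30)
print(round(baseline, 1), round(monthly_cost_usd(baseline), 2))

# A deployment with 100x the load scales both numbers linearly.
high_volume = stored_gb(19_290 * 100, retention_days=30)
print(round(high_volume), round(monthly_cost_usd(high_volume), 2))
```

Feeding the result of the PromQL query into `stored_gb` for each candidate retention period gives a quick way to compare the cost of, say, 30 versus 90 days before changing any configuration.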