Deploying Prometheus
Prometheus is a single stateful binary that manages its own embedded TSDB. Deploying it correctly — with appropriate retention settings, disk provisioning, and remote write configuration — is critical for reliable metrics collection.
Prometheus Command-Line Configuration
# Docker Compose: Prometheus service definition
prometheus:
  image: prom/prometheus:v2.51.0
  command:
    - "--config.file=/etc/prometheus/prometheus.yml"
    - "--storage.tsdb.path=/prometheus"
    - "--storage.tsdb.retention.time=90d"    # Keep 90 days of metrics
    - "--storage.tsdb.retention.size=100GB"  # Cap storage at 100GB; whichever limit is hit first applies
    - "--web.enable-remote-write-receiver"   # Accept remote write (for OTel Collector)
    - "--web.enable-lifecycle"               # Enable /-/reload and /-/quit endpoints
    - "--web.enable-admin-api"               # Enable admin API (snapshots, series deletion)
    - "--rules.alert.resend-delay=1m"        # Minimum wait before re-sending a firing alert
  volumes:
    - ./prometheus.yml:/etc/prometheus/prometheus.yml
    - ./alert-rules.yml:/etc/prometheus/alert-rules.yml
    - ./recording-rules.yml:/etc/prometheus/recording-rules.yml  # Referenced by rule_files in prometheus.yml
    - prometheus-data:/prometheus
  ports:
    - "9090:9090"
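The service definition above is a fragment. In a complete Compose file it sits under the top-level services key, and the named volume prometheus-data has to be declared under a top-level volumes key, otherwise Compose rejects the file. A minimal skeleton, assuming a single-file setup with default volume options:

```yaml
# docker-compose.yml skeleton (assumed layout; adapt to your project)
services:
  prometheus:
    image: prom/prometheus:v2.51.0
    # command, bind mounts, and ports as in the service definition above
    volumes:
      - prometheus-data:/prometheus

volumes:
  prometheus-data: {}   # Named volume; survives container re-creation
```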
prometheus.yml — Main Configuration
# prometheus.yml
global:
  scrape_interval: 15s      # Default scrape frequency
  evaluation_interval: 15s  # Rule evaluation frequency (alerting and recording rules)
  scrape_timeout: 10s

  # Labels attached to series and alerts when this instance talks to
  # external systems (remote write, federation, Alertmanager)
  external_labels:
    cluster: 'bizfirst-prod-us-east-1'
    environment: 'production'

# Alertmanager configuration
alerting:
  alertmanagers:
    - static_configs:
        - targets: ['alertmanager:9093']

# Alert and recording rules files
rule_files:
  - "/etc/prometheus/alert-rules.yml"
  - "/etc/prometheus/recording-rules.yml"

# Scrape configurations; see 02-scrape-config.html for full details
scrape_configs:
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']

  - job_name: 'otel-collector'
    static_configs:
      - targets: ['otel-collector:8888']

  - job_name: 'bizfirst-processengine'
    static_configs:
      - targets: ['processengine:8080']
    metrics_path: '/metrics'
    scrape_interval: 15s
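rule_files points at /etc/prometheus/alert-rules.yml, but the contents of that file are not shown here. As a minimal sketch of what such a file might contain, the rule below fires when any scrape target has been unreachable for five minutes. The group name, alert name, severity label, and threshold are illustrative assumptions, not this deployment's actual rules.

```yaml
# alert-rules.yml (illustrative; names and thresholds are assumptions)
groups:
  - name: availability
    rules:
      - alert: TargetDown
        expr: up == 0          # 'up' is recorded by Prometheus for every scrape target
        for: 5m                # Condition must hold for 5 minutes before the alert fires
        labels:
          severity: critical
        annotations:
          summary: "{{ $labels.job }} target {{ $labels.instance }} is down"
```

A file like this can be checked with promtool check rules alert-rules.yml before Prometheus is reloaded.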
Storage Sizing
Prometheus TSDB should run on SSD storage: queries and compaction issue many small random reads and writes, and a rotational disk (HDD) will degrade performance severely.
| Scenario | Active Series | Retention | Disk Required | RAM Required |
|---|---|---|---|---|
| Development | ~5,000 | 15 days | 10 GB SSD | 512 MB |
| Small production | ~50,000 | 90 days | 50 GB SSD | 4 GB |
| Medium production | ~500,000 | 90 days | 500 GB SSD | 16 GB |
| Large production | 5M+ | 90 days | Use Thanos + object storage | 64 GB+ |
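The disk figures in the table are consistent with the rule of thumb given in the Prometheus storage documentation: disk is roughly retention time multiplied by ingestion rate multiplied by bytes per sample, where a compressed sample typically takes 1 to 2 bytes. The worked example below assumes every series is scraped at the 15s interval configured above and uses the upper bound of 2 bytes per sample. Both are assumptions, and real usage also depends on series churn and label cardinality.

```latex
\begin{align*}
\text{samples/s} &= \frac{\text{active series}}{\text{scrape interval}}
                  = \frac{50{,}000}{15\,\text{s}} \approx 3{,}333 \\
\text{disk} &\approx \text{retention} \times \text{samples/s} \times \text{bytes/sample} \\
            &\approx (90 \times 86{,}400\,\text{s}) \times 3{,}333\,\text{s}^{-1} \times 2\,\text{B} \\
            &\approx 5.2 \times 10^{10}\,\text{B} \approx 52\,\text{GB}
\end{align*}
```

That lands close to the 50 GB in the small production row. The same calculation for 500,000 series gives about 520 GB, in line with the medium production row.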
Remote Write — Sending Metrics to Prometheus
In addition to scraping, Prometheus can receive metrics via remote write when it is started with --web.enable-remote-write-receiver (set in the Compose definition above). This is how the OTel Collector pushes the metrics it receives via OTLP:
# otel-collector-config.yaml: remote write to Prometheus
exporters:
  prometheusremotewrite:
    endpoint: "http://prometheus:9090/api/v1/write"
    tls:
      insecure: true
    headers:
      X-Prometheus-Remote-Write-Version: "0.1.0"
    resource_to_telemetry_conversion:
      enabled: true  # Convert OTel resource attributes to Prometheus labels
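An exporter only takes effect once a pipeline references it. As a sketch of the remaining wiring, assuming the Collector receives OTLP on the default ports and runs a distribution that bundles the prometheusremotewrite exporter (such as the contrib distribution), the rest of the file might look like this:

```yaml
# otel-collector-config.yaml (continued; receiver endpoints are assumptions)
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318

service:
  pipelines:
    metrics:
      receivers: [otlp]
      exporters: [prometheusremotewrite]
```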
The default Prometheus deployment is a single instance with no built-in replication. If Prometheus goes down, metric collection stops (though services continue serving traffic). For HA deployments, run two Prometheus instances scraping the same targets, and use Thanos Querier to deduplicate the results. See Guide11: Enterprise Options.
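For the two-instance setup, Thanos Querier needs a label that distinguishes the replicas so it can merge their otherwise identical series. A common convention, sketched here with a hypothetical label name replica, is to give each instance a distinct value in external_labels:

```yaml
# prometheus.yml on the first instance ('replica' is an assumed label name)
global:
  external_labels:
    cluster: 'bizfirst-prod-us-east-1'
    environment: 'production'
    replica: 'a'    # Use 'b' on the second instance
```

Thanos Querier would then be started with --query.replica-label=replica so that series differing only in that label are treated as duplicates.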