Deploying Prometheus — Prometheus Metrics

Prometheus Command-Line Configuration

# Docker Compose — Prometheus service definition
prometheus:
  image: prom/prometheus:v2.51.0
  command:
    - "--config.file=/etc/prometheus/prometheus.yml"
    - "--storage.tsdb.path=/prometheus"
    - "--storage.tsdb.retention.time=90d"         # Keep 90 days of metrics
    - "--storage.tsdb.retention.size=100GB"        # Cap storage at 100GB
    - "--web.enable-remote-write-receiver"          # Accept remote write (for OTel Collector)
    - "--web.enable-lifecycle"                      # Enable /-/reload endpoint
    - "--web.enable-admin-api"                      # Enable admin API (snapshots, series deletion); restrict access
    - "--rules.alert.resend-delay=1m"
  volumes:
    - ./prometheus.yml:/etc/prometheus/prometheus.yml
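    - ./alert-rules.yml:/etc/prometheus/alert-rules.yml
    - ./recording-rules.yml:/etc/prometheus/recording-rules.yml   # Referenced by rule_files in prometheus.yml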
    - prometheus-data:/prometheus
  ports:
    - "9090:9090"

prometheus.yml — Main Configuration

# prometheus.yml
global:
  scrape_interval: 15s        # Default scrape frequency
  evaluation_interval: 15s    # Rule evaluation frequency (alerting and recording rules)
  scrape_timeout: 10s

  # Labels attached to series and alerts sent to external systems
  # (remote write, federation, Alertmanager); not added to locally stored series
  external_labels:
    cluster: 'bizfirst-prod-us-east-1'
    environment: 'production'

# Alertmanager configuration
alerting:
  alertmanagers:
    - static_configs:
        - targets: ['alertmanager:9093']

# Alert and recording rules files
rule_files:
  - "/etc/prometheus/alert-rules.yml"
  - "/etc/prometheus/recording-rules.yml"

# Scrape configurations — see 02-scrape-config.html for full details
scrape_configs:
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']

  - job_name: 'otel-collector'
    static_configs:
      - targets: ['otel-collector:8888']

  - job_name: 'bizfirst-processengine'
    static_configs:
      - targets: ['processengine:8080']
    metrics_path: '/metrics'
    scrape_interval: 15s

Storage Sizing

Prometheus TSDB should run on SSD-backed storage: it performs many small random reads and writes, and rotational disks (HDD) cause severe performance degradation.

Scenario	Active Series	Retention	Disk Required	RAM Required
Development	~5,000	15 days	10 GB SSD	512 MB
Small production	~50,000	90 days	50 GB SSD	4 GB
Medium production	~500,000	90 days	500 GB SSD	16 GB
Large production	5M+	90 days	Use Thanos + object storage	64 GB+
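
The disk figures are consistent with the rule of thumb in the Prometheus storage documentation: disk needed is roughly retention time in seconds × ingested samples per second × bytes per sample, at about 1 to 2 bytes per sample. For the small production row, 50,000 series scraped every 15 s is about 3,333 samples/s; over 90 days (7,776,000 s) at 2 bytes per sample that comes to about 52 GB. The recording rules below are a sketch of how the same estimate could be derived from Prometheus's own metrics. The file name, the rule names, and the hard-coded 90-day retention are assumptions, and the rules rely on Prometheus scraping itself.

```yaml
# sizing-rules.yml (hypothetical file; add it to rule_files to use it)
groups:
  - name: tsdb-sizing
    rules:
      # Samples ingested per second, averaged over 2 hours
      - record: prometheus:ingested_samples:rate2h
        expr: >
          sum by (job, instance)
          (rate(prometheus_tsdb_head_samples_appended_total[2h]))

      # Average on-disk bytes per sample, measured at compaction time
      - record: prometheus:bytes_per_sample:rate1d
        expr: >
          sum by (job, instance) (rate(prometheus_tsdb_compaction_chunk_size_bytes_sum[1d]))
          /
          sum by (job, instance) (rate(prometheus_tsdb_compaction_chunk_samples_sum[1d]))

      # Estimated bytes needed for 90 days (90 * 86400 s) of retention
      - record: prometheus:estimated_disk_bytes:retention90d
        expr: >
          90 * 86400
          * prometheus:ingested_samples:rate2h
          * prometheus:bytes_per_sample:rate1d
```

The estimate covers sample data only, so it is worth leaving headroom for the write-ahead log and for compaction.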

Remote Write — Sending Metrics to Prometheus

In addition to scraping, Prometheus can receive metrics via remote write when it is started with --web.enable-remote-write-receiver (set in the service definition above). This is how the OTel Collector pushes metrics it receives via OTLP:

# otel-collector-config.yaml — remote write to Prometheus
exporters:
  prometheusremotewrite:
    endpoint: "http://prometheus:9090/api/v1/write"
    tls:
      insecure: true
    headers:
      X-Prometheus-Remote-Write-Version: "0.1.0"
    resource_to_telemetry_conversion:
      enabled: true  # Convert OTel resource attributes to Prometheus labels
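
The exporter only takes effect once a metrics pipeline references it. The following is a minimal sketch of the surrounding Collector configuration, assuming services send OTLP to the Collector on the default ports (4317 for gRPC, 4318 for HTTP) and that a Collector distribution including the prometheusremotewrite exporter (such as the contrib build) is used. The receiver endpoints and the batch processor are assumptions, not part of the original configuration.

```yaml
# otel-collector-config.yaml (sketch; receiver endpoints are assumptions)
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318

processors:
  batch: {}            # Batch metrics before export

exporters:
  prometheusremotewrite:
    endpoint: "http://prometheus:9090/api/v1/write"

service:
  pipelines:
    metrics:
      receivers: [otlp]
      processors: [batch]
      exporters: [prometheusremotewrite]
```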

Prometheus is Single-Node by Default

The default Prometheus deployment is a single instance with no built-in replication. If Prometheus goes down, metric collection and rule evaluation stop (though services continue serving traffic). For HA deployments, run two Prometheus instances scraping the same targets, and use Thanos Querier to deduplicate the results. See Guide11: Enterprise Options.
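
A common way to set up such a pair is to give both instances the same configuration except for one external label that identifies the replica. The sketch below assumes that label is named replica with values A and B; neither name appears in the original configuration.

```yaml
# prometheus.yml on instance A (sketch)
global:
  external_labels:
    cluster: 'bizfirst-prod-us-east-1'
    environment: 'production'
    replica: 'A'          # Assumed label name; set to 'B' on the second instance
```

Thanos Querier can then be started with --query.replica-label=replica so that series differing only in that label are merged at query time.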
