What to Audit in an Observability System

| Event Type | Why It Matters | Where Logged |
|---|---|---|
| Grafana login | Detect unauthorized access attempts | Grafana audit log / server log |
| Dashboard viewed | Track which users viewed sensitive dashboards | Grafana audit log (Enterprise) |
| Explore query executed | Detect ad-hoc queries that bypass dashboard controls | Grafana audit log (Enterprise) |
| Alert rule created/modified | Prevent unauthorized alert suppression | Grafana audit log / git history |
| Silence created | Track who silenced which alerts and when | Grafana alerting UI + audit log |
| Data source modified | Detect unauthorized data source changes (e.g., pointing to wrong Loki) | Grafana audit log |
| Loki delete API called | Track all log deletion requests (GDPR compliance) | Loki server logs |

Grafana Server Logs (OSS)

# Grafana OSS logs basic events to its server log:
# docker compose logs grafana

# Key log patterns to monitor:
# Login events:
grep "Successful Login" grafana.log
# Format (when the Grafana logger is set to format = json; the default output is plain text):
# {"level":"info","msg":"Successful Login","User":"admin@localhost","IP":"192.168.1.100"}

# Failed login attempts (potential brute force):
grep "Login Failed" grafana.log | wc -l

# Alert rule changes:
grep "alert rule" grafana.log

# Forward Grafana logs to Loki for searchability:
# (The OTel Collector can tail Grafana's log file with the filelog receiver, or receive the logs via syslog)
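
The failed-login count above is easier to act on when it is broken down per source IP, since a brute-force attempt usually shows up as many failures from one address. The following is a minimal sketch, assuming Grafana writes JSON log lines and that failed attempts carry the `Login Failed` message and an `IP` field as in the sample line above. The function names and the threshold of 5 are illustrative choices, not Grafana defaults.

```python
import json
from collections import Counter


def failed_logins_by_ip(lines, marker="Login Failed"):
    """Count failed-login log lines per source IP.

    Assumes one JSON object per line with "msg" and "IP" keys
    (field names taken from the sample line above; verify against your logs).
    """
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip lines that are not JSON (e.g. startup banners)
        if isinstance(event, dict) and marker in str(event.get("msg", "")):
            counts[event.get("IP", "unknown")] += 1
    return counts


def suspicious_ips(counts, threshold=5):
    """Return IPs whose failed-login count meets the (hypothetical) threshold."""
    return sorted(ip for ip, n in counts.items() if n >= threshold)
```

To run it against a log file, open `grafana.log` and pass the file object to `failed_logins_by_ip`, then feed the result to `suspicious_ips`.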

Grafana Audit Log (Enterprise)

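# grafana.ini: enable audit logging (Grafana Enterprise)
[auditing]
enabled = true
# Send audit events to Loki for searchability
loggers = loki

# Section and key names below follow the Grafana Enterprise auditing docs;
# verify them against your Grafana version.
[auditing.logs.loki]
# Protocol used to talk to Loki: grpc or http
type = http
url = loki:3100
# Plain HTTP inside the compose network; set to true if Loki is served over TLS
tls = false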

# Audit events captured with Enterprise auditing:
# - HTTP request audit (all API calls with user, action, resource)
# - Query audit (every Explore query with the full LogQL/PromQL/TraceQL)
# - Dashboard view audit
# - Admin action audit

# Query audit log in Loki:
{job="grafana-audit"} | json | action = "query" | user_email != ""
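
The same query can be run outside Grafana through Loki's HTTP API, which is useful for scheduled audit reports. This is a sketch that only builds the request URL for Loki's `/loki/api/v1/query_range` endpoint. The `job="grafana-audit"` label and the `action` and `user_email` fields come from the query above and depend on how your audit events are labeled; the function name and default limit are illustrative.

```python
from urllib.parse import urlencode

# The LogQL query from above, unchanged.
AUDIT_QUERY = '{job="grafana-audit"} | json | action = "query" | user_email != ""'


def audit_query_url(base_url, start_ns, end_ns, limit=100):
    """Build a Loki query_range URL for the audit query.

    start_ns / end_ns are Unix timestamps in nanoseconds, which is one of
    the time formats the Loki HTTP API accepts.
    """
    params = urlencode(
        {"query": AUDIT_QUERY, "start": start_ns, "end": end_ns, "limit": limit}
    )
    return f"{base_url.rstrip('/')}/loki/api/v1/query_range?{params}"
```

The resulting URL can be fetched with any HTTP client. If Loki runs in multi-tenant mode, the request also needs the tenant header, which this sketch does not add.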

Monitoring the OTel Collector Pipeline

# The OTel Collector exposes its own metrics — monitor for pipeline health:
# In Prometheus, scrape the collector's metrics endpoint (port 8888):
scrape_configs:
  - job_name: 'otel-collector'
    static_configs:
      - targets: ['otel-collector:8888']

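# Key metrics to alert on:
# Drops: telemetry dropped in a processor or failing to reach a backend.
# These are counters, so compare their increase over a window, not the raw value.
# Metric names vary between Collector versions; confirm them against your /metrics output.
increase(otelcol_processor_dropped_spans_total[5m]) > 0
increase(otelcol_exporter_send_failed_spans_total[5m]) > 0

# Alert rule for collector drops (Prometheus rules file):
groups:
  - name: otel-collector
    rules:
      - alert: OtelCollectorDropping
        expr: >
          increase(otelcol_processor_dropped_spans_total[5m]) > 0
          or increase(otelcol_exporter_send_failed_spans_total[5m]) > 0
        for: 1m
        annotations:
          summary: "OTel Collector is dropping telemetry; check processor limits and exporter connectivity"
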
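
For a quick manual check without Prometheus, the collector's metrics endpoint can be read directly. This is a minimal sketch that scans Prometheus text-format output for the two drop counters named above and reports any series with a non-zero value. The parsing is deliberately simplified (it assumes label values contain no `}` character), and the function name is illustrative.

```python
import re

# Counter names taken from the alerting examples above.
DROP_METRICS = (
    "otelcol_processor_dropped_spans_total",
    "otelcol_exporter_send_failed_spans_total",
)

# metric_name{optional="labels"} value
_SAMPLE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(\{[^}]*\})?\s+(\S+)')


def nonzero_drop_counters(metrics_text, names=DROP_METRICS):
    """Return {series: value} for drop counters whose value is above zero."""
    found = {}
    for line in metrics_text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        match = _SAMPLE.match(line)
        if not match or match.group(1) not in names:
            continue
        try:
            value = float(match.group(3))
        except ValueError:
            continue  # ignore samples with an unparseable value
        if value > 0:
            found[match.group(1) + (match.group(2) or "")] = value
    return found
```

A non-zero counter only shows that drops happened at some point since the collector started. The alert rule's `increase(...[5m])` is what tells you whether drops are happening now.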
Treat the Audit Log as Critical Infrastructure

The Grafana audit log and Loki access logs must be stored separately from the operational logs they audit, ideally in write-once, tamper-evident storage. If an attacker compromises the observability system, they should not be able to erase the evidence of their access from the audit log. Consider routing audit logs to a separate, append-only S3 bucket with Object Lock enabled.
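
The Object Lock recommendation can be sketched with boto3's `put_object_lock_configuration` call. This is an illustration under assumptions, not a complete setup: the 365-day retention, the `COMPLIANCE` mode, and the function names are placeholders to adapt to your retention policy. S3 Object Lock also requires versioning on the bucket.

```python
def object_lock_config(retention_days=365, mode="COMPLIANCE"):
    """Build the default-retention configuration for an audit-log bucket.

    retention_days and mode are illustrative defaults, not recommendations.
    """
    if mode not in ("COMPLIANCE", "GOVERNANCE"):
        raise ValueError("mode must be COMPLIANCE or GOVERNANCE")
    if retention_days < 1:
        raise ValueError("retention_days must be at least 1")
    return {
        "ObjectLockEnabled": "Enabled",
        "Rule": {"DefaultRetention": {"Mode": mode, "Days": retention_days}},
    }


def apply_object_lock(bucket, retention_days=365):
    """Apply the default retention to an existing bucket (needs AWS credentials)."""
    import boto3  # third-party; imported lazily so the helper above runs without it

    s3 = boto3.client("s3")
    s3.put_object_lock_configuration(
        Bucket=bucket,
        ObjectLockConfiguration=object_lock_config(retention_days),
    )
```

In `COMPLIANCE` mode, no user can shorten the retention period or delete a locked object version until it expires, which is the property the audit trail needs. `GOVERNANCE` mode lets specially permissioned users override the lock.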