BizFirst Observe
Observability of the Observability System
The observability system itself needs oversight — who is querying which logs, who changed an alert rule, who downloaded a dashboard. Grafana's audit log (available in Grafana Enterprise) and Loki's access logs provide this layer of meta-observability.
What to Audit in an Observability System
| Event Type | Why It Matters | Where Logged |
|---|---|---|
| Grafana login | Detect unauthorized access attempts | Grafana audit log / server log |
| Dashboard viewed | Track which users viewed sensitive dashboards | Grafana audit log (Enterprise) |
| Explore query executed | Detect ad-hoc queries that bypass dashboard controls | Grafana audit log (Enterprise) |
| Alert rule created/modified | Prevent unauthorized alert suppression | Grafana audit log / git history |
| Silence created | Track who silenced which alerts and when | Grafana alerting UI + audit log |
| Data source modified | Detect unauthorized data source changes (e.g., pointing to wrong Loki) | Grafana audit log |
| Loki delete API called | Track all log deletion requests (GDPR compliance) | Loki server logs |
Grafana Server Logs (OSS)
# Grafana OSS logs basic events to its server log:
# docker compose logs grafana
# Key log patterns to monitor:
# Login events:
grep "Successful Login" grafana.log
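# Format (when "format = json" is set under [log.console] or [log.file]; the default output is plain text):
# {"level":"info","msg":"Successful Login","User":"admin@localhost","IP":"192.168.1.100"}
# Failed login attempts (potential brute force); the exact message text can vary by Grafana version:
grep -c "Login Failed" grafana.log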
# Alert rule changes:
grep "alert rule" grafana.log
# Forward Grafana logs to Loki for searchability:
# (The OTel Collector can tail Grafana's log file with the filelog receiver, or receive the logs over syslog)
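The grep commands above can be turned into a small brute-force check. The following is a minimal Python sketch, assuming Grafana writes JSON log lines with the "msg" and "IP" fields shown in the format comment above. The "Login Failed" message text, the helper names, and the threshold of 5 are illustrative assumptions, not Grafana behavior guaranteed for every version.

```python
import json
from collections import Counter

# Assumption: JSON log output with the "msg" and "IP" fields shown above.
# The message text is taken from the grep pattern and may differ by version.
FAILED_LOGIN_MSG = "Login Failed"


def failed_logins_by_ip(lines):
    """Count failed-login events per source IP, skipping non-JSON lines."""
    counts = Counter()
    for line in lines:
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue  # plain-text or truncated lines are ignored in this sketch
        if not isinstance(event, dict):
            continue
        if event.get("msg") == FAILED_LOGIN_MSG:
            counts[event.get("IP", "unknown")] += 1
    return counts


def suspicious_ips(lines, threshold=5):
    """Return the IPs with at least `threshold` failed logins, sorted."""
    counts = failed_logins_by_ip(lines)
    return sorted(ip for ip, n in counts.items() if n >= threshold)
```

A typical use would be `suspicious_ips(open("grafana.log"))`, run on a schedule or against a log export. Once the logs are in Loki, the same check is usually better expressed as an alert rule there.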
Grafana Audit Log (Enterprise)
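# grafana.ini: enable audit logging (Grafana Enterprise)
[auditing]
enabled = true
# Send audit events to Loki for searchability
loggers = loki
[auditing.logs.loki]
# Transport used to reach Loki: grpc (default) or http
type = grpc
# Loki address; with gRPC this is Loki's gRPC port (9095 by default), not the HTTP port 3100
url = loki:9095
# Set to true when Loki serves TLS
tls = false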
# Audit events captured with Enterprise auditing:
# - HTTP request audit (all API calls with user, action, resource)
# - Query audit (every Explore query with the full LogQL/PromQL/TraceQL)
# - Dashboard view audit
# - Admin action audit
# Query audit log in Loki:
{job="grafana-audit"} | json | action = "query" | user_email != ""
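The same LogQL query can be run outside Grafana through Loki's HTTP API, which is useful for scheduled audit reports. This is a minimal Python sketch that only builds the request URL for the `/loki/api/v1/query_range` endpoint. The base URL `http://loki:3100`, the `job="grafana-audit"` label, and the helper names are assumptions carried over from the examples above, not verified values for a given deployment.

```python
import time
from urllib.parse import urlencode

# Assumptions: Loki's HTTP API is reachable at this address and audit events
# carry the job="grafana-audit" label used in the query above.
LOKI_URL = "http://loki:3100"
AUDIT_QUERY = '{job="grafana-audit"} | json | action = "query" | user_email != ""'


def build_query_range_url(base_url, query, start_ns, end_ns, limit=100):
    """Build a query_range URL; Loki accepts start/end as Unix nanoseconds."""
    params = urlencode(
        {"query": query, "start": start_ns, "end": end_ns, "limit": limit}
    )
    return f"{base_url}/loki/api/v1/query_range?{params}"


def last_hour_url(now_s=None):
    """URL for audit events from the hour ending at `now_s` (Unix seconds)."""
    now_s = time.time() if now_s is None else now_s
    end_ns = int(now_s * 1_000_000_000)
    start_ns = end_ns - 3600 * 1_000_000_000
    return build_query_range_url(LOKI_URL, AUDIT_QUERY, start_ns, end_ns)
```

Fetching the URL (for example with `urllib.request.urlopen`) returns JSON whose `data.result` list holds the matching streams. If Loki runs multi-tenant or behind authentication, the request would also need the relevant headers.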
Monitoring the OTel Collector Pipeline
# The OTel Collector exposes its own metrics — monitor for pipeline health:
# In Prometheus, scrape the collector's metrics endpoint (port 8888):
scrape_configs:
  - job_name: 'otel-collector'
    static_configs:
      - targets: ['otel-collector:8888']
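# Key metrics to alert on:
# Drops: telemetry being dropped before reaching backends.
# These are counters, so check their recent increase rather than the raw value
# (a raw "> 0" comparison stays true forever after the first drop):
increase(otelcol_processor_dropped_spans_total[5m]) > 0
increase(otelcol_exporter_send_failed_spans_total[5m]) > 0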
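# Alert rule for collector drops (processor drops or exporter send failures):
groups:
  - name: otel-collector
    rules:
      - alert: OtelCollectorDropping
        expr: increase(otelcol_processor_dropped_spans_total[5m]) > 0 or increase(otelcol_exporter_send_failed_spans_total[5m]) > 0
        for: 1m
        annotations:
          summary: "OTel Collector is dropping telemetry; check processor limits and exporter connectivity"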
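For a quick check without Prometheus, the collector's metrics endpoint can be read directly. The sketch below, in Python, parses Prometheus text-format output and sums the two drop counters named above across all label sets. It assumes samples have no trailing timestamps (the `name{labels} value` form), and the helper names are hypothetical.

```python
# Counter names taken from the alert expressions above; exact names can
# differ between collector versions.
WATCHED = (
    "otelcol_processor_dropped_spans_total",
    "otelcol_exporter_send_failed_spans_total",
)


def sum_counters(metrics_text, metric_names=WATCHED):
    """Sum sample values per metric name across all label sets."""
    totals = {name: 0.0 for name in metric_names}
    for raw in metrics_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        series, _, value = line.rpartition(" ")
        name = series.split("{", 1)[0]
        if name in totals:
            try:
                totals[name] += float(value)
            except ValueError:
                continue  # malformed sample; ignored in this sketch
    return totals


def counter_increase(previous, current):
    """Per-metric growth between two snapshots, treating a drop as a reset."""
    return {
        name: value - previous.get(name, 0.0)
        if value >= previous.get(name, 0.0)
        else value
        for name, value in current.items()
    }
```

Taking two snapshots a few minutes apart and passing them to `counter_increase` roughly mirrors what `increase(...[5m])` computes. Any positive value means telemetry was lost in that window.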
Treat the Audit Log as Critical Infrastructure
The Grafana audit log and Loki access logs must be stored separately from the operational logs they audit, ideally in write-once, tamper-evident storage. An attacker who compromises the observability system should not be able to erase the evidence of their own access. Consider routing audit logs to a separate S3 bucket with Object Lock enabled, so that written objects cannot be overwritten or deleted during the retention period.
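One way to write audit batches with a retention lock is sketched below in Python. It assumes the target bucket was created with Object Lock enabled and that `boto3` is installed. The bucket name, key layout, retention period, and helper names are hypothetical. The request arguments are built in a separate function so they can be inspected without AWS access.

```python
import base64
import hashlib
from datetime import datetime, timedelta, timezone


def locked_put_kwargs(bucket, key, body, retain_days, now=None):
    """Arguments for an S3 put_object call with a COMPLIANCE-mode lock."""
    now = now or datetime.now(timezone.utc)
    return {
        "Bucket": bucket,
        "Key": key,
        "Body": body,
        # An integrity header is sent explicitly for Object Lock uploads.
        "ContentMD5": base64.b64encode(hashlib.md5(body).digest()).decode("ascii"),
        # COMPLIANCE mode: nobody can shorten or remove the retention.
        "ObjectLockMode": "COMPLIANCE",
        "ObjectLockRetainUntilDate": now + timedelta(days=retain_days),
    }


def upload_audit_batch(bucket, key, body, retain_days=365):
    """Upload one batch of audit log lines (bytes) under a retention lock."""
    import boto3  # third-party; imported here so the helper above needs only the stdlib

    s3 = boto3.client("s3")
    return s3.put_object(**locked_put_kwargs(bucket, key, body, retain_days))
```

A call such as `upload_audit_batch("audit-logs-locked", "grafana/2024-01-01.jsonl", data)` would store the batch so that it cannot be deleted or overwritten until the retention date passes. GOVERNANCE mode is the softer alternative when privileged users need an override.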