Configure Alerts — Setup Guide

Step 1: Add a Contact Point

Navigate to Alerting → Contact points → Add contact point in Grafana. The two most common options for BizFirst deployments:

Option A: Slack

# 1. Create a Slack App and Incoming Webhook URL:
#    https://api.slack.com/apps → Create App → Incoming Webhooks → Activate → Add Webhook

# 2. In Grafana: Alerting → Contact points → Add contact point
#    Name: slack-platform
#    Type: Slack
#    Webhook URL: https://hooks.slack.com/services/T.../B.../...
#    Channel: #platform-alerts
#    Title: {{ if eq .Status "firing" }}ALERT{{ else }}RESOLVED{{ end }}: {{ .CommonLabels.alertname }}
#    Message:
#    {{ range .Alerts }}
#    Summary: {{ .Annotations.summary }}
#    Severity: {{ .Labels.severity }}
#    {{ end }}

# 3. Click "Test" to send a test notification to Slack
# 4. Click "Save contact point"
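
To preview what the Title and Message templates above produce, the following Python sketch mimics their logic outside Grafana. It is only an illustration: Grafana renders these fields with Go templates (and may emit extra whitespace around each range iteration), and the function name `render_slack_notification` and the sample summary text are hypothetical.

```python
def render_slack_notification(status, common_labels, alerts):
    """Mimic the Title and Message templates from the Slack contact point."""
    # Title: ALERT while firing, RESOLVED otherwise, followed by the alert name.
    prefix = "ALERT" if status == "firing" else "RESOLVED"
    title = f"{prefix}: {common_labels.get('alertname', '')}"

    # Message: one Summary/Severity pair per alert in the notification.
    lines = []
    for alert in alerts:
        lines.append(f"Summary: {alert['annotations'].get('summary', '')}")
        lines.append(f"Severity: {alert['labels'].get('severity', '')}")
    return title, "\n".join(lines)


title, message = render_slack_notification(
    status="firing",
    common_labels={"alertname": "WorkflowErrorRateHigh"},
    alerts=[{
        "labels": {"severity": "critical"},
        # Sample summary text, not taken from the shipped rules.
        "annotations": {"summary": "Error rate above 5%"},
    }],
)
print(title)    # ALERT: WorkflowErrorRateHigh
print(message)  # Summary: Error rate above 5% / Severity: critical (two lines)
```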

Option B: Email

# Configure SMTP in grafana.ini first (restart Grafana to apply the change;
# SMTP_PASSWORD must be set in Grafana's environment):
[smtp]
enabled = true
host = smtp.bizfirstai.com:587
user = grafana-alerts@bizfirstai.com
password = ${SMTP_PASSWORD}
from_address = grafana-alerts@bizfirstai.com
from_name = BizFirst Observe

# Then in Grafana: Alerting → Contact points → Add contact point
#    Name: email-platform
#    Type: Email
#    Addresses: platform@bizfirstai.com; oncall@bizfirstai.com
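
A typo in the [smtp] section usually surfaces only when the first email fails, so a quick pre-check can help. The sketch below parses an ini fragment with Python's standard `configparser` and flags a few common mistakes. The helper name `check_smtp_section` and the specific checks are assumptions made for illustration, not part of Grafana; a full grafana.ini may need extra handling.

```python
import configparser

# The [smtp] fragment from this guide, used as sample input.
SAMPLE_INI = """
[smtp]
enabled = true
host = smtp.bizfirstai.com:587
user = grafana-alerts@bizfirstai.com
password = ${SMTP_PASSWORD}
from_address = grafana-alerts@bizfirstai.com
from_name = BizFirst Observe
"""


def check_smtp_section(ini_text):
    """Return a list of problems found in the [smtp] section (empty if none)."""
    # interpolation=None keeps values such as ${SMTP_PASSWORD} as literal text.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(ini_text)
    if not parser.has_section("smtp"):
        return ["missing [smtp] section"]

    smtp = parser["smtp"]
    problems = []
    if smtp.get("enabled", "false").strip().lower() != "true":
        problems.append("enabled is not true")

    # Grafana expects host in hostname:port form.
    name, sep, port = smtp.get("host", "").rpartition(":")
    if not sep or not name or not port.isdigit():
        problems.append("host should look like hostname:port")

    if "@" not in smtp.get("from_address", ""):
        problems.append("from_address is not an email address")
    return problems


print(check_smtp_section(SAMPLE_INI))  # prints []
```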

Step 2: Import Alert Rules

BizFirst Observe ships with pre-built alert rules in a Grafana provisioning YAML file. Load them from the provisioning directory or push them through the Grafana API:

# The alert rules file is at:
# grafana-provisioning/alerting/alert-rules.yaml

# If provisioning is configured (recommended):
# Copy alert-rules.yaml to the Grafana provisioning alerting directory.
# Grafana reads provisioned alert rules at startup, so restart Grafana
# (or trigger a provisioning reload) if the rules do not appear.

# Or import via Grafana API:
curl -X POST http://admin:admin@localhost:3000/api/ruler/grafana/api/v1/rules/BizFirst \
  -H "Content-Type: application/yaml" \
  --data-binary @alert-rules.yaml

Pre-Built Alert Rules Included

Alert Name	Condition	Severity	For Duration
WorkflowErrorRateHigh	Error rate > 5% for 5 minutes	critical	5m
WorkflowErrorRateElevated	Error rate > 1% for 10 minutes	warning	10m
WorkflowP99LatencyHigh	P99 latency > 30 seconds	warning	5m
HILBacklogHigh	HIL pending count > 100 tasks	warning	15m
HILSLABreached	HIL overdue count > 0	critical	1m
ProcessEngineDown	No metrics from processengine for 2 minutes	critical	2m
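
The For Duration column means the condition must hold continuously for that long before the alert fires; until then the rule sits in a pending state. The Python sketch below models that behaviour in a simplified way, assuming one sample per evaluation. The function name `alert_state` and the sample data are hypothetical, and Grafana's real evaluation loop has more states and options than shown here.

```python
def alert_state(samples, threshold, for_seconds):
    """Return 'Normal', 'Pending', or 'Alerting' after processing all samples.

    samples: list of (timestamp_seconds, value) pairs in chronological order.
    """
    breach_started = None  # timestamp at which the current breach began
    state = "Normal"
    for ts, value in samples:
        if value > threshold:
            if breach_started is None:
                breach_started = ts
            # Fire only once the breach has lasted the full For duration.
            held = ts - breach_started
            state = "Alerting" if held >= for_seconds else "Pending"
        else:
            # Any sample back under the threshold resets the timer.
            breach_started = None
            state = "Normal"
    return state


# WorkflowErrorRateHigh: error rate > 5%, For 5m (300 s), sampled every 60 s.
sustained = [(t, 6.0) for t in range(0, 301, 60)]
with_dip = [(0, 6.0), (60, 6.0), (120, 4.0), (180, 6.0), (240, 6.0), (300, 6.0)]

print(alert_state(sustained, threshold=5.0, for_seconds=300))  # Alerting
print(alert_state(with_dip, threshold=5.0, for_seconds=300))   # Pending
```

The second run stays pending because the dip at 120 s restarts the timer, which is why a flapping metric may never page anyone even though it repeatedly crosses the threshold.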

Step 3: Configure Notification Policy

# Set the default notification routing in Grafana:
# Alerting → Notification policies

# Default policy (catches all alerts not matched by specific routes):
Default receiver: email-platform
Group by: [alertname, team]
Group wait: 30s
Group interval: 5m
Repeat interval: 4h

# Add a specific route for critical alerts → PagerDuty (if configured):
# Click "Add nested policy"
Matcher: severity = critical
Receiver: pagerduty-critical
Continue: false  # Stop at this policy; do not evaluate later sibling policies
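
The routing described above can be sketched in Python to make the matching order explicit. This is a simplified model under stated assumptions: it supports only equality matchers and a single level of nesting, and the names `route`, `NESTED_POLICIES`, and `DEFAULT_RECEIVER` are hypothetical. Grafana's actual policy tree also supports other matcher operators and deeper nesting.

```python
DEFAULT_RECEIVER = "email-platform"

# Nested policies are evaluated in order, top to bottom.
NESTED_POLICIES = [
    {
        "matchers": {"severity": "critical"},
        "receiver": "pagerduty-critical",
        "continue": False,
    },
]


def route(labels, nested=NESTED_POLICIES, default=DEFAULT_RECEIVER):
    """Return the receivers that would be notified for an alert's labels."""
    receivers = []
    for policy in nested:
        matched = all(labels.get(k) == v for k, v in policy["matchers"].items())
        if matched:
            receivers.append(policy["receiver"])
            if not policy["continue"]:
                break  # stop evaluating later sibling policies
    # Alerts that match no nested policy fall back to the default receiver.
    return receivers or [default]


print(route({"alertname": "ProcessEngineDown", "severity": "critical"}))
# ['pagerduty-critical']
print(route({"alertname": "HILBacklogHigh", "severity": "warning"}))
# ['email-platform']
```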

Step 4: Test the Alert Pipeline

# Send a test notification to verify contact point delivery:
# In Grafana: Alerting → Contact points → find your contact point → "Send test"
# Verify the test message arrives in Slack/email.
# This checks delivery only; it does not exercise alert rules or notification policies.

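# Temporarily lower an alert threshold to trigger a real alert through the full pipeline
# (alert rule → notification policy → contact point):
# Example: Lower WorkflowErrorRateHigh threshold from 5% to 0.001%
# Run a test execution → wait out the rule's 5m For duration → verify the alert fires → verify the notification arrives
# Restore the original threshold immediately after testing.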

# List the alerts currently held by Grafana's built-in Alertmanager:
curl -s http://admin:admin@localhost:3000/api/alertmanager/grafana/api/v2/alerts \
  | jq '.[].labels.alertname'
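
For scripted checks, the same endpoint can be queried from Python. The sketch below is an assumption-laden equivalent of the curl command: the helper names `fetch_alerts` and `active_alert_names` are hypothetical, the admin:admin credentials mirror the example above, and the sample payload is illustrative data shaped like an Alertmanager v2 response, not real output.

```python
import base64
import json
import urllib.request

ALERTS_URL = "http://localhost:3000/api/alertmanager/grafana/api/v2/alerts"


def fetch_alerts(url=ALERTS_URL, user="admin", password="admin"):
    """Fetch the alert list using HTTP basic auth (same as the curl example)."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    request = urllib.request.Request(url, headers={"Authorization": f"Basic {token}"})
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)


def active_alert_names(alerts):
    """Return the sorted, de-duplicated alertname labels of active alerts."""
    return sorted({
        alert["labels"]["alertname"]
        for alert in alerts
        if alert.get("status", {}).get("state") == "active"
        and "alertname" in alert.get("labels", {})
    })


# Illustrative payload; call fetch_alerts() against a running Grafana instead.
sample = [
    {"labels": {"alertname": "HILSLABreached", "severity": "critical"},
     "status": {"state": "active"}},
    {"labels": {"alertname": "HILBacklogHigh", "severity": "warning"},
     "status": {"state": "suppressed"}},
]
print(active_alert_names(sample))  # ['HILSLABreached']
```

Filtering on the `active` state leaves out silenced or inhibited alerts, which the plain `jq` listing would still include.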

Configure At Least One Contact Point Before Production

Going to production without a working alert contact point means critical alerts (like ProcessEngineDown or HILSLABreached) fire silently in Grafana with no notification. Always test the full alert pipeline — including a real notification delivery — before the go-live date.
