The HIL Analytics Dashboard

Open the HIL Analytics dashboard in Grafana under Dashboards → BizFirstGO folder. Key panels:

Panel | Healthy Value | Action If Unhealthy
Current Backlog (Gauge) | Green: < 50 tasks | Yellow (> 50): alert approver managers; Red (> 100): escalate
Overdue Tasks (Stat) | 0 (zero) | Any value > 0: immediate action (SLA breached)
Approval Rate (Pie chart) | > 80% approved | High rejection rate may indicate process issues
Backlog by Tenant (Bar chart) | Even distribution | Single tenant with very high backlog: notify that tenant's admin
Suspension Duration (Histogram) | Most tasks < 4 hours | Long tail (> 24h): identify and escalate overdue tasks
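
The backlog thresholds in the table can be expressed as a small triage helper, for example in a script that posts a status message. This is a minimal sketch, not part of the product: the function name classify_backlog is hypothetical, and treating a backlog of exactly 50 as yellow is an assumption, since the table does not say which band the boundary falls in.

```python
def classify_backlog(pending: int) -> tuple[str, str]:
    """Map a pending-task count to a (colour, action) pair.

    Thresholds follow the dashboard table: green below 50, red above 100.
    Assumption: exactly 50 is treated as yellow.
    """
    if pending < 0:
        raise ValueError("pending count cannot be negative")
    if pending > 100:
        return "red", "escalate"
    if pending >= 50:
        return "yellow", "alert approver managers"
    return "green", "no action"


if __name__ == "__main__":
    for count in (12, 75, 140):
        colour, action = classify_backlog(count)
        print(f"{count} pending -> {colour}: {action}")
```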

HIL Backlog Queries

# Current pending HIL task count:
sum(bizfirst_hil_pending_count)

# Pending tasks broken down by tenant:
sum by (tenant_id) (bizfirst_hil_pending_count)

# Overdue tasks (past SLA deadline):
sum(bizfirst_hil_overdue_count)

# Overdue tasks by tenant (to find which tenant has the worst SLA breach):
sum by (tenant_id) (bizfirst_hil_overdue_count) > 0

# HIL task approval rate over the last hour:
sum(rate(bizfirst_hil_completed_total{outcome="approved"}[1h]))
  /
sum(rate(bizfirst_hil_completed_total[1h]))
* 100

# Median (p50) time-to-completion for HIL tasks (last 24 hours):
histogram_quantile(0.50,
  sum(rate(bizfirst_hil_suspension_duration_seconds_bucket[24h])) by (le)
)
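
The same queries can be run outside Grafana through the Prometheus HTTP API, which is useful when scripting a triage step. The sketch below sends the overdue-by-tenant query to the standard /api/v1/query endpoint and flattens the result. The server address and the function names are assumptions for illustration, not part of the BizFirstGO tooling.

```python
import json
import urllib.parse
import urllib.request

# Assumption: adjust to wherever your Prometheus server is reachable.
PROMETHEUS_URL = "http://localhost:9090"


def build_query_url(base_url: str, promql: str) -> str:
    """Build an instant-query URL for the Prometheus HTTP API."""
    query_string = urllib.parse.urlencode({"query": promql})
    return f"{base_url.rstrip('/')}/api/v1/query?{query_string}"


def parse_by_tenant(payload: dict) -> dict[str, float]:
    """Turn an instant-vector response into {tenant_id: value}."""
    if payload.get("status") != "success":
        raise RuntimeError(f"query failed: {payload.get('error', 'unknown error')}")
    result: dict[str, float] = {}
    for sample in payload["data"]["result"]:
        tenant = sample["metric"].get("tenant_id", "unknown")
        # Each sample value is a [timestamp, "value-as-string"] pair.
        result[tenant] = float(sample["value"][1])
    return result


def overdue_by_tenant(base_url: str = PROMETHEUS_URL) -> dict[str, float]:
    """Run the overdue-by-tenant query from this page against a live server."""
    url = build_query_url(base_url, "sum by (tenant_id) (bizfirst_hil_overdue_count) > 0")
    with urllib.request.urlopen(url, timeout=10) as response:
        return parse_by_tenant(json.load(response))
```

Keeping URL construction and response parsing separate from the network call makes both easy to check without a running server.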

Finding Specific Overdue Tasks

# Use Loki to find which specific tasks are overdue:
{job="processengine"} | json | hilStatus = "overdue"
  | line_format "taskId={{.hilTaskId}} tenant={{.tenantId}} deadline={{.slaDeadline}}"

# Find HIL tasks for a specific workflow type that are overdue:
{job="processengine"} | json | hilStatus = "overdue" | workflowType = "expense-approval"

# Find tasks assigned to a specific role that are pending:
{job="processengine"} | json | hilStatus = "pending" | roleRequired = "FinanceManager"
  | line_format "taskId={{.hilTaskId}} pending since {{.suspendedAt}}"
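
Lines produced by the first line_format template above are plain key=value pairs, so a short script can turn exported log lines into a per-tenant list of overdue task IDs. This is a sketch under assumptions: the function names are hypothetical, and it assumes field values contain no whitespace (for example, that the deadline is rendered as a single timestamp token).

```python
import re

# Matches the key=value pairs emitted by the line_format template.
_PAIR = re.compile(r"(\w+)=(\S+)")


def parse_overdue_line(line: str) -> dict[str, str]:
    """Parse 'taskId=... tenant=... deadline=...' into a dict.

    Assumption: values contain no whitespace.
    """
    fields = dict(_PAIR.findall(line))
    missing = {"taskId", "tenant", "deadline"} - fields.keys()
    if missing:
        raise ValueError(f"line is missing fields: {sorted(missing)}")
    return fields


def group_by_tenant(lines: list[str]) -> dict[str, list[str]]:
    """Group overdue task IDs by tenant, preserving input order."""
    grouped: dict[str, list[str]] = {}
    for line in lines:
        fields = parse_overdue_line(line)
        grouped.setdefault(fields["tenant"], []).append(fields["taskId"])
    return grouped
```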

Setting Up HIL Backlog Alerts

# The pre-built alert rules include two HIL alerts:

# 1. HILBacklogHigh — fires when backlog > 100 tasks for 15 minutes:
- alert: HILBacklogHigh
  expr: sum(bizfirst_hil_pending_count) > 100
  for: 15m
  labels:
    severity: warning
  annotations:
    summary: "HIL backlog is high ({{ $value }} pending tasks)"

# 2. HILSLABreached — fires once any task has stayed overdue for 1 minute:
- alert: HILSLABreached
  expr: sum(bizfirst_hil_overdue_count) > 0
  for: 1m
  labels:
    severity: critical
  annotations:
    summary: "{{ $value }} HIL tasks have breached their SLA deadline"

Overdue Tasks Need Immediate Human Action

A firing HILSLABreached alert cannot be resolved by the engineering team — it requires the business process owners to approve or escalate the overdue tasks. When this alert fires, the on-call engineer's job is to: (1) identify which tenant and which tasks are overdue (use the queries above), (2) notify the appropriate business team or process owner, and (3) document the breach for compliance reporting.
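
The three on-call steps can be sketched as a small routine that takes per-tenant overdue counts, picks who to notify, and produces a record for compliance reporting. Everything here is illustrative: the BreachRecord fields, the owners lookup, and the "UNKNOWN OWNER" fallback are assumptions, not an existing BizFirstGO interface.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass
class BreachRecord:
    tenant_id: str
    overdue_tasks: int
    detected_at: str  # ISO 8601 timestamp, UTC
    notified: str     # business owner to contact; engineering cannot approve tasks


def triage(
    overdue_by_tenant: dict[str, int],
    owners: dict[str, str],
    now: Optional[datetime] = None,
) -> list[BreachRecord]:
    """Build one breach record per affected tenant, worst breach first."""
    detected = (now or datetime.now(timezone.utc)).isoformat()
    records = []
    # Step 1: identify affected tenants, largest overdue count first.
    ranked = sorted(overdue_by_tenant.items(), key=lambda item: item[1], reverse=True)
    for tenant, count in ranked:
        if count <= 0:
            continue
        # Step 2: look up the process owner to notify (hypothetical mapping).
        owner = owners.get(tenant, "UNKNOWN OWNER")
        # Step 3: keep a record of the breach for compliance reporting.
        records.append(BreachRecord(tenant, count, detected, owner))
    return records
```

Tenants with no known owner are still recorded, so a missing contact shows up in the report instead of silently dropping the breach.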