Monitoring Timeouts
Every HIL timeout fires an HILExpiredEvent on the workflow event bus. The Observer Panel surfaces these events in real time. Audit log entries capture the full timeout record for compliance and retrospective analysis.
HILExpiredEvent
Regardless of the configured timeout behavior (Escalate, AutoApprove, AutoReject, or Fail), the dispatcher always publishes an HILExpiredEvent after handling the timeout:
public class HILExpiredEvent : IWorkflowEvent
{
    public string ExecutionResId { get; init; }
    public Guid ExecutionId { get; init; }
    public string TenantId { get; init; }
    public string NodeId { get; init; }
    public string TimeoutBehavior { get; init; }    // Escalate | AutoApprove | AutoReject | Fail
    public DateTimeOffset ExpiredAt { get; init; }
    public string OriginalActorId { get; init; }
    public string? EscalationActorId { get; init; } // set only for Escalate
}
// Published in HILTimeoutDispatcher after every behavior handler runs
await _eventBus.PublishAsync(new HILExpiredEvent { ... }, ct);
Observer Panel — Timeout Events
The flowObserverPanelStore subscribes to HIL events streamed via SignalR and renders them in the Observer Panel timeline:
// flowObserverPanelStore.ts
connection.on("HILExpired", (event: HILExpiredEvent) => {
  useFlowObserverPanelStore.getState().addEvent({
    type      : "hil-timeout",
    nodeId    : event.nodeId,
    timestamp : event.expiredAt,
    label     : `HIL timeout — ${event.timeoutBehavior}`,
    actor     : event.originalActorId,
    severity  : event.timeoutBehavior === "Fail" ? "error" : "warning"
  });
});
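The handler above builds the timeline entry inline. Pulling that mapping into a pure function would make it testable without a SignalR connection. The following is a minimal sketch, assuming the client receives the event as camelCase JSON with the fields the handler reads. The names `HILExpiredPayload`, `TimelineEntry`, and `toTimelineEntry` are hypothetical and are not part of the documented store.

```typescript
// Assumed client-side shape of the HILExpiredEvent payload (camelCase JSON).
type TimeoutBehavior = "Escalate" | "AutoApprove" | "AutoReject" | "Fail";

interface HILExpiredPayload {
  executionResId: string;
  executionId: string;
  tenantId: string;
  nodeId: string;
  timeoutBehavior: TimeoutBehavior;
  expiredAt: string;                 // ISO 8601 timestamp
  originalActorId: string;
  escalationActorId?: string | null; // set only for Escalate
}

// Hypothetical shape of an Observer Panel timeline entry.
interface TimelineEntry {
  type: "hil-timeout";
  nodeId: string;
  timestamp: string;
  label: string;
  actor: string;
  severity: "error" | "warning";
}

// Pure mapping: only the Fail behavior is surfaced as an error.
function toTimelineEntry(event: HILExpiredPayload): TimelineEntry {
  return {
    type: "hil-timeout",
    nodeId: event.nodeId,
    timestamp: event.expiredAt,
    // \u2014 is the dash character used in the handler's label.
    label: `HIL timeout \u2014 ${event.timeoutBehavior}`,
    actor: event.originalActorId,
    severity: event.timeoutBehavior === "Fail" ? "error" : "warning"
  };
}
```

With this in place, the SignalR handler could reduce to passing `toTimelineEntry(event)` to `addEvent`.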
Audit Log Entry Schema
Timeout events are written to Process_AuditLog alongside the suspension record:
// AuditLog entry written by HILTimeoutDispatcher
{
  "auditId"         : "7a3e2b1c-...",
  "tenantId"        : "tenant-001",
  "executionId"     : "exec-abc",
  "executionResId"  : "wf-run-xyz",
  "nodeId"          : "node-approval-1",
  "eventType"       : "HILTimeout",
  "behavior"        : "Escalate",
  "originalActor"   : "user-john",
  "escalationActor" : "user-manager",
  "expiredAt"       : "2026-05-25T18:00:00Z",
  "recordedAt"      : "2026-05-25T18:00:01Z"
}
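For retrospective analysis, one useful derived value is the gap between `expiredAt` and `recordedAt`. The sketch below computes it, assuming `expiredAt` is the deadline and `recordedAt` is the time the audit row was written; the source does not define these fields beyond the example. `HILTimeoutAuditEntry` and `recordingLagMs` are hypothetical names.

```typescript
// Assumed shape of a HILTimeout audit entry, mirroring the example above.
interface HILTimeoutAuditEntry {
  auditId: string;
  tenantId: string;
  executionId: string;
  executionResId: string;
  nodeId: string;
  eventType: "HILTimeout";
  behavior: string;
  originalActor: string;
  escalationActor?: string | null;
  expiredAt: string;  // ISO 8601
  recordedAt: string; // ISO 8601
}

// Milliseconds between the expiry and the audit write.
function recordingLagMs(
  entry: Pick<HILTimeoutAuditEntry, "expiredAt" | "recordedAt">
): number {
  const expired = Date.parse(entry.expiredAt);
  const recorded = Date.parse(entry.recordedAt);
  if (isNaN(expired) || isNaN(recorded)) {
    throw new Error("Audit entry has an unparseable timestamp");
  }
  return recorded - expired;
}
```

For the example entry, the two timestamps are one second apart, so the function returns 1000.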
Suspended Executions — Monitoring Query
Operations teams can query the suspended executions table to surface at-risk tasks before the timeout fires, and overdue tasks that the timeout job has not yet processed:
-- Find tasks expiring within the next hour (at-risk)
SELECT ExecutionResId, SuspendedNodeId, ActorId, ExpiresAt,
       DATEDIFF(MINUTE, GETUTCDATE(), ExpiresAt) AS MinutesRemaining
FROM   Process_SuspendedExecutions
WHERE  Status = 0 -- Pending
  AND  ExpiresAt BETWEEN GETUTCDATE() AND DATEADD(HOUR, 1, GETUTCDATE())
ORDER BY ExpiresAt ASC;

-- Find tasks that have already timed out but have not yet been processed
SELECT ExecutionResId, SuspendedNodeId, ActorId, ExpiresAt,
       TimeoutBehavior
FROM   Process_SuspendedExecutions
WHERE  Status = 0 -- still Pending
  AND  ExpiresAt < GETUTCDATE()
ORDER BY ExpiresAt ASC;
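The same two conditions can be applied in application code, for example when a dashboard has already fetched the rows and needs to label them. This is a minimal sketch, assuming `Status = 0` means Pending as in the queries above and that `ExpiresAt` arrives as an ISO 8601 UTC string. `SuspendedExecutionRow` and `classifySuspension` are hypothetical names.

```typescript
type SuspensionRisk = "overdue" | "at-risk" | "ok" | "not-pending";

// Assumed row shape, based on the columns selected in the queries above.
interface SuspendedExecutionRow {
  executionResId: string;
  suspendedNodeId: string;
  actorId: string;
  expiresAt: string; // ISO 8601, UTC
  status: number;    // 0 = Pending
}

const ONE_HOUR_MS = 60 * 60 * 1000;

function classifySuspension(
  row: SuspendedExecutionRow,
  now: Date,
  windowMs: number = ONE_HOUR_MS
): SuspensionRisk {
  if (row.status !== 0) return "not-pending";
  const remainingMs = Date.parse(row.expiresAt) - now.getTime();
  // Mirrors: ExpiresAt < GETUTCDATE()
  if (remainingMs < 0) return "overdue";
  // Mirrors: ExpiresAt BETWEEN now AND now + 1 hour (inclusive bounds)
  if (remainingMs <= windowMs) return "at-risk";
  return "ok";
}
```

Passing `now` as a parameter rather than reading the clock inside the function keeps the classification deterministic in tests.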
Timeout Metrics
The node observability layer emits timeout counters via INodeMetrics:
// Recorded in HILTimeoutDispatcher
_metrics.IncrementCounter(
    "hil.timeout.total",
    tags: new Dictionary<string, string>
    {
        ["behavior"] = suspension.TimeoutBehavior,
        ["nodeId"]   = suspension.SuspendedNodeId,
        ["tenantId"] = suspension.TenantId
    });

// Key metrics to alert on:
//   hil.timeout.total{behavior="Fail"}     — terminal failures
//   hil.timeout.total{behavior="Escalate"} — escalation volume
//   hil.timeout.job.batch_size             — job processing throughput
//   hil.timeout.job.duration_ms            — job execution latency
Alerting Recommendations
| Metric / Condition | Alert Threshold | Action |
|---|---|---|
| hil.timeout.total{behavior="Fail"} rate | > 5/hour | Ops review — workflows terminating unexpectedly |
| Pending suspensions past expiry | > 0 for > 15 min | Check HILTimeoutJob is running (Hangfire) |
| hil.timeout.job.duration_ms | > 10 s | BatchSize may be too large; reduce in config |
| HILExpiredEvent rate spike | 3x baseline | Review actor availability and deadline settings |
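
The thresholds in the table can be written down as a small rule evaluator, which is sometimes easier to review than alert definitions spread across a monitoring tool. The sketch below is illustrative only: `TimeoutMetricsSnapshot` and its fields are hypothetical aggregates that a metrics backend would need to supply, not part of `INodeMetrics`. It also reads "> 0 for > 15 min" as "the oldest unprocessed suspension is more than 15 minutes past expiry", which is an assumption.

```typescript
// Hypothetical pre-aggregated values; none of these names come from the platform.
interface TimeoutMetricsSnapshot {
  failTimeoutsPerHour: number;          // rate of hil.timeout.total{behavior="Fail"}
  oldestOverdueMinutes: number;         // 0 when no Pending suspension is past expiry
  jobDurationMs: number;                // hil.timeout.job.duration_ms
  expiredEventsPerHour: number;         // current HILExpiredEvent rate
  baselineExpiredEventsPerHour: number; // typical HILExpiredEvent rate
}

// Returns the identifiers of every alert rule that the snapshot triggers.
function evaluateAlerts(s: TimeoutMetricsSnapshot): string[] {
  const alerts: string[] = [];
  if (s.failTimeoutsPerHour > 5) alerts.push("fail-rate");
  if (s.oldestOverdueMinutes > 15) alerts.push("job-stalled");
  if (s.jobDurationMs > 10000) alerts.push("job-slow");
  // Skip the spike rule when no baseline is available.
  if (
    s.baselineExpiredEventsPerHour > 0 &&
    s.expiredEventsPerHour >= 3 * s.baselineExpiredEventsPerHour
  ) {
    alerts.push("expired-spike");
  }
  return alerts;
}
```

All comparisons except the spike rule are strict, matching the ">" thresholds in the table; a snapshot sitting exactly on those thresholds does not trigger them.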
Hangfire Job Health
The HILTimeoutJob runs on a recurring schedule via Hangfire. Use the Hangfire dashboard to verify that the job is registered and running:
// Hangfire registration — typical recurrence
RecurringJob.AddOrUpdate<HILTimeoutJob>(
    "hil-timeout-scan",
    job => job.ExecuteAsync(CancellationToken.None),
    Cron.Minutely());

// If the job is missing from the Hangfire recurring jobs list,
// re-register it by restarting the API or calling the admin endpoint:
//   POST /admin/jobs/hil-timeout/register