Timeout & Circuit Breaker Resilience
Enforce strict execution time limits and implement automatic circuit breaking so that slow or failing dependencies never cascade into full platform outages.
TimeoutGuard: Execution Time Enforcement
The Problem
An HTTP node calls a third-party analytics API. Under normal conditions it completes in 200ms. Under load, the API degrades — responses take 45 seconds. Without a timeout guard:
- Each workflow execution holds a thread for 45 seconds
- Database connections remain open during the wait
- 100 concurrent slow calls exhaust the thread pool
- The platform appears hung to all users
The Solution
TimeoutGuard records the start time in Pre and checks the elapsed time in Post. Configure action="block" to return a failure when the elapsed time exceeds timeoutMs milliseconds.
{
  "name": "TimeoutGuard",
  "enabled": true,
  "config": {
    "timeoutMs": 5000,   // 5 seconds maximum
    "action": "block"    // fail with Blocked result if exceeded
  }
}
Note: TimeoutGuard does not cancel the running operation. To stop the work itself, use a CancellationToken in the node implementation. TimeoutGuard enforces the policy at the workflow level: it blocks the result from being accepted, but the node's async operation may have already completed.
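TimeoutGuard evaluates the elapsed time only after the node returns, so stopping the work itself has to happen inside the node. The source names CancellationToken, a .NET type; the sketch below shows the analogous idea in Python, where asyncio.wait_for cancels the awaited coroutine at the deadline. The functions call_analytics_api and run_node are hypothetical names used only for illustration.

```python
import asyncio


async def call_analytics_api(delay_s: float) -> str:
    # Hypothetical stand-in for the slow third-party call.
    await asyncio.sleep(delay_s)
    return "ok"


async def run_node(delay_s: float, timeout_s: float) -> str:
    try:
        # wait_for cancels the inner coroutine once the deadline passes,
        # so the node stops waiting instead of holding resources.
        return await asyncio.wait_for(call_analytics_api(delay_s), timeout=timeout_s)
    except asyncio.TimeoutError:
        return "timed out"
```

With this in place the node gives up at its own deadline, and the workflow-level guard acts as a second line of defense rather than the only one.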
Warning mode vs block mode
| Config | Behavior | Use when |
|---|---|---|
| "action": "block" | Returns Blocked; the workflow fails with an error | Hard SLA requirements; output after timeout is unreliable |
| "action": "warn" | Returns Warning (IsAllowed=true); the workflow continues but the violation is logged | Monitoring and alerting; tolerant workflows where a slow response is still usable |
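The Pre/Post check and the two actions can be sketched as follows. This is a minimal illustration, not the platform's implementation: the class name TimeoutGuardSketch, the GuardResult shape, the "Allowed" status name, and the injectable clock are all assumptions made so the example is self-contained.

```python
import time
from dataclasses import dataclass


@dataclass
class GuardResult:
    status: str          # "Allowed" (assumed name), "Warning", or "Blocked"
    is_allowed: bool
    message: str = ""


class TimeoutGuardSketch:
    def __init__(self, timeout_ms: int, action: str = "block", clock=time.monotonic):
        if action not in ("block", "warn"):
            raise ValueError(f"unsupported action: {action!r}")
        self.timeout_ms = timeout_ms
        self.action = action
        self._clock = clock      # injectable so the guard can be tested without sleeping
        self._started = None

    def pre(self) -> None:
        # Record the start time before the node runs.
        self._started = self._clock()

    def post(self) -> GuardResult:
        # Compare the elapsed time with the limit after the node returns.
        if self._started is None:
            raise RuntimeError("pre() must be called before post()")
        elapsed_ms = (self._clock() - self._started) * 1000
        if elapsed_ms <= self.timeout_ms:
            return GuardResult("Allowed", True)
        message = f"took {elapsed_ms:.0f} ms, limit is {self.timeout_ms} ms"
        if self.action == "warn":
            return GuardResult("Warning", True, message)   # logged, workflow continues
        return GuardResult("Blocked", False, message)      # workflow fails
```

Because the check runs in post(), the guard only decides whether to accept a result; it never interrupts the node.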
CircuitBreakerGuard: Automatic Dependency Protection
The Problem
The analytics API goes fully down at 2:00 AM. Without circuit breaking:
- Every workflow execution waits the full 5s timeout before failing
- Logs fill with timeout errors — thousands per minute
- The failing API receives a thundering herd of retries, delaying recovery
- Audit and monitoring systems are overwhelmed by error volume
The Solution
CircuitBreakerGuard opens the circuit after 5 consecutive failures. Subsequent calls are rejected immediately without touching the failing API — fast fail with no resource waste.
{
  "name": "CircuitBreakerGuard",
  "enabled": true,
  "config": {
    "threshold": 5,     // open after 5 consecutive permanent failures
    "timeout": 60000    // stay Open for 60 seconds, then try HalfOpen
  }
}
Circuit breaker state machine
- Closed (Normal): All requests pass through. Failures are tracked in a sliding 5-minute window. When FailureCount ≥ 5 → Open.
- Open (Blocking): All requests blocked immediately. After 60 seconds (OpenDuration) → HalfOpen.
- HalfOpen (Testing): One test request allowed. Success: → Closed (normal operation). Failure: → Open (60 more seconds).
Note: the service-level circuit breaker (in GuardCircuitBreaker) is stricter. It requires 3 consecutive successes to close from HalfOpen.
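The three states and their transitions can be sketched as a small state machine. This is an illustrative sketch under stated assumptions: the class and method names are hypothetical, failures are counted consecutively (the simpler of the two descriptions in this section; the sliding window is omitted), and a single success in HalfOpen closes the circuit.

```python
import time


class CircuitBreakerSketch:
    def __init__(self, threshold: int = 5, open_seconds: float = 60.0, clock=time.monotonic):
        self.threshold = threshold
        self.open_seconds = open_seconds
        self._clock = clock          # injectable so transitions can be tested without waiting
        self.state = "Closed"
        self._failures = 0
        self._opened_at = 0.0

    def allow_request(self) -> bool:
        if self.state == "Open":
            if self._clock() - self._opened_at >= self.open_seconds:
                self.state = "HalfOpen"   # let exactly one probe through
                return True
            return False                  # fast fail, dependency is not touched
        if self.state == "HalfOpen":
            return False                  # a probe is in flight; reject until it reports back
        return True                       # Closed: normal operation

    def record_success(self) -> None:
        self._failures = 0
        self.state = "Closed"

    def record_failure(self) -> None:
        if self.state == "HalfOpen":
            self._trip()                  # probe failed: back to Open for another period
            return
        self._failures += 1
        if self._failures >= self.threshold:
            self._trip()

    def _trip(self) -> None:
        self.state = "Open"
        self._opened_at = self._clock()
        self._failures = 0
```

A caller would check allow_request() before invoking the dependency and report the outcome with record_success() or record_failure() afterwards.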
Service-level circuit breaker defaults
The GuardCircuitBreaker service (which protects guard infrastructure) uses these defaults, configured in DI:
| Policy | Default |
|---|---|
| FailureThreshold | 5 (within the failure window) |
| OpenDuration | 60 seconds |
| SuccessThresholdForHalfOpen | 3 consecutive successes |
| FailSecure | true (security-critical guards block when circuit open) |
| ResetOnTransient | true (transient failures don't count toward threshold) |
| FailureWindow | 300 seconds (5-minute sliding window) |
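To make the interaction between FailureThreshold, FailureWindow, and ResetOnTransient concrete, here is a rough sketch of sliding-window failure counting using the defaults from the table. The Python names (BreakerPolicy, FailureWindow, record_failure) are hypothetical, and ResetOnTransient is interpreted as the table glosses it: transient failures are simply not counted. The real service may behave differently.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class BreakerPolicy:
    # Defaults copied from the table; snake_case names are this sketch's own.
    failure_threshold: int = 5
    open_duration_s: float = 60.0
    success_threshold_half_open: int = 3
    fail_secure: bool = True
    reset_on_transient: bool = True
    failure_window_s: float = 300.0


class FailureWindow:
    def __init__(self, policy: BreakerPolicy):
        self.policy = policy
        self._stamps = deque()   # timestamps (seconds) of counted failures

    def record_failure(self, now: float, transient: bool = False) -> bool:
        """Record a failure at time `now`; return True if the circuit should open."""
        if transient and self.policy.reset_on_transient:
            return False         # assumed reading: transient failures are ignored
        self._stamps.append(now)
        # Drop failures that have aged out of the sliding window.
        cutoff = now - self.policy.failure_window_s
        while self._stamps and self._stamps[0] <= cutoff:
            self._stamps.popleft()
        return len(self._stamps) >= self.policy.failure_threshold
```

Under this reading, five permanent failures within 300 seconds open the circuit, while the same five failures spread over a longer period do not.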
Combined Configuration: Timeout + Circuit Breaker
{
  "guardRails": {
    "individual": [
      {
        "name": "TimeoutGuard",
        "enabled": true,
        "order": 1,
        "config": { "timeoutMs": 5000, "action": "block" }
      },
      {
        "name": "CircuitBreakerGuard",
        "enabled": true,
        "order": 2,
        "config": { "threshold": 5, "timeout": 60000 }
      }
    ]
  }
}
Result: Slow requests fail fast (5s max). After 5 failures, subsequent requests fail instantly (circuit open). After 60s, one test request probes recovery. On success, normal operation resumes.
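As a small illustration of how the combined configuration might be consumed, the sketch below parses the same structure and lists the enabled guards by their order field. It assumes that a lower order value runs first, which the source implies but does not state; the function name enabled_guards_in_order is hypothetical.

```python
import json

# Same structure as the combined configuration, listed out of order on purpose.
CONFIG = """
{
  "guardRails": {
    "individual": [
      {"name": "CircuitBreakerGuard", "enabled": true, "order": 2,
       "config": {"threshold": 5, "timeout": 60000}},
      {"name": "TimeoutGuard", "enabled": true, "order": 1,
       "config": {"timeoutMs": 5000, "action": "block"}}
    ]
  }
}
"""


def enabled_guards_in_order(raw: str) -> list[str]:
    guards = json.loads(raw)["guardRails"]["individual"]
    active = [g for g in guards if g.get("enabled", False)]
    # Assumption: guards run in ascending "order".
    return [g["name"] for g in sorted(active, key=lambda g: g["order"])]
```

Note that json.loads rejects the // comments used in the earlier single-guard snippets, so those would need to be stripped before parsing with a strict JSON parser.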