Debugging Stuck Interactions
A "stuck" interaction is one that has been published and is still pending: the user has not responded and the timeout has not yet expired, often because delivery failed or the interaction is not rendering. This page explains how to find stuck interactions and resolve them.
Diagnosing: Is an Interaction Stuck?
An interaction is stuck (rather than merely slow) if all of these are true:
- It has been in-flight significantly longer than the P95 response time for its type
- The displayedAt field in the audit log is null (the UI never confirmed rendering)
- The target user is actively online (EdgeStream shows their session as connected)
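The three conditions above can be combined into a single predicate. The following is a minimal sketch, not the product's implementation: the `InteractionSnapshot` record, the `StuckCheck` class, and the `factor` parameter are hypothetical names introduced for illustration, and "significantly longer" is assumed to mean more than twice the P95.

```csharp
using System;

// Hypothetical snapshot of the fields the checklist inspects; the real audit
// entry and EdgeStream session types may be shaped differently.
public sealed record InteractionSnapshot(
    DateTime PublishedAt,
    DateTime? DisplayedAt,
    bool UserOnline);

public static class StuckCheck
{
    // Returns true only when all three checklist conditions hold.
    // Assumption: "significantly longer" means age > factor * P95 (default 2x).
    public static bool IsStuck(
        InteractionSnapshot snapshot,
        TimeSpan p95ResponseTime,
        DateTime now,
        double factor = 2.0)
    {
        var age = now - snapshot.PublishedAt;
        var overdue = age > p95ResponseTime * factor;

        return overdue
            && snapshot.DisplayedAt is null   // UI never confirmed rendering
            && snapshot.UserOnline;           // session shown as connected
    }
}
```

Passing `now` explicitly, rather than reading the clock inside the method, keeps the check deterministic and easy to unit test.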
Step-by-Step Debugging Process
Check InteractionMonitor for in-flight interactions
Open the admin dashboard and review InteractionMonitor. Look for interactions with age significantly greater than expected. Note the interactionId and targetUserId.
Query the audit log for the interaction
Use the audit store to fetch the full entry. Check whether deliveredAt and displayedAt are populated. If deliveredAt is null, the interaction has not reached the client: either delivery failed or the user is offline (see the next step).
Check EdgeStream session state for the target user
Query the EdgeStream session store for the target user. If they are offline, the interaction will be delivered when they reconnect (unless the timeout expires first).
Check the client-side queue
If the user is connected, the interaction should appear in their useInteractionReceiver() queue. Ask the user to open the WorkDesk inbox, or check the browser console for rendering errors.
Manually resolve if needed
If the workflow is blocked and the interaction cannot be delivered, use the admin resolution API to manually respond to or cancel the interaction.
Audit Log Query for Stuck Interactions
```csharp
// Find pending interactions published longer ago than stuckThreshold and never displayed.
// Choose the threshold per interaction type, e.g. 2x that type's expected P95 response time.
public async Task<List<InteractionAuditEntry>> FindStuckInteractionsAsync(
    TimeSpan stuckThreshold,
    CancellationToken ct = default)
{
    return await _auditStore.QueryAsync(new AuditQuery
    {
        Statuses = [InteractionStatus.Pending],
        CreatedBefore = DateTime.UtcNow - stuckThreshold,
        // No displayedAt: never rendered by client
        DisplayedAt = null
    }, ct);
}

// Example: find interactions (for instance approvals) pending for more than 8 hours with no display
var stuck = await FindStuckInteractionsAsync(TimeSpan.FromHours(8), ct);
foreach (var entry in stuck)
{
    _logger.LogWarning(
        "Stuck interaction {Id} for user {User}: published {Age} ago, never displayed",
        entry.InteractionId,
        entry.TargetUserId,
        DateTime.UtcNow - entry.PublishedAt);
}
```
Manual Resolution API
Administrators can manually resolve a stuck interaction via the admin endpoint. This triggers the response pipeline as if the user had responded:
```csharp
// POST /edge-interact/admin/resolve
[HttpPost("admin/resolve")]
[Authorize(Roles = "admin")]
public async Task<IActionResult> ResolveAsync(
    [FromBody] AdminResolveRequest req,
    CancellationToken ct)
{
    // Validate the interaction is still in pending state
    var entry = await _auditStore.GetByIdAsync(req.InteractionId, ct);
    if (entry is null)
        return NotFound();
    if (entry.Status != InteractionStatus.Pending)
        return Conflict($"Interaction is already in state: {entry.Status}");

    var response = new InteractionResponse
    {
        InteractionId = req.InteractionId,
        RespondedBy = $"admin:{User.Identity!.Name}",
        Outcome = req.Outcome,
        Data = req.Data,
        Timestamp = DateTime.UtcNow,
        AdminOverride = true
    };

    await _pipeline.SubmitResponseAsync(response, ct);
    return Ok(new { resolved = true, outcome = req.Outcome });
}
```
Manual Cancellation (Force Timeout)
To cancel an interaction without submitting a response (equivalent to forcing a timeout):
```csharp
// POST /edge-interact/admin/cancel
[HttpPost("admin/cancel")]
[Authorize(Roles = "admin")]
public async Task<IActionResult> CancelAsync(
    [FromBody] AdminCancelRequest req,
    CancellationToken ct)
{
    await _pipeline.CancelAsync(req.InteractionId, reason: req.Reason, ct);
    return Ok();
}
```
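For reference, a caller of the resolve endpoint might build its request roughly as follows. This is a sketch under stated assumptions: the `AdminInteractionClient` helper is hypothetical, the interaction ID and outcome are assumed to be strings, and the camelCase JSON property names are assumed to bind to `AdminResolveRequest`; check the actual request contract before relying on it.

```csharp
using System;
using System.Net.Http;
using System.Net.Http.Json;

// Hypothetical helper for calling the admin endpoint; not part of the product API.
public static class AdminInteractionClient
{
    // Builds (but does not send) the POST for /edge-interact/admin/resolve.
    // Assumption: the body mirrors the InteractionId, Outcome and Data
    // properties read by the ResolveAsync handler.
    public static HttpRequestMessage BuildResolveRequest(
        Uri baseAddress,
        string interactionId,
        string outcome,
        object? data = null)
    {
        var body = new { interactionId, outcome, data };
        var uri = new Uri(baseAddress, "edge-interact/admin/resolve");

        return new HttpRequestMessage(HttpMethod.Post, uri)
        {
            Content = JsonContent.Create(body)
        };
    }
}
```

The request can then be sent with `HttpClient.SendAsync` using admin credentials. Per the handler, a 404 means the interaction was not found and a 409 means it is no longer pending.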
Common Root Causes and Fixes
| Symptom | Root Cause | Fix |
|---|---|---|
| deliveredAt is null, user offline | User is not connected to EdgeStream | Wait for reconnect, or escalate via email/notification out-of-band |
| deliveredAt set, displayedAt null | Client received the interaction but did not render it (renderer missing or error) | Check InteractionContainer renderer map for the interaction type; check browser errors |
| Interaction delivered and displayed, no response | User is ignoring or cannot find the interaction | Send a reminder notification; check inbox UI priority ordering |
| All interactions for one type stuck | Renderer component for that type is crashing (React error boundary) | Check browser console; fix the component; redeploy |
| Callback topic not receiving response | Topic routing misconfigured in EdgeStream | Verify interactions.callback.{id} topic subscription is active on the server |
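The first three rows of the table can be read as a decision over the audit timestamps and session state. A minimal sketch of that triage follows; the `StuckCause` enum and `StuckTriage` class are hypothetical names, and the "deliveredAt null while the user is online" case is not in the table, so it is labeled here as a delivery failure to investigate.

```csharp
using System;

// Hypothetical labels for the per-interaction rows of the table.
public enum StuckCause
{
    UserOffline,                // deliveredAt null, user offline
    DeliveryFailedWhileOnline,  // assumption: not in the table; investigate delivery
    RendererFailure,            // deliveredAt set, displayedAt null
    AwaitingUser                // delivered and displayed, no response
}

public static class StuckTriage
{
    // Maps the audit timestamps and session state to a likely cause.
    public static StuckCause Classify(
        DateTime? deliveredAt,
        DateTime? displayedAt,
        bool userOnline)
    {
        if (deliveredAt is null)
        {
            return userOnline
                ? StuckCause.DeliveryFailedWhileOnline
                : StuckCause.UserOffline;
        }

        if (displayedAt is null)
        {
            return StuckCause.RendererFailure;
        }

        return StuckCause.AwaitingUser;
    }
}
```

The last two table rows (a whole type stuck, or a callback topic not receiving responses) describe patterns across many interactions, so a per-interaction check like this one does not cover them.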
Grafana Query: Interactions Pending Over Threshold
```promql
# Coarse signal: a growing in-flight gauge suggests stuck interactions are accumulating.
# This query does not measure age. Counting interactions in-flight for more than 1 hour
# would require the pipeline to tag in-flight metrics with the publish timestamp;
# use the audit log query endpoint for fine-grained stuck detection instead.
interaction_in_flight > 50
```