Diagnosing: Is an Interaction Stuck?

An interaction is stuck (rather than merely slow) if all of these are true:

Step-by-Step Debugging Process

1. Check InteractionMonitor for in-flight interactions

Open the admin dashboard and review InteractionMonitor. Look for interactions whose age is well beyond what is expected for their type. Note the interactionId and targetUserId of each candidate.

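2. Query the audit log for the interaction

Use the audit store to fetch the full entry and check whether deliveredAt and displayedAt are populated. If deliveredAt is null, the interaction has not reached the client. If deliveredAt is set but displayedAt is null, the client received the interaction but never rendered it.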

3. Check EdgeStream session state for the target user

Query the EdgeStream session store for the target user. If they are offline, the interaction will be delivered when they reconnect (unless the timeout expires first).

4. Check the client-side queue

If the user is connected, the interaction should appear in their useInteractionReceiver() queue. Ask the user to open the WorkDesk inbox, and check the browser console for rendering errors.

5. Manually resolve if needed

If the workflow is blocked and the interaction cannot be delivered, use the admin resolution API to manually respond to or cancel the interaction.
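
The decision logic in steps 2 to 4 can be condensed into a small helper. This is only a sketch: the NextStep enum, the StuckTriage class, and the idea of passing the connection state as a boolean are assumptions made for illustration, not part of the product API.

```csharp
using System;

// Hypothetical names for illustration only; the real audit entry and
// session store types are not shown in this guide.
public enum NextStep
{
    WaitForReconnect,      // user offline: delivery happens on reconnect (or the timeout fires)
    InspectClientQueue,    // user online but nothing recorded as delivered
    CheckClientRendering,  // delivered but never displayed
    RemindUser             // delivered and displayed, but no response yet
}

public static class StuckTriage
{
    // Maps the audit timestamps and the session state to the next debugging step.
    public static NextStep Classify(DateTime? deliveredAt, DateTime? displayedAt, bool userConnected)
    {
        if (deliveredAt is null)
            return userConnected ? NextStep.InspectClientQueue : NextStep.WaitForReconnect;

        if (displayedAt is null)
            return NextStep.CheckClientRendering;

        return NextStep.RemindUser;
    }
}
```

Manual resolution (step 5) is deliberately left out of the helper, since it is a judgment call that depends on whether the workflow can afford to wait for the timeout.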

Audit Log Query for Stuck Interactions

// Find pending interactions older than stuckThreshold that were never displayed.
// A reasonable threshold is 2x the expected P95 for the interaction type.
public async Task<List<InteractionAuditEntry>> FindStuckInteractionsAsync(
    TimeSpan stuckThreshold,
    CancellationToken ct = default)
{
    return await _auditStore.QueryAsync(new AuditQuery
    {
        Statuses       = [InteractionStatus.Pending],
        CreatedBefore  = DateTime.UtcNow - stuckThreshold,
        // No displayedAt — never rendered by client
        DisplayedAt    = null
    }, ct);
}

// Example: 8-hour threshold (e.g. for approvals), never displayed.
// Note that the query above does not filter by interaction type.
var stuck = await FindStuckInteractionsAsync(TimeSpan.FromHours(8), ct);
foreach (var entry in stuck)
{
    _logger.LogWarning(
        "Stuck interaction {Id} for user {User} — published {Age} ago, never displayed",
        entry.InteractionId,
        entry.TargetUserId,
        DateTime.UtcNow - entry.PublishedAt);
}
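
One way to derive the threshold from the "2x expected P95" rule of thumb is a per-type lookup. The sketch below is an assumption-laden illustration: the StuckThresholds class, the type names, and the P95 values are all hypothetical placeholders and should be replaced with measured figures.

```csharp
using System;
using System.Collections.Generic;

public static class StuckThresholds
{
    // Hypothetical P95 response times per interaction type (placeholder values).
    private static readonly Dictionary<string, TimeSpan> ExpectedP95 = new()
    {
        ["approval"]     = TimeSpan.FromHours(4),
        ["confirmation"] = TimeSpan.FromMinutes(10),
    };

    // Returns multiplier x P95 for the given type; throws for unknown types
    // rather than silently guessing a threshold.
    public static TimeSpan For(string interactionType, double multiplier = 2.0)
    {
        if (!ExpectedP95.TryGetValue(interactionType, out var p95))
            throw new ArgumentException($"No expected P95 configured for '{interactionType}'.", nameof(interactionType));

        return TimeSpan.FromTicks((long)(p95.Ticks * multiplier));
    }
}
```

With these placeholder values, StuckThresholds.For("approval") yields 8 hours, which could be passed to FindStuckInteractionsAsync in place of a hard-coded TimeSpan.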

Manual Resolution API

Administrators can manually resolve a stuck interaction via the admin endpoint. This triggers the response pipeline as if the user had responded:

// POST /edge-interact/admin/resolve
[HttpPost("admin/resolve")]
[Authorize(Roles = "admin")]
public async Task<IActionResult> ResolveAsync(
    [FromBody] AdminResolveRequest req,
    CancellationToken ct)
{
    // Validate the interaction is still in pending state
    var entry = await _auditStore.GetByIdAsync(req.InteractionId, ct);
    if (entry is null)
        return NotFound();
    if (entry.Status != InteractionStatus.Pending)
        return Conflict($"Interaction is already in state: {entry.Status}");

    var response = new InteractionResponse
    {
        InteractionId = req.InteractionId,
        RespondedBy   = $"admin:{User.Identity!.Name}",
        Outcome       = req.Outcome,
        Data          = req.Data,
        Timestamp     = DateTime.UtcNow,
        AdminOverride = true
    };

    await _pipeline.SubmitResponseAsync(response, ct);
    return Ok(new { resolved = true, outcome = req.Outcome });
}
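
For completeness, a caller of this endpoint might build the request roughly as follows. This is a hedged sketch, not a documented client: the AdminResolveClient class, the bearer-token authentication, the string types for the interaction ID and outcome, and the camelCase JSON property names are all assumptions. Only the path and the three request fields (InteractionId, Outcome, Data) come from the handler above.

```csharp
using System;
using System.Net.Http;
using System.Net.Http.Headers;
using System.Net.Http.Json;

public static class AdminResolveClient
{
    // Builds (but does not send) a POST to the admin resolve endpoint.
    // Assumes bearer-token auth and a JSON body with camelCase property names.
    public static HttpRequestMessage BuildResolveRequest(
        Uri baseAddress,
        string bearerToken,
        string interactionId,
        string outcome,
        object? data = null)
    {
        var request = new HttpRequestMessage(
            HttpMethod.Post,
            new Uri(baseAddress, "edge-interact/admin/resolve"))
        {
            // Serialized as {"interactionId": ..., "outcome": ..., "data": ...}
            Content = JsonContent.Create(new { interactionId, outcome, data })
        };
        request.Headers.Authorization = new AuthenticationHeaderValue("Bearer", bearerToken);
        return request;
    }
}
```

When the request is sent with HttpClient.SendAsync, the handler above can answer 404 if the interaction does not exist and 409 if it is no longer pending, so a caller should treat both as expected outcomes rather than failures to retry.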

Manual Cancellation (Force Timeout)

To cancel an interaction without submitting a response (equivalent to forcing a timeout):

// POST /edge-interact/admin/cancel
[HttpPost("admin/cancel")]
[Authorize(Roles = "admin")]
public async Task<IActionResult> CancelAsync(
    [FromBody] AdminCancelRequest req,
    CancellationToken ct)
{
    await _pipeline.CancelAsync(req.InteractionId, reason: req.Reason, ct);
    return Ok();
}

Common Root Causes and Fixes

| Symptom | Root Cause | Fix |
|---|---|---|
| deliveredAt is null, user offline | User is not connected to EdgeStream | Wait for reconnect, or escalate via email/notification out-of-band |
| deliveredAt set, displayedAt null | Client received the interaction but did not render it (renderer missing or error) | Check InteractionContainer renderer map for the interaction type; check browser errors |
| Interaction delivered and displayed, no response | User is ignoring or cannot find the interaction | Send a reminder notification; check inbox UI priority ordering |
| All interactions for one type stuck | Renderer component for that type is crashing (React error boundary) | Check browser console; fix the component; redeploy |
| Callback topic not receiving response | Topic routing misconfigured in EdgeStream | Verify interactions.callback.{id} topic subscription is active on the server |

Timeout Is the Automatic Resolution

For most stuck interactions, simply waiting for the timeout to expire is the correct course of action. Use manual resolution only when the workflow is blocked and waiting is not acceptable (e.g., a time-critical approval).

Grafana Query: Interactions Pending Over Threshold

# Counting interactions that have been in flight for more than 1 hour would require
# the pipeline to tag in-flight metrics with the publish timestamp.
# Use the audit log query endpoint for fine-grained stuck detection instead.

# Coarse signal — growing in-flight gauge suggests accumulating stuck interactions:
interaction_in_flight > 50