
The Three Use Cases

Debug: Something Failed

A workflow execution failed. Find the error log, trace the code path through spans, identify the responsible node, and read the exception details. Covered in: Find Logs, Trace a Slow Node, Error Analysis.
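The first debugging step, finding the logs for one execution, can be sketched in code. The example below is a minimal sketch, not the portal's own tooling: it builds the LogQL line-filter query used on this page (`{job="processengine"} |= "<execution id>"`) and wraps it in a URL for Loki's `query_range` HTTP endpoint. The base URL, the helper names, and the case-insensitive `error` filter are assumptions made for illustration; check them against your own Loki setup and log format.

```python
from urllib.parse import urlencode

# Hypothetical Loki address; replace with the address of your own instance.
LOKI_BASE_URL = "http://loki.example.internal:3100"


def build_logql(execution_id: str, job: str = "processengine",
                errors_only: bool = False) -> str:
    """Build a LogQL query that selects all log lines mentioning one execution."""
    # Escape backslashes and double quotes so the ID is a valid LogQL string.
    escaped = execution_id.replace("\\", "\\\\").replace('"', '\\"')
    query = f'{{job="{job}"}} |= "{escaped}"'
    if errors_only:
        # Assumption: error lines contain the word "error" in some casing.
        query += ' |~ "(?i)error"'
    return query


def build_query_range_url(query: str, start_ns: int, end_ns: int,
                          limit: int = 100) -> str:
    """Build a URL for Loki's /loki/api/v1/query_range endpoint.

    start_ns and end_ns are Unix timestamps in nanoseconds.
    """
    params = {
        "query": query,
        "start": start_ns,
        "end": end_ns,
        "limit": limit,
        "direction": "backward",  # newest log lines first
    }
    return f"{LOKI_BASE_URL}/loki/api/v1/query_range?{urlencode(params)}"


if __name__ == "__main__":
    q = build_logql("exec-id", errors_only=True)
    print(q)
    print(build_query_range_url(q, 1700000000000000000, 1700003600000000000))
```

The same query string can be pasted directly into Grafana Explore with the Loki data source selected; the URL form is only useful when scripting the lookup.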

Monitor: Is Everything Healthy?

Regularly check the Flow Studio Overview dashboard. Watch the HIL backlog for growing approval queues. Monitor EdgeStream throughput. Check Octopus LLM call rates. Covered in: HIL Backlog, Tenant Queries.
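What "growing" means for an approval queue can be made concrete with a trend check. The sketch below is only an illustration: it fits a least-squares line through a handful of backlog samples and flags the queue when the slope exceeds a threshold. The sample format (minute offset, pending task count), the function names, and the threshold of 0.5 tasks per minute are all assumptions; the HIL Analytics dashboard may define its own alert rules.

```python
from typing import List, Tuple

# One sample is (minutes since start of window, pending HIL tasks).
Sample = Tuple[float, float]


def backlog_slope(samples: List[Sample]) -> float:
    """Return the least-squares slope of the backlog in tasks per minute."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples to estimate a trend")
    mean_x = sum(x for x, _ in samples) / n
    mean_y = sum(y for _, y in samples) / n
    numerator = sum((x - mean_x) * (y - mean_y) for x, y in samples)
    denominator = sum((x - mean_x) ** 2 for x, _ in samples)
    if denominator == 0:
        raise ValueError("samples must span more than one point in time")
    return numerator / denominator


def is_backlog_growing(samples: List[Sample],
                       threshold_per_minute: float = 0.5) -> bool:
    """Flag the backlog when it grows faster than the (hypothetical) threshold."""
    return backlog_slope(samples) > threshold_per_minute


if __name__ == "__main__":
    # Illustrative values only, not real measurements.
    window = [(0, 10), (10, 20), (20, 30)]
    print(backlog_slope(window), is_backlog_growing(window))
```

A slope is more robust than comparing only the first and last sample, because a single spike at the end of the window does not dominate the result.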

Alert: Something Is Wrong Right Now

An alert fires in Slack or PagerDuty. Click the link in the notification, understand the alert context, drill into the relevant dashboard, and start remediation. Covered in: Alert Response.

Which Tool for Which Task?

Task                                     Tool                                     Starting Point
---------------------------------------  ---------------------------------------  ------------------------------------------
Find all logs for a specific execution   Grafana Explore (Loki)                   {job="processengine"} |= "exec-id"
Find the trace for a specific execution  Grafana Explore (Tempo)                  TraceId from log line → Derived Field link
Identify the slowest node type           Node Performance dashboard               P99 Latency by Node Type panel
Check current HIL backlog                HIL Analytics dashboard                  Current Backlog gauge + Overdue Tasks stat
Check system-wide health                 Flow Studio Overview dashboard           Error Rate + P99 Latency panels
Investigate an alert                     Grafana alert detail → linked dashboard  Alert notification link
Scope data to one tenant                 Dashboard variable $tenant               Dropdown in top bar of any dashboard
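The jump from a log line to its trace works because a Derived Field on the Loki data source applies a regular expression with one capture group to each log line and turns the captured value into a link. The sketch below mimics that extraction outside Grafana. The log line layouts (`traceId=...` in logfmt style, or `"trace_id":"..."` in JSON) and the pattern itself are assumptions for illustration; the regex actually configured in your data source may differ.

```python
import re
from typing import Optional

# Assumed pattern: "traceId" or "trace_id", then separators such as = : " or
# whitespace, then a 16 to 32 character hexadecimal trace ID.
TRACE_ID_PATTERN = re.compile(
    r'trace_?id["\s:=]+([0-9a-f]{16,32})', re.IGNORECASE
)


def extract_trace_id(log_line: str) -> Optional[str]:
    """Return the trace ID found in a log line, or None if there is none."""
    match = TRACE_ID_PATTERN.search(log_line)
    return match.group(1) if match else None


if __name__ == "__main__":
    # Hypothetical log lines in two common layouts.
    lines = [
        'level=error msg="node failed" traceId=4bf92f3577b34da6a3ce929d0e0e4736',
        '{"level":"error","trace_id":"4bf92f3577b34da6a3ce929d0e0e4736"}',
        "level=info msg=started",
    ]
    for line in lines:
        print(extract_trace_id(line))
```

The extracted ID can be pasted into Grafana Explore with the Tempo data source selected when the Derived Field link is not available, for example when reading logs exported to a file.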

Engineer Personas

Role                                  Primary Tools                                    Key Dashboards
------------------------------------  -----------------------------------------------  --------------------------------------------------
On-call engineer (incident response)  Grafana Explore, alert notifications             Flow Studio Overview, Error Analysis
Workflow developer (debugging)        Grafana Explore (Loki + Tempo), split view       Node Performance, Trace Explorer
Operations (daily monitoring)         Dashboard viewer                                 Flow Studio Overview, HIL Analytics, Tenant Health
Process owner (SLA monitoring)        Dashboard viewer                                 HIL Analytics, Tenant Health
Platform engineer                     All tools including Prometheus UI, Alertmanager  Infrastructure, all dashboards