LogQL — Loki Queries

1. Find all logs for a specific execution

{job="processengine", environment="production"} |= "[EXECUTION_ID]"

2. Find all error logs in the last 15 minutes (set the Grafana time range to "Last 15 minutes"; the query itself carries no time window)

{job="processengine", environment="production", level="error"} | json
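Outside Grafana, the 15-minute window has to be passed explicitly. The sketch below builds a request URL for Loki's /loki/api/v1/query_range endpoint with nanosecond start and end timestamps. The host name, the helper name, and the limit value are assumptions for illustration, not part of this setup.

```python
from urllib.parse import urlencode

NANOS_PER_MINUTE = 60 * 1_000_000_000


def loki_query_range_url(base_url, logql, end_ns, window_minutes=15, limit=1000):
    """Build a query_range URL covering the `window_minutes` before `end_ns`.

    `end_ns` is a Unix timestamp in nanoseconds, e.g. time.time_ns().
    """
    start_ns = end_ns - window_minutes * NANOS_PER_MINUTE
    params = {
        "query": logql,
        "start": start_ns,
        "end": end_ns,
        "limit": limit,           # assumed cap on returned lines
        "direction": "backward",  # newest lines first
    }
    return f"{base_url.rstrip('/')}/loki/api/v1/query_range?{urlencode(params)}"


QUERY = '{job="processengine", environment="production", level="error"} | json'

# Hypothetical Loki address; replace with the real one.
url = loki_query_range_url(
    "http://loki.example.internal:3100",
    QUERY,
    end_ns=1_700_000_000_000_000_000,
)
```

Passing time.time_ns() as end_ns gives the "last 15 minutes" behaviour; the fixed value here only keeps the example reproducible.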

3. Find error logs for a specific tenant

{job="processengine", tenant_id="[TENANT_ID]"} | json | level = "error"

4. Count log lines per level (log volume by severity)

sum by (level) (
  count_over_time({job="processengine"} | json [5m])
)

5. Find HIL suspension events

{job="processengine"} | json | message =~ ".*HIL.*suspend.*"
  | line_format "taskId={{.hilTaskId}} tenant={{.tenantId}}"

6. Find OctopusNode LLM call logs

{job="octopus"} | json | nodeType = "OctopusNode"
  | line_format "model={{.llmModel}} tokens={{.tokenCount}} duration={{.durationMs}}ms"

7. Live stream logs during a deployment

{job="processengine", environment="production"}
# Click the "Live" button in Grafana Explore to stream in real time

PromQL — Prometheus Queries

8. Current workflow execution rate

sum(rate(bizfirst_workflow_executions_total[5m]))

9. Error rate percentage

sum(rate(bizfirst_workflow_executions_total{status="failed"}[5m]))
  /
sum(rate(bizfirst_workflow_executions_total[5m]))
* 100
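As a rough worked example of what query 9 computes, the sketch below divides the per-second rate of failed executions by the per-second rate of all executions. It is a simplification: real rate() also handles counter resets and extrapolation, and the sample numbers are made up.

```python
def per_second_rate(increase, window_seconds):
    """Average per-second increase of a counter over the window (simplified rate())."""
    return increase / window_seconds


def error_rate_percent(failed_increase, total_increase, window_seconds=300):
    """Failed executions as a percentage of all executions over the window."""
    total_rate = per_second_rate(total_increase, window_seconds)
    if total_rate == 0:
        # PromQL evaluates 0 / 0 to NaN, so an idle system shows no percentage.
        return float("nan")
    failed_rate = per_second_rate(failed_increase, window_seconds)
    return failed_rate / total_rate * 100


# Made-up sample: 12 failures out of 480 executions in a 5-minute window.
sample = error_rate_percent(failed_increase=12, total_increase=480)
```

With these numbers the result is about 2.5 percent. Note that in Prometheus itself, if no series with status="failed" exists yet, the numerator is empty and the whole expression returns no data rather than 0.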

10. P99 workflow execution latency

histogram_quantile(0.99,
  sum(rate(bizfirst_workflow_executions_duration_seconds_bucket[5m])) by (le)
)
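To make the P99 figure easier to interpret, here is a simplified sketch of how histogram_quantile estimates a quantile: it finds the bucket containing the target rank and interpolates linearly inside it. The bucket boundaries and counts are invented, and the function assumes 0 < q <= 1 and classic cumulative buckets ending in +Inf.

```python
import math


def histogram_quantile(q, buckets):
    """Estimate the q-quantile from cumulative histogram buckets.

    `buckets` is a list of (upper_bound, cumulative_count) pairs sorted by
    upper_bound, with the last upper_bound equal to +Inf.
    """
    if len(buckets) < 2:
        return math.nan
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    lower_bound, lower_count = 0.0, 0.0
    for upper_bound, count in buckets:
        if count >= rank:
            if math.isinf(upper_bound):
                # Rank falls in the +Inf bucket: report the highest finite bound.
                return buckets[-2][0]
            # Linear interpolation between the bucket's lower and upper bound.
            fraction = (rank - lower_count) / (count - lower_count)
            return lower_bound + (upper_bound - lower_bound) * fraction
        lower_bound, lower_count = upper_bound, count
    return math.nan


# Invented latency buckets in seconds: (le, cumulative count).
example_buckets = [(0.5, 900), (1.0, 980), (2.5, 995), (5.0, 1000), (math.inf, 1000)]
p99 = histogram_quantile(0.99, example_buckets)
```

Here the rank 990 falls in the (1.0, 2.5] bucket, so the estimate lands at roughly 2.0 seconds. The result is only as precise as the bucket layout: a wide bucket around the quantile gives a coarse estimate.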

11. P99 node execution latency by node type

histogram_quantile(0.99,
  sum(rate(bizfirst_node_execution_duration_seconds_bucket[5m])) by (node_type, le)
)

12. Current HIL backlog by tenant

sum by (tenant_id) (bizfirst_hil_pending_count) > 0

13. EdgeStream message throughput by topic

sum by (topic) (rate(bizfirst_edgestream_messages_total[5m]))

14. Octopus LLM token usage rate

sum by (model) (rate(bizfirst_octopus_tokens_total[5m]))

TraceQL — Tempo Queries

15. Find all error traces

{ status = error }

16. Find traces for a specific tenant

{ span.tenant.id = "[TENANT_ID]" }

17. Find slow workflow executions (> 10 seconds)

{ rootName = "workflow.execute" && duration > 10s }

18. Find traces with slow HttpRequestNode spans

{ span.node.type = "HttpRequestNode" && duration > 5s }

19. Find a specific trace by ID

# In Grafana Explore → Tempo → TraceId search tab
# Paste: [TRACE_ID]
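The same lookups can be scripted against Tempo's HTTP API, which exposes /api/traces/<traceID> for a single trace and /api/search with a q parameter for TraceQL. The sketch below only builds the URLs; the host name, helper names, and limit value are assumptions for illustration.

```python
from urllib.parse import quote, urlencode


def tempo_trace_url(base_url, trace_id):
    """URL that fetches one trace by its ID."""
    return f"{base_url.rstrip('/')}/api/traces/{quote(trace_id, safe='')}"


def tempo_search_url(base_url, traceql, limit=20):
    """URL that runs a TraceQL search; `limit` is an assumed result cap."""
    params = {"q": traceql, "limit": limit}
    return f"{base_url.rstrip('/')}/api/search?{urlencode(params)}"


# Hypothetical Tempo address; replace with the real one.
TEMPO = "http://tempo.example.internal:3200"

trace_url = tempo_trace_url(TEMPO, "abc123")
search_url = tempo_search_url(TEMPO, "{ status = error }")
```

Any of the TraceQL queries in this section can be passed to tempo_search_url unchanged, since urlencode takes care of the braces, quotes, and spaces.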

20. Find HIL traces (suspension + resume)

{ rootName = "workflow.execute" && span.node.type = "ApprovalNode" }