BizFirst Observe
Common Query Reference
Ready-to-use LogQL, PromQL, and TraceQL queries for the most common BizFirstGO observability scenarios. Copy and paste a query, then replace the placeholder values, which are shown in square brackets (for example, [TENANT_ID]).
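When these queries are run from a script rather than pasted by hand, the placeholder substitution can be automated. The following is a minimal sketch, not part of BizFirst Observe: the helper name fill_placeholders is hypothetical, and it assumes every placeholder is an upper-case name in square brackets that sits inside a double-quoted string, as in the queries on this page. Range selectors such as [5m] are left alone because they do not match that pattern.

```python
import re

# Hypothetical helper (not a BizFirst API): matches [UPPER_SNAKE_CASE] placeholders only,
# so range selectors such as [5m] are never touched.
_PLACEHOLDER = re.compile(r"\[([A-Z][A-Z0-9_]*)\]")


def fill_placeholders(template: str, values: dict[str, str]) -> str:
    """Replace each [NAME] in the template with values["NAME"]."""

    def substitute(match: re.Match) -> str:
        name = match.group(1)
        if name not in values:
            raise KeyError(f"no value supplied for placeholder [{name}]")
        # Escape backslashes and double quotes so the value cannot
        # terminate the surrounding double-quoted string early.
        return values[name].replace("\\", "\\\\").replace('"', '\\"')

    return _PLACEHOLDER.sub(substitute, template)


query = fill_placeholders(
    '{job="processengine", environment="production"} |= "[EXECUTION_ID]"',
    {"EXECUTION_ID": "exec-12345"},  # example value, made up for illustration
)
print(query)  # {job="processengine", environment="production"} |= "exec-12345"
```

Raising on a missing value is a deliberate choice here: a query that still contains a literal [TENANT_ID] would run without error and silently return nothing.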
LogQL — Loki Queries
1. Find all logs for a specific execution
{job="processengine", environment="production"} |= "[EXECUTION_ID]"
2. Find all error logs in the last 15 minutes
{job="processengine", environment="production", level="error"} | json
# Set the Grafana time range to "Last 15 minutes"; the query itself does not limit the time window
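Outside Grafana, the 15-minute window has to be supplied explicitly. As a sketch, the function below builds a request URL for Loki's query_range HTTP endpoint, which takes start and end as Unix timestamps in nanoseconds. The function name and the host loki.example.internal:3100 are assumptions for illustration; substitute the address of your own Loki instance.

```python
from urllib.parse import urlencode

NANOS_PER_SECOND = 1_000_000_000


def loki_query_range_url(
    base_url: str,
    logql: str,
    end_s: int,
    window_minutes: int = 15,
    limit: int = 100,
) -> str:
    """Build a /loki/api/v1/query_range URL covering the window that ends at end_s."""
    end_ns = end_s * NANOS_PER_SECOND
    start_ns = end_ns - window_minutes * 60 * NANOS_PER_SECOND
    params = urlencode(
        {"query": logql, "start": start_ns, "end": end_ns, "limit": limit}
    )
    return f"{base_url.rstrip('/')}/loki/api/v1/query_range?{params}"


# Hypothetical host and a fixed end time, so the example is reproducible.
url = loki_query_range_url(
    "http://loki.example.internal:3100",
    '{job="processengine", environment="production", level="error"} | json',
    end_s=1_700_000_000,
)
print(url)
```

In a real script, end_s would normally be int(time.time()) so that the window ends at the current moment.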
3. Find error logs for a specific tenant
{job="processengine", tenant_id="[TENANT_ID]"} | json | level = "error"
4. Count log lines per level (log volume by severity)
sum by (level) (
  count_over_time({job="processengine"} | json [5m])
)
5. Find HIL suspension events
{job="processengine"} | json | message =~ ".*HIL.*suspend.*"
  | line_format "taskId={{.hilTaskId}} tenant={{.tenantId}}"
6. Find OctopusNode LLM call logs
{job="octopus"} | json | nodeType = "OctopusNode"
  | line_format "model={{.llmModel}} tokens={{.tokenCount}} duration={{.durationMs}}ms"
7. Live stream logs during a deployment
{job="processengine", environment="production"}
# Click the "Live" button in Grafana Explore to stream logs in real time
PromQL — Prometheus Queries
8. Current workflow execution rate
sum(rate(bizfirst_workflow_executions_total[5m]))
9. Error rate percentage
sum(rate(bizfirst_workflow_executions_total{status="failed"}[5m]))
/
sum(rate(bizfirst_workflow_executions_total[5m]))
* 100
10. P99 workflow execution latency
histogram_quantile(0.99,
  sum(rate(bizfirst_workflow_executions_duration_seconds_bucket[5m])) by (le)
)
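The P99 value returned by histogram_quantile is an estimate: Prometheus finds the bucket that contains the requested rank and interpolates linearly inside it, so the precision depends on the bucket boundaries. The sketch below reimplements that interpolation in simplified form to make the behaviour concrete. It assumes classic histograms with positive bucket bounds and a quantile between 0 (exclusive) and 1; the bucket values in the example are made up and are not BizFirst's configured buckets.

```python
import math


def histogram_quantile(q: float, buckets: list[tuple[float, float]]) -> float:
    """Estimate a quantile from cumulative buckets.

    buckets holds (le, cumulative_count) pairs sorted by le,
    and the last pair must be the +Inf bucket.
    """
    if not 0 < q <= 1:
        raise ValueError("this sketch only handles 0 < q <= 1")
    if len(buckets) < 2 or not math.isinf(buckets[-1][0]):
        return math.nan  # a +Inf bucket and at least one finite bucket are required
    total = buckets[-1][1]
    if total == 0:
        return math.nan  # no observations in the window

    rank = q * total
    # Find the first bucket whose cumulative count reaches the rank.
    for i, (upper, cumulative) in enumerate(buckets):
        if cumulative >= rank:
            break

    if math.isinf(upper):
        # The quantile lies beyond the last finite bound; that bound is reported.
        return buckets[-2][0]

    # The lowest bucket is assumed to start at 0.
    lower, below = buckets[i - 1] if i > 0 else (0.0, 0.0)
    return lower + (upper - lower) * (rank - below) / (cumulative - below)


# Made-up per-bucket values: (le in seconds, cumulative count)
example = [(0.5, 80.0), (1.0, 95.0), (5.0, 100.0), (math.inf, 100.0)]
print(histogram_quantile(0.99, example))  # about 4.2 seconds
```

In this example the 99th observation falls in the 1 s to 5 s bucket, so the estimate lands four fifths of the way through it. A finer bucket layout around the expected P99 would narrow that uncertainty.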
11. P99 node execution latency by node type
histogram_quantile(0.99,
  sum(rate(bizfirst_node_execution_duration_seconds_bucket[5m])) by (node_type, le)
)
12. Current HIL backlog by tenant
sum by (tenant_id) (bizfirst_hil_pending_count) > 0
13. EdgeStream message throughput by topic
sum by (topic) (rate(bizfirst_edgestream_messages_total[5m]))
14. Octopus LLM token usage rate
sum by (model) (rate(bizfirst_octopus_tokens_total[5m]))
TraceQL — Tempo Queries
15. Find all error traces
{ status = error }
16. Find traces for a specific tenant
{ span.tenant.id = "[TENANT_ID]" }
17. Find slow workflow executions (> 10 seconds)
{ rootName = "workflow.execute" && duration > 10s }
18. Find traces with slow HttpRequestNode spans
{ span.node.type = "HttpRequestNode" && duration > 5s }
19. Find a specific trace by ID
# In Grafana Explore → Tempo → TraceId search tab
# Paste: [TRACE_ID]
20. Find HIL traces (suspension + resume)
{ rootName = "workflow.execute" && span.node.type = "ApprovalNode" }
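The TraceQL queries above can also be run without Grafana through Tempo's HTTP API, which exposes a search endpoint that takes the query in the q parameter and a lookup endpoint that takes a trace ID in the path. The following sketch only builds the request URLs. The function names and the host tempo.example.internal:3200 are assumptions for illustration, and the time bounds are Unix timestamps in seconds.

```python
from urllib.parse import quote, urlencode


def tempo_search_url(
    base_url: str, traceql: str, start_s: int, end_s: int, limit: int = 20
) -> str:
    """Build a Tempo /api/search URL for a TraceQL query over [start_s, end_s]."""
    params = urlencode(
        {"q": traceql, "start": start_s, "end": end_s, "limit": limit}
    )
    return f"{base_url.rstrip('/')}/api/search?{params}"


def tempo_trace_url(base_url: str, trace_id: str) -> str:
    """Build a Tempo /api/traces/<id> URL to fetch a single trace by ID."""
    return f"{base_url.rstrip('/')}/api/traces/{quote(trace_id, safe='')}"


# Hypothetical host and a fixed 15-minute window, for illustration only.
base = "http://tempo.example.internal:3200"
print(tempo_search_url(
    base,
    '{ rootName = "workflow.execute" && span.node.type = "ApprovalNode" }',
    start_s=1_700_000_000,
    end_s=1_700_000_900,
))
```

URL-encoding the query matters here because TraceQL uses braces, quotes, and ampersands, all of which would otherwise be misread as part of the URL.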