Label Strategy — Loki Log Aggregation

The Golden Rule: Low-Cardinality Labels Only

A label's cardinality is the number of unique values it can take. Loki creates a separate stream for every unique combination of label values, so labels should have bounded, low cardinality:

Good: environment — 3 values (production, staging, development)
Good: level — 6 values (trace, debug, info, warn, error, fatal)
Acceptable: tenant_id — hundreds of values (not thousands)
Bad: execution_id — millions of unique values
Bad: user_id — millions of unique values
Bad: request_id — unique per request
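Before promoting a field to a label, you can measure how many distinct values it actually takes in a sample of logs. The sketch below assumes logs are available as Python dicts. The function name and the 1,000-value cut-off are hypothetical, chosen to match the "hundreds, not thousands" guidance above. A short sample can underestimate cardinality, so treat the count as a lower bound.

```python
from collections import defaultdict

# Hypothetical cut-off: the list above treats "hundreds" as acceptable and
# "thousands or more" as bad, so 1,000 is an assumed threshold.
MAX_DISTINCT_VALUES = 1_000


def audit_label_cardinality(records, candidate_labels, limit=MAX_DISTINCT_VALUES):
    """Count distinct values per candidate label across sample log records.

    Returns a dict mapping label -> (distinct_count, ok_as_label).
    """
    seen = defaultdict(set)
    for record in records:
        for label in candidate_labels:
            if label in record:
                seen[label].add(record[label])
    return {
        label: (len(seen[label]), len(seen[label]) < limit)
        for label in candidate_labels
    }


if __name__ == "__main__":
    # Synthetic sample: 3 environments, 6 levels, 2,000 unique execution IDs.
    levels = ["trace", "debug", "info", "warn", "error", "fatal"]
    envs = ["production", "staging", "development"]
    sample = [
        {"environment": envs[i % 3], "level": levels[i % 6], "execution_id": f"exec-{i}"}
        for i in range(2_000)
    ]
    report = audit_label_cardinality(sample, ["environment", "level", "execution_id"])
    for label, (count, ok) in report.items():
        print(f"{label}: {count} distinct values -> {'label' if ok else 'log body'}")
```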

BizFirst Standard Label Set

Label	Source	Example Values	Cardinality	Purpose
`job`	OTel resource `service.name`	processengine, edgestream, octopus, api, worker	<10	Broad service category filtering
`service`	OTel resource `service.name`	flow-studio-api, processengine-worker	<50	Specific service name
`environment`	OTel resource `deployment.environment`	production, staging, development	<5	Environment separation
`level`	OTel LogRecord severity	trace, debug, info, warn, error, fatal	6	Log severity filtering
`tenant_id`	OTel log attribute `tenant.id`	tenant-abc, tenant-xyz	Hundreds	Multi-tenant isolation
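A minimal sketch of how the sources in the table could map onto labels, assuming the OTel resource and log attributes arrive as plain dicts. The `level` mapping follows the OpenTelemetry SeverityNumber ranges (1-4 TRACE, 5-8 DEBUG, 9-12 INFO, 13-16 WARN, 17-20 ERROR, 21-24 FATAL). The table does not say how `job` is derived from `service.name`, so `JOB_FOR_SERVICE` is a hypothetical lookup; defaulting unspecified severities to `info` is also an assumption. Attributes not in the table, such as an execution ID, are deliberately left out of the labels.

```python
# Hypothetical service -> job mapping: the table lists example values for both
# labels but not the exact rule that derives one from the other.
JOB_FOR_SERVICE = {
    "flow-studio-api": "api",
    "processengine-worker": "processengine",
}

LEVELS = ["trace", "debug", "info", "warn", "error", "fatal"]


def level_from_severity(severity_number: int) -> str:
    """Map an OTel SeverityNumber (1-24) to one of the six level values."""
    if not 1 <= severity_number <= 24:
        return "info"  # assumption: UNSPECIFIED (0) or out-of-range -> info
    return LEVELS[(severity_number - 1) // 4]


def loki_labels(resource: dict, log_attributes: dict, severity_number: int) -> dict:
    """Build the standard label set; every other attribute stays in the body."""
    service = resource["service.name"]
    labels = {
        "job": JOB_FOR_SERVICE.get(service, service),
        "service": service,
        "environment": resource["deployment.environment"],
        "level": level_from_severity(severity_number),
    }
    tenant = log_attributes.get("tenant.id")
    if tenant is not None:
        labels["tenant_id"] = tenant
    return labels


print(loki_labels(
    {"service.name": "processengine-worker", "deployment.environment": "production"},
    {"tenant.id": "tenant-abc", "execution.id": "exec-1"},  # execution.id is not copied
    17,
))
```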

Stream Count Estimation

The total number of streams is at most the product of the number of unique values of each label; only label combinations that actually occur create streams. For BizFirst:

# Stream count estimate (upper bound):
jobs:        5 (processengine, edgestream, octopus, api, worker)
environments: 3 (production, staging, development)
levels:      6 (trace, debug, info, warn, error, fatal)
tenants:     500 (example: 500-tenant deployment)

Maximum streams = 5 × 3 × 6 × 500 = 45,000 streams

# This is within Loki's comfortable range (<100,000 streams).
# Adding execution_id as a label (say, 1M executions/day) could create up to:
# 5 × 3 × 6 × 500 × 1,000,000 = 45 BILLION streams → catastrophic
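The arithmetic above can be checked in a few lines. `max_streams` computes the product-of-cardinalities upper bound, while `observed_streams` counts only the label combinations that actually appear in a set of records, which is what Loki really creates. Both function names are illustrative.

```python
from math import prod


def max_streams(cardinalities: dict) -> int:
    """Upper bound: product of the number of distinct values of each label."""
    return prod(cardinalities.values())


def observed_streams(records: list, label_names: list) -> int:
    """Actual streams: distinct label-value combinations present in records."""
    return len({tuple(r.get(name) for name in label_names) for r in records})


base = {"job": 5, "environment": 3, "level": 6, "tenant_id": 500}
print(f"{max_streams(base):,}")  # 45,000
with_execution_id = {**base, "execution_id": 1_000_000}
print(f"{max_streams(with_execution_id):,}")  # 45,000,000,000

# Three records, but only two distinct (job, level) combinations.
records = [
    {"job": "api", "level": "info"},
    {"job": "api", "level": "error"},
    {"job": "api", "level": "info"},
]
print(observed_streams(records, ["job", "level"]))  # 2
```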

What Goes in the Log Line (Not Labels)

High-cardinality context belongs in the structured log body. Loki can still filter on log-line content with line filters (|=, |~) and parsers such as | json, but more slowly than with labels, because only labels are indexed. This trade-off is acceptable because a query almost always starts with a label selector that narrows the result set before content filtering.

# Example JSON log body (the ← annotations are not part of the JSON; fields marked "label" are also sent as stream labels)
{
  "timestamp": "...",
  "level": "error",        ← label ✓
  "service": "processengine", ← label ✓
  "tenant_id": "t123",     ← label ✓
  "message": "Node failed",
  "executionId": "exec-abc123",  ← log body, not label ✓
  "nodeKey": "approval-01",      ← log body, not label ✓
  "traceId": "4bf92f...",        ← log body, not label ✓
  "workflowId": "wf-xyz"         ← log body, not label ✓
}
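A sketch of emitting a body like the one above from application code, assuming the log shipper accepts a label set and a log line separately. `make_log_entry` and `LABEL_KEYS` are hypothetical names. As in the example, every field stays in the JSON body, and only the low-cardinality fields are copied into the label set.

```python
import json
from datetime import datetime, timezone

# Assumed label keys, taken from the annotated example above.
LABEL_KEYS = ("level", "service", "tenant_id")


def make_log_entry(message: str, **fields):
    """Return (labels, line) for one log event.

    All fields go into the JSON body; only LABEL_KEYS are also returned as
    stream labels. High-cardinality context (executionId, traceId, ...)
    stays in the body only.
    """
    body = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "message": message,
        **fields,
    }
    labels = {key: str(body[key]) for key in LABEL_KEYS if key in body}
    return labels, json.dumps(body)


labels, line = make_log_entry(
    "Node failed",
    level="error",
    service="processengine",
    tenant_id="t123",
    executionId="exec-abc123",
    nodeKey="approval-01",
)
print(labels)  # {'level': 'error', 'service': 'processengine', 'tenant_id': 't123'}
print(line)
```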

Pattern for Finding Specific Executions

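Start with a label selector (fast, indexed), then filter on the log body (slower, but applied only to the streams the selector matched). Because the body is JSON, match the ID with a substring filter first, then parse the line with | json and compare the field exactly:

{job="processengine", tenant_id="t123", level="error"} |= "exec-abc123" | json | executionId="exec-abc123"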
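A hypothetical helper that builds this kind of query for a given execution. It applies a cheap substring line filter on the ID, then parses the JSON body with | json and matches the `executionId` field exactly, so the result does not depend on how the JSON is spaced. The escaping handles backslashes and double quotes only, which assumes IDs contain no control characters.

```python
def escape_logql(value: str) -> str:
    """Escape a value for a double-quoted LogQL string (backslash and quote only)."""
    return value.replace("\\", "\\\\").replace('"', '\\"')


def execution_query(job: str, tenant_id: str, execution_id: str, level: str = "error") -> str:
    """Build a LogQL query: label selector first, then log-body filters."""
    selector = (
        f'{{job="{escape_logql(job)}", '
        f'tenant_id="{escape_logql(tenant_id)}", '
        f'level="{escape_logql(level)}"}}'
    )
    eid = escape_logql(execution_id)
    # |= is a cheap substring filter; | json plus the field match makes it exact.
    return f'{selector} |= "{eid}" | json | executionId="{eid}"'


print(execution_query("processengine", "t123", "exec-abc123"))
```

Dropping `level` from the selector widens the search to all severities at the cost of scanning more streams.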

Label Naming Conventions

Use snake_case for label names (e.g., tenant_id, not tenantId or TenantId)
Never use dots in label names: Loki follows the Prometheus label-naming rules, which do not allow them, so map OTel attribute names such as `tenant.id` to `tenant_id`
Keep label names short and meaningful — they appear in every LogQL query
Be consistent across all BizFirst services — the same label name for the same concept
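A small checker for these conventions, as a sketch. `VALID_LABEL_NAME` uses the Prometheus-style label-name pattern that Loki follows, and `SNAKE_CASE` is a stricter house rule derived from the list above. `to_label_name` and `check_label_name` are hypothetical helpers, for example for converting OTel attribute names such as `tenant.id` or `deployment.environment` into label names.

```python
import re

# Prometheus-style label-name pattern (colons allowed but reserved for rules).
VALID_LABEL_NAME = re.compile(r"^[a-zA-Z_:][a-zA-Z0-9_:]*$")
# Stricter house rule from the conventions above: lower snake_case.
SNAKE_CASE = re.compile(r"^[a-z][a-z0-9]*(_[a-z0-9]+)*$")


def to_label_name(attribute: str) -> str:
    """Convert an attribute name (tenant.id, tenantId, TenantId) to snake_case."""
    s = re.sub(r"([a-z0-9])([A-Z])", r"\1_\2", attribute)  # split camelCase words
    s = re.sub(r"[^a-zA-Z0-9_]", "_", s)  # dots, dashes, etc. -> underscores
    return s.lower()


def check_label_name(name: str) -> list:
    """Return a list of convention problems; an empty list means the name is fine."""
    problems = []
    if not VALID_LABEL_NAME.match(name):
        problems.append("invalid for Loki")
    if not SNAKE_CASE.match(name):
        problems.append("not snake_case")
    return problems


for attr in ["tenant.id", "deployment.environment", "tenantId"]:
    print(attr, "->", to_label_name(attr))
```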
