Loki Overview — Loki Log Aggregation

Loki's Core Design Philosophy

Loki was designed around a key observation: for modern structured logs, you almost always know which service, environment, and tenant you're looking for before you start searching. Indexing that metadata (labels) is sufficient for fast filtering. The log content itself can remain unindexed and compressed — dramatically reducing storage cost compared to Elasticsearch.

How Loki Differs from Elasticsearch

Feature	Loki	Elasticsearch
Index log content	No — grep-style filtering after label match	Yes — inverted index on all fields
Storage cost	Low — compression only, no content index	High — index is 1–3x raw data
Ingest overhead	Very low — write log lines to stream	High — tokenize and index every word
Query for "all errors in service X"	Fast — stream selector + level label	Fast — index lookup
Query for "find all logs containing IP 1.2.3.4"	Slower — full scan of matched streams	Fast — inverted index
Best for	Structured logs with known label dimensions	Arbitrary unstructured text search

Loki's Data Model

Loki organizes logs hierarchically:

Labels — key-value pairs that identify a log stream (e.g., job="processengine", tenant_id="t123")
Stream — all log lines sharing the same label set; identified by the label fingerprint
Chunk — a compressed block of log lines within a stream, covering a time window
Log line — a timestamp + raw text/JSON; not indexed, but filterable after stream selection
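
The hierarchy above can be pictured with a toy in-memory model. This is only a sketch of the idea, not how Loki is implemented: `TinyLogStore` and `stream_key` are hypothetical names, chunks and compression are left out, and a sorted tuple of labels stands in for the label fingerprint.

```python
from collections import defaultdict


def stream_key(labels):
    """Order-independent identity for a label set (stands in for a fingerprint)."""
    return tuple(sorted(labels.items()))


class TinyLogStore:
    """Toy model: only labels identify a stream; log lines are stored unindexed."""

    def __init__(self):
        # stream key -> list of (timestamp_ns, line)
        self.streams = defaultdict(list)

    def push(self, labels, timestamp_ns, line):
        self.streams[stream_key(labels)].append((timestamp_ns, line))

    def query(self, selector, contains=""):
        """Select streams by label first, then scan their lines for a substring."""
        wanted = set(selector.items())
        hits = []
        for key, entries in self.streams.items():
            if wanted <= set(key):
                hits.extend(line for _, line in entries if contains in line)
        return hits


store = TinyLogStore()
store.push({"job": "processengine", "level": "error"}, 1, "execution_id=exec-abc failed")
store.push({"level": "error", "job": "processengine"}, 2, "execution_id=exec-def failed")
store.push({"job": "processengine", "level": "info"}, 3, "execution_id=exec-abc started")

print(len(store.streams))  # 2: the first two pushes share one label set
print(store.query({"job": "processengine"}, contains="exec-abc"))
```

The second push lists its labels in a different order but lands in the same stream, because stream identity depends only on the label set. The `contains` argument plays the role of a line filter applied after stream selection.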

BizFirst Stream Labels

BizFirst services use the following standard labels when pushing logs to Loki:

# Example label set for production error logs from ProcessEngine
{
  "job": "processengine",
  "service": "flow-studio-api",
  "tenant_id": "tenant-abc-123",
  "environment": "production",
  "level": "error"
}

# All log lines with this exact label set form one stream.
# A LogQL stream selector may name any subset of the labels; this one
# matches every stream that carries these three label values:
{job="processengine", environment="production", level="error"}
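
To illustrate how a label set travels with log lines, here is a minimal sketch that builds a request body in the shape accepted by Loki's HTTP push endpoint (`/loki/api/v1/push`). The helper name `build_push_payload` and the sample log line are hypothetical, and BizFirst services may well ship logs through an agent or client library instead. Nothing is sent over the network here.

```python
import json
import time


def build_push_payload(labels, lines, now_ns=None):
    """Build a JSON-serializable body for Loki's push endpoint."""
    if now_ns is None:
        now_ns = time.time_ns()
    # Each entry is [timestamp in nanoseconds as a string, log line].
    values = [[str(now_ns + i), line] for i, line in enumerate(lines)]
    return {"streams": [{"stream": dict(labels), "values": values}]}


labels = {
    "job": "processengine",
    "service": "flow-studio-api",
    "tenant_id": "tenant-abc-123",
    "environment": "production",
    "level": "error",
}
payload = build_push_payload(
    labels,
    ["node approval-01 timed out"],  # hypothetical log line
    now_ns=1_700_000_000_000_000_000,
)
print(json.dumps(payload, indent=2))
```

All lines pushed under the same `stream` object share one label set, so they end up in one stream.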

High-Cardinality Values — Keep Them Out of Labels

The following values are not used as labels — they go into the log line body:

Value	Why Not a Label	How to Filter
`execution_id`	Unique per execution — millions of unique values	`|= "execution_id=exec-abc"`
`node_key`	Varies per workflow design	`|= "node_key=approval-01"`
`trace_id`	Unique per request — very high cardinality	`|= "traceId=4bf92f..."`
`user_id`	Millions of unique user IDs possible	`|= "userId=user-xyz"` (avoid for privacy)
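
One way to enforce this split in application code is an allow-list of label keys, with everything else serialized into the log line. The sketch below assumes the allow-list mirrors the standard labels listed earlier. `LABEL_KEYS`, `split_record`, and the sample record are hypothetical.

```python
import json

# Assumption: only these low-cardinality keys may become labels.
LABEL_KEYS = {"job", "service", "tenant_id", "environment", "level"}


def split_record(record):
    """Return (labels, line): allow-listed keys become labels, the rest go in the body."""
    labels = {k: v for k, v in record.items() if k in LABEL_KEYS}
    body = {k: v for k, v in record.items() if k not in LABEL_KEYS}
    return labels, json.dumps(body, sort_keys=True)


labels, line = split_record({
    "job": "processengine",
    "level": "error",
    "execution_id": "exec-abc",
    "node_key": "approval-01",
    "message": "approval step failed",
})
print(labels)  # {'job': 'processengine', 'level': 'error'}
print(line)
```

With a JSON body like this, the high-cardinality fields stay searchable through a line filter or through the `json` parser followed by a label filter.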

High-Cardinality Labels Cause Serious Performance Issues

Each unique label combination creates a new stream. If you use execution_id as a label, Loki creates a new stream for every workflow execution — potentially millions of streams. This overwhelms Loki's index and causes memory exhaustion and slow queries. Keep label count low and label cardinality even lower.
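
The arithmetic behind this warning can be sketched as a product of per-label cardinalities. The numbers below are invented for illustration only, and the product is an upper bound: in practice not every combination occurs, but a per-execution label still guarantees at least one stream per execution.

```python
from math import prod


def max_streams(cardinalities):
    """Upper bound on stream count: product of distinct values per label."""
    return prod(cardinalities.values())


# Hypothetical cardinalities, chosen only to show the effect.
base = {"job": 5, "service": 10, "tenant_id": 200, "environment": 3, "level": 4}

print(max_streams(base))                                 # 120000
print(max_streams({**base, "execution_id": 1_000_000}))  # 120000000000
```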

LogQL in 60 Seconds

LogQL has two parts: the stream selector (mandatory, uses {}) and filter/parse expressions (optional pipeline):

# Minimum valid LogQL — all logs from processengine in production
{job="processengine", environment="production"}

# Add a filter — find lines containing "error"
{job="processengine"} |= "error"

# Parse JSON and filter a field
{job="processengine"} | json | level="error"

# Per-second rate of error lines, computed over a 1-minute window (metric query, used in alerts + panels)
rate({job="processengine"} | json | level="error" [1m])
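
Outside Grafana, the same queries can be sent to Loki's HTTP API. The sketch below only builds a URL for the `query_range` endpoint and does not perform a request. The base URL `http://localhost:3100` and the helper name `query_range_url` are assumptions for illustration.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def query_range_url(base_url, logql, start_ns, end_ns, limit=100):
    """Build a query_range URL; start and end are Unix timestamps in nanoseconds."""
    params = {"query": logql, "start": start_ns, "end": end_ns, "limit": limit}
    return f"{base_url}/loki/api/v1/query_range?{urlencode(params)}"


url = query_range_url(
    "http://localhost:3100",  # assumed local Loki address
    '{job="processengine"} |= "error"',
    1_700_000_000_000_000_000,
    1_700_000_060_000_000_000,
)
print(url)

# Round-trip check: the LogQL expression survives URL encoding.
print(parse_qs(urlsplit(url).query)["query"][0])
```

URL-encoding the expression matters because LogQL uses braces, quotes, and pipes, none of which are safe to place in a query string verbatim.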

Grafana Integration

In Grafana, Loki logs appear in:

Explore — ad-hoc log queries with time range and live streaming
Logs panels — on dashboards; show log lines inline with metric charts
Derived Fields — automatically renders traceId in log lines as clickable links to Tempo
Alert rules — Loki-based alert rules trigger on log patterns (via Loki ruler)
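
A derived field is driven by a regular expression whose capture group supplies the value for the link. The Python sketch below mimics that extraction step only. The pattern assumes log lines carry the trace ID as `traceId=<value>`, as in the filter table earlier; the actual pattern lives in the Grafana data source configuration, and the sample line is hypothetical.

```python
import re

# Assumption: trace IDs appear in the log line as "traceId=<word characters>".
TRACE_ID = re.compile(r"traceId=(\w+)")


def extract_trace_id(line):
    """Return the captured trace ID, or None when the line has none."""
    match = TRACE_ID.search(line)
    return match.group(1) if match else None


print(extract_trace_id('level=error traceId=abc123def456 msg="timeout"'))  # abc123def456
print(extract_trace_id("level=info msg=started"))                          # None
```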
