Loki Overview
Grafana Loki is the log aggregation engine for BizFirst Observe. It collects structured JSON logs from all BizFirstGO services, indexes them by label set for fast filtering, and makes them queryable via LogQL. Its design prioritizes low storage cost over full-text indexing.
Loki's Core Design Philosophy
Loki was designed around a key observation: for modern structured logs, you almost always know which service, environment, and tenant you're looking for before you start searching. Indexing that metadata (labels) is sufficient for fast filtering. The log content itself can remain unindexed and compressed — dramatically reducing storage cost compared to Elasticsearch.
How Loki Differs from Elasticsearch
| Feature | Loki | Elasticsearch |
|---|---|---|
| Index log content | No — grep-style filtering after label match | Yes — inverted index on all fields |
| Storage cost | Low — compression only, no content index | High — index is 1–3x raw data |
| Ingest overhead | Very low — write log lines to stream | High — tokenize and index every word |
| Query for "all errors in service X" | Fast — stream selector + level label | Fast — index lookup |
| Query for "find all logs containing IP 1.2.3.4" | Slower — full scan of matched streams | Fast — inverted index |
| Best for | Structured logs with known label dimensions | Arbitrary unstructured text search |
Loki's Data Model
Loki organizes logs hierarchically:
- Labels: key-value pairs that identify a log stream (e.g., job="processengine",tenant_id="t123")
- Stream: all log lines sharing the same label set; identified by the label fingerprint
- Chunk: a compressed block of log lines within a stream, covering a time window
- Log line: a timestamp + raw text/JSON; not indexed, but filterable after stream selection
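The hierarchy above can be illustrated with a toy in-memory model. This is only a sketch of the idea (index the label set, scan the content), not how Loki is implemented: the `ToyLoki` class and its methods are hypothetical, chunks and compression are left out, and a `frozenset` of label pairs stands in for the label fingerprint.

```python
from collections import defaultdict


class ToyLoki:
    """Toy model: only label sets are indexed, never the line content."""

    def __init__(self):
        # stream key (frozen label set) -> list of (timestamp_ns, line)
        self.streams = defaultdict(list)

    def push(self, labels, timestamp_ns, line):
        # Label order does not matter: the same label set is the same stream.
        self.streams[frozenset(labels.items())].append((timestamp_ns, line))

    def query(self, matchers, contains=None):
        results = []
        for key, entries in self.streams.items():
            # Step 1: label match (the indexed part).
            if not set(matchers.items()) <= key:
                continue
            # Step 2: grep-style scan, limited to the matched streams.
            for ts, line in entries:
                if contains is None or contains in line:
                    results.append((ts, line))
        return sorted(results)


loki = ToyLoki()
loki.push({"job": "processengine", "tenant_id": "t123"}, 1, '{"msg": "started"}')
loki.push({"tenant_id": "t123", "job": "processengine"}, 2, '{"msg": "client 1.2.3.4 connected"}')
loki.push({"job": "processengine", "tenant_id": "t456"}, 3, '{"msg": "started"}')

print(len(loki.streams))  # 2: the first two pushes share one label set
print(loki.query({"job": "processengine"}, contains="1.2.3.4"))
```

The content filter only runs over streams that already passed the label match, which is why a narrow stream selector matters more than a clever text filter.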
BizFirstGO Stream Labels
BizFirstGO services use the following standard labels when pushing logs to Loki:
# Example label set for production error logs from ProcessEngine
{
  "job": "processengine",
  "service": "flow-studio-api",
  "tenant_id": "tenant-abc-123",
  "environment": "production",
  "level": "error"
}
# All log lines with this exact label set form one stream.
# LogQL stream selector syntax. This selector omits service and tenant_id,
# so it matches every stream that carries these three labels:
{job="processengine", environment="production", level="error"}
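As a rough illustration of how a service might attach these labels, the sketch below builds a request body in the shape accepted by Loki's HTTP push endpoint (`/loki/api/v1/push`): a list of streams, each with a `stream` label object and `values` pairs of nanosecond timestamp string and log line. The helper name `build_push_payload` and the fields inside the log line are assumptions made for the example; the source does not say how BizFirstGO services actually ship logs.

```python
import json
import time


def build_push_payload(labels, lines, now_ns=None):
    """Build a JSON-serializable body for Loki's push endpoint (hypothetical helper)."""
    now_ns = time.time_ns() if now_ns is None else now_ns
    return {
        "streams": [
            {
                # The label set that identifies the stream.
                "stream": labels,
                # Each value is [timestamp in ns as a string, raw log line].
                "values": [[str(now_ns + i), line] for i, line in enumerate(lines)],
            }
        ]
    }


labels = {
    "job": "processengine",
    "service": "flow-studio-api",
    "tenant_id": "tenant-abc-123",
    "environment": "production",
    "level": "error",
}
# High-cardinality values stay in the line body (field names are illustrative).
line = json.dumps({"message": "node failed", "execution_id": "exec-abc"})

payload = build_push_payload(labels, [line], now_ns=1_700_000_000_000_000_000)
print(json.dumps(payload, indent=2))
```

Nothing is sent over the network here; posting the payload would need an HTTP client and the address of a Loki instance.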
High-Cardinality Values — Keep Them Out of Labels
The following values are not used as labels — they go into the log line body:
| Value | Why Not a Label | How to Filter |
|---|---|---|
| execution_id | Unique per execution; millions of unique values | `\|= "execution_id=exec-abc"` |
| node_key | Varies per workflow design | `\|= "node_key=approval-01"` |
| trace_id | Unique per request; very high cardinality | `\|= "traceId=4bf92f..."` |
| user_id | Millions of unique user IDs possible | `\|= "userId=user-xyz"` (avoid for privacy) |
Each unique label combination creates a new stream. If you use execution_id as a label, Loki creates a new stream for every workflow execution — potentially millions of streams. This overwhelms Loki's index and causes memory exhaustion and slow queries. Keep label count low and label cardinality even lower.
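The arithmetic behind this warning is easy to sketch: the worst-case number of streams is the product of the distinct values each label can take. All counts below are made-up illustrative figures, not measurements from BizFirstGO, and `max_streams` is a hypothetical helper.

```python
from math import prod


def max_streams(label_cardinalities):
    """Upper bound on stream count: product of distinct values per label."""
    return prod(label_cardinalities.values())


# Hypothetical cardinalities for the standard labels.
bounded = {"job": 5, "service": 10, "tenant_id": 50, "environment": 3, "level": 5}
print(max_streams(bounded))  # 37500

# Adding one unbounded label multiplies the bound by its cardinality.
with_execution_id = {**bounded, "execution_id": 1_000_000}
print(max_streams(with_execution_id))  # 37500000000
```

Real deployments sit below the bound because not every combination occurs, but the multiplication shows why a single per-execution label dominates everything else.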
LogQL in 60 Seconds
LogQL has two parts: the stream selector (mandatory, uses {}) and filter/parse expressions (optional pipeline):
# Minimum valid LogQL — all logs from processengine in production
{job="processengine", environment="production"}
# Add a filter — find lines containing "error"
{job="processengine"} |= "error"
# Parse JSON and filter a field
{job="processengine"} | json | level="error"
# Per-second rate of error lines, averaged over a 1-minute window (metric query, used in alerts + panels)
rate({job="processengine"} | json | level="error" [1m])
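Outside Grafana, the same queries can be sent to Loki's HTTP API. The sketch below only assembles a LogQL string and a URL for the `/loki/api/v1/query_range` endpoint (parameters `query`, `start`, `end`, `limit`). The helper names and the base URL `http://loki.example.internal:3100` are placeholders invented for the example, and no request is made.

```python
from urllib.parse import urlencode


def selector(labels):
    """Render a dict of labels as a LogQL stream selector with equality matchers."""
    parts = []
    for key, value in sorted(labels.items()):
        # Escape backslashes and double quotes inside the label value.
        escaped = value.replace("\\", "\\\\").replace('"', '\\"')
        parts.append(f'{key}="{escaped}"')
    return "{" + ", ".join(parts) + "}"


def query_range_url(base_url, logql, start_ns, end_ns, limit=100):
    """Build a query_range URL; start and end are Unix timestamps in nanoseconds."""
    params = urlencode({"query": logql, "start": start_ns, "end": end_ns, "limit": limit})
    return f"{base_url}/loki/api/v1/query_range?{params}"


logql = selector({"job": "processengine", "environment": "production"}) + ' | json | level="error"'
print(logql)

url = query_range_url(
    "http://loki.example.internal:3100",  # placeholder address
    logql,
    start_ns=1_700_000_000_000_000_000,
    end_ns=1_700_000_060_000_000_000,
)
print(url)
```

Building the selector from a dict keeps the mandatory part of the query explicit and avoids hand-escaping label values.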
Grafana Integration
In Grafana, Loki logs appear in:
- Explore: ad-hoc log queries with time range and live streaming
- Logs panels: on dashboards; show log lines inline with metric charts
- Derived Fields: automatically renders traceId in log lines as clickable links to Tempo
- Alert rules: Loki-based alert rules trigger on log patterns (via Loki ruler)
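Derived fields work by running a regular expression with a capture group over each log line and turning the captured value into a link. The sketch below shows only that matching step. The pattern, the function name `extract_trace_id`, and the sample line are assumptions: they presume the trace ID appears as a JSON field named `traceId`, and the actual regex configured for BizFirst Observe is not given in the source.

```python
import re

# Hypothetical matcher: captures the value of a "traceId" JSON field.
TRACE_ID_REGEX = re.compile(r'"traceId"\s*:\s*"(\w+)"')


def extract_trace_id(line):
    """Return the captured trace ID, or None when the line has no traceId field."""
    match = TRACE_ID_REGEX.search(line)
    return match.group(1) if match else None


sample = '{"level":"error","traceId":"abc123def456","message":"node failed"}'
print(extract_trace_id(sample))              # abc123def456
print(extract_trace_id("plain text line"))   # None
```

Testing a candidate pattern against a few real log lines like this can catch mismatches before the regex goes into the data source configuration.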