In-TEE Observability
Some observability signals are too sensitive to leave the TEE at all — full distributed trace spans, detailed exception messages, internal node computation logs. These can be stored inside the TEE itself using in-TEE instances of Loki and Tempo. Access is restricted to attested parties, but the signals are available for post-incident analysis within the security boundary.
In-TEE vs. External Observability Architecture
| Signal | External (crosses TEE boundary) | In-TEE (stays inside TEE) |
|---|---|---|
| Metrics | Aggregated counts, durations, error rates — label-safe only | Full per-execution metrics with computation context (no boundary crossed) |
| Logs | Sanitized operational logs (execution IDs, timing, error codes) | Full structured logs including error details, node state transitions |
| Traces | TraceId + SpanId only (for correlation) | Full span content: attributes, events, stack traces |
| Grafana access | Standard Grafana outside TEE: sanitized signals only | In-TEE Grafana (attested access only): full signals available |
In-TEE Loki Configuration
# Loki running inside the TEE: receives full logs including sensitive context
# No sanitizing OTel Collector sits in front of this Loki,
# because the data never leaves the TEE
# loki-in-tee-config.yaml
auth_enabled: false                        # Single-tenant inside TEE; TEE boundary is the auth
server:
  http_listen_port: 3101                   # Different port from external Loki
  grpc_listen_port: 9096
common:
  storage:
    filesystem:
      chunks_directory: /tee/loki/chunks   # TEE-encrypted filesystem
      rules_directory: /tee/loki/rules
  replication_factor: 1                    # Single in-TEE instance; there is no peer to replicate to
limits_config:
  retention_period: 7d                     # Short retention: in-TEE storage is expensive (enforced only when compactor retention is enabled)
  ingestion_rate_mb: 20                    # Cap ingestion; TEE memory is constrained
# OTel Collector inside TEE sends full (unsanitized) logs here
# The external OTel Collector (outside TEE) receives only sanitized logs
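Loki applies `retention_period` only when its compactor runs with retention enabled, so the 7-day limit above needs a compactor block to take effect. A minimal sketch, assuming Loki 3.x option names and a hypothetical working directory on the same TEE-encrypted mount (verify both against the Loki version you deploy):

```yaml
# Sketch of an addition to loki-in-tee-config.yaml; not taken from the original deployment
compactor:
  working_directory: /tee/loki/compactor   # hypothetical path on the TEE-encrypted filesystem
  retention_enabled: true                  # without this, retention_period is not enforced
  delete_request_store: filesystem         # expected by Loki 3.x when retention is enabled
```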
In-TEE Tempo Configuration
# Tempo running inside the TEE: receives full trace spans
# Span attributes may contain computation details (allowed inside TEE)
# tempo-in-tee-config.yaml
server:
  http_listen_port: 3201
  grpc_listen_port: 9097
distributor:
  receivers:
    otlp:
      protocols:
        grpc:
          endpoint: 0.0.0.0:4318      # OTLP gRPC on a different port from external Tempo
ingester:
  max_block_duration: 1h
compactor:
  compaction:
    block_retention: 72h              # 3-day retention; TEE storage constrained
storage:
  trace:
    backend: local
    local:
      path: /tee/tempo/blocks         # TEE-encrypted filesystem
    wal:
      path: /tee/tempo/wal
# In-TEE Grafana (attested access) connects to this Tempo
# Full span attributes available for debugging inside the TEE security boundary
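For the in-TEE Grafana to reach these stores, it needs data sources pointing at the in-TEE Loki and Tempo HTTP ports. A minimal provisioning sketch, assuming the `loki-internal` and `tempo-internal` hostnames used by the collector config resolve inside the TEE; the file name and data source names are illustrative:

```yaml
# grafana-in-tee-datasources.yaml (hypothetical file name)
apiVersion: 1
datasources:
  - name: Loki (in-TEE)              # illustrative display name
    type: loki
    access: proxy
    url: http://loki-internal:3101   # http_listen_port of the in-TEE Loki
  - name: Tempo (in-TEE)             # illustrative display name
    type: tempo
    access: proxy
    url: http://tempo-internal:3201  # http_listen_port of the in-TEE Tempo
```

Both data sources use the HTTP query ports (3101, 3201), not the OTLP ingest port that the collector writes to.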
Dual OTel Collector Pattern
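# Two OTel Collectors: one inside the TEE, one outside
# ── INSIDE TEE: otel-collector-internal-config.yaml ──────────────────────
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317   # BizFirstGO sends to this collector first
processors:
  batch:
    send_batch_size: 500
    timeout: 5s
  # Boundary sanitization runs here, inside the TEE, so unsanitized data never
  # reaches the vsock proxy. Patterns mirror the external collector's processors.
  redaction:
    allow_all_keys: true         # Keep other attributes; values of blocked keys are masked
    blocked_key_patterns:        # Assumes a collector-contrib release that supports this field
      - ".*income.*"
      - ".*applicant.*"
      - ".*result.*"
  transform/sanitize-spans:
    trace_statements:
      - context: span
        statements:
          - delete_matching_keys(attributes, "^form[.]field[.]")
          - delete_matching_keys(attributes, "^calculation[.]")
exporters:
  # 1. Full (unsanitized) signals to in-TEE stores
  loki/internal:
    endpoint: http://loki-internal:3101/loki/api/v1/push
  otlp/tempo-internal:
    endpoint: http://tempo-internal:4318
    tls:
      insecure: true             # Plaintext gRPC between in-TEE services
  # 2. Sanitized signals to external OTel Collector (crosses TEE boundary)
  otlp/external-boundary:
    endpoint: http://vsock-proxy:4319   # TEE vsock proxy to external collector
    tls:
      insecure: true             # Plaintext gRPC to the local proxy
service:
  pipelines:
    logs/internal:
      receivers: [otlp]
      processors: [batch]
      exporters: [loki/internal]
    logs/boundary:
      receivers: [otlp]
      processors: [redaction, batch]
      exporters: [otlp/external-boundary]
    traces/internal:
      receivers: [otlp]
      processors: [batch]
      exporters: [otlp/tempo-internal]
    traces/boundary:
      receivers: [otlp]
      processors: [redaction, transform/sanitize-spans, batch]
      exporters: [otlp/external-boundary]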
# ── OUTSIDE TEE: otel-collector-external-config.yaml ─────────────────────
# Receives from TEE boundary, applies sanitization, forwards to external Loki/Tempo
processors:
  redaction:
    # Mask the values of any attributes that should not cross the boundary
    allow_all_keys: true         # Keep other attributes; only blocked keys are masked
    blocked_key_patterns:        # Assumes a collector-contrib release that supports this field
      - ".*income.*"
      - ".*applicant.*"
      - ".*result.*"
  transform/sanitize-spans:
    trace_statements:
      - context: span
        statements:
          # Remove span attributes; keep only trace/span IDs in correlation logs
          - delete_matching_keys(attributes, "^form[.]field[.]")
          - delete_matching_keys(attributes, "^calculation[.]")
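The external fragment above lists only processors. One way the rest of that collector could be wired is sketched below, assuming it listens for OTLP gRPC on port 4319 behind the vsock proxy and that the external stores are reachable as `loki-external` and `tempo-external`. Both hostnames and the listener port are hypothetical, and using `redaction` in a logs pipeline assumes a collector-contrib release that supports logs for that processor.

```yaml
# Sketch: remaining sections of otel-collector-external-config.yaml (names are assumptions)
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4319           # assumed port the vsock proxy forwards to
exporters:
  loki/external:
    endpoint: http://loki-external:3100/loki/api/v1/push   # hypothetical host
  otlp/tempo-external:
    endpoint: tempo-external:4317        # hypothetical host
    tls:
      insecure: true                     # plaintext gRPC; replace with real TLS settings as needed
service:
  pipelines:
    logs:
      receivers: [otlp]
      processors: [redaction]
      exporters: [loki/external]
    traces:
      receivers: [otlp]
      processors: [redaction, transform/sanitize-spans]
      exporters: [otlp/tempo-external]
```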
In-TEE Observability Limitations
| Limitation | Why | Workaround |
|---|---|---|
| No live Grafana access from outside TEE | In-TEE Grafana is only accessible via attested connection; standard browser cannot connect | Use attested client (custom CLI tool with TEE attestation verification) for in-TEE Grafana access |
| Constrained memory for observability | TEE memory is encrypted and limited (typically 512MB–4GB depending on platform) | Short retention (7d for logs, 72h for traces), aggressive sampling for in-TEE signals |
| No persistent storage across TEE restarts | TEE memory is cleared on restart; encrypted filesystem may not persist depending on config | Flush critical audit logs to signed external batch before planned restarts |
| No ad-hoc log queries from outside | In-TEE Loki is not accessible from outside the TEE boundary | Define canned queries as part of TEE deployment; run via attested query tool |
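The "aggressive sampling" workaround in the table can be sketched with the collector's `probabilistic_sampler` processor on whichever traces pipeline feeds `otlp/tempo-internal`. The 10% rate is an assumed value, not a recommendation from the original deployment, and the pipeline name is illustrative.

```yaml
# Sketch: sampling in-TEE traces inside the internal collector
processors:
  probabilistic_sampler:
    sampling_percentage: 10    # assumed rate; tune against available TEE storage
  batch:
    send_batch_size: 500
    timeout: 5s
service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [probabilistic_sampler, batch]
      exporters: [otlp/tempo-internal]
```

Sampling trades storage for completeness: a trace ID surfaced by an external alert may have been dropped from in-TEE Tempo, so a higher rate (or no sampling) may suit error paths.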
Use external (sanitized) observability for real-time dashboards and alerting — it crosses the boundary and reaches Grafana without delay. Use in-TEE observability for post-incident analysis when you need the full signal detail that cannot leave the TEE. Design the workflow: alert fires on external signals → team accesses in-TEE observability via attested channel → root cause identified inside the TEE boundary.