
In-TEE vs. External Observability Architecture

| Signal | External (crosses TEE boundary) | In-TEE (stays inside TEE) |
|---|---|---|
| Metrics | Aggregated counts, durations, error rates; label-safe only | Full per-execution metrics with computation context (no boundary crossed) |
| Logs | Sanitized operational logs (execution IDs, timing, error codes) | Full structured logs including error details, node state transitions |
| Traces | TraceId + SpanId only (for correlation) | Full span content: attributes, events, stack traces |
| Grafana access | Standard Grafana outside TEE: sanitized signals only | In-TEE Grafana (attested access only): full signals available |

In-TEE Loki Configuration

# Loki running inside the TEE — receives full logs including sensitive context
# This Loki does NOT have an OTel Collector in front of it for sanitization
# because the data never leaves the TEE

# loki-in-tee-config.yaml
auth_enabled: false   # Single-tenant inside TEE; TEE boundary is the auth

server:
  http_listen_port: 3101    # Different port from external Loki
  grpc_listen_port: 9096

common:
  storage:
    filesystem:
      chunks_directory: /tee/loki/chunks    # TEE-encrypted filesystem
      rules_directory: /tee/loki/rules
  replication_factor: 1   # Single instance; durability depends on the TEE-encrypted filesystem persisting

limits_config:
  retention_period: 7d    # Short retention; in-TEE storage is expensive (applied only when compactor retention is enabled)
  ingestion_rate_mb: 20   # Cap ingestion; TEE memory is constrained

# OTel Collector inside TEE sends full (unsanitized) logs here
# The external OTel Collector (outside TEE) receives only sanitized logs
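
The retention_period above takes effect only when Loki's compactor runs with retention enabled. The fragment below is a minimal sketch of that addition, not part of the original config: the working_directory path is an assumption chosen to sit next to the other /tee/loki paths, and whether delete_request_store is required depends on the Loki release, so check it against the version you deploy.

```yaml
# Hypothetical addition to loki-in-tee-config.yaml
compactor:
  working_directory: /tee/loki/compactor   # assumed path on the TEE-encrypted filesystem
  retention_enabled: true                  # without this, limits_config.retention_period is not applied
  delete_request_store: filesystem         # expected by recent Loki releases when retention is enabled
```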

In-TEE Tempo Configuration

# Tempo running inside the TEE — receives full trace spans
# Span attributes may contain computation details (allowed inside TEE)

# tempo-in-tee-config.yaml
server:
  http_listen_port: 3201
  grpc_listen_port: 9097

distributor:
  receivers:
    otlp:
      protocols:
        grpc:
          endpoint: 0.0.0.0:4318   # gRPC on a non-default port (4318 is conventionally OTLP/HTTP); differs from external Tempo

ingester:
  max_block_duration: 1h

compactor:
  compaction:
    block_retention: 72h   # 3-day retention; TEE storage constrained

storage:
  trace:
    backend: local
    local:
      path: /tee/tempo/blocks   # TEE-encrypted filesystem
    wal:
      path: /tee/tempo/wal

# In-TEE Grafana (attested access) connects to this Tempo
# Full span attributes available for debugging inside the TEE security boundary
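
As an illustration of how the in-TEE Grafana might be pointed at these stores, here is a sketch of a Grafana datasource provisioning file. The hostnames loki-internal and tempo-internal and the ports 3101 and 3201 come from the configs in this section; the file name, datasource names, and uid values are assumptions made up for the example.

```yaml
# in-tee-datasources.yaml (hypothetical file name), provisioned into the in-TEE Grafana
apiVersion: 1

datasources:
  - name: Loki (in-TEE)
    uid: loki-in-tee              # illustrative uid
    type: loki
    access: proxy
    url: http://loki-internal:3101

  - name: Tempo (in-TEE)
    uid: tempo-in-tee             # illustrative uid
    type: tempo
    access: proxy
    url: http://tempo-internal:3201
    jsonData:
      tracesToLogsV2:
        datasourceUid: loki-in-tee   # jump from a span to the full in-TEE logs
```

Because both datasources resolve only inside the TEE, this file belongs in the in-TEE Grafana deployment and not in the standard Grafana outside it.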

Dual OTel Collector Pattern

# Two OTel Collectors: one inside TEE, one outside

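# ── INSIDE TEE: otel-collector-internal-config.yaml ──────────────────────
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317   # BizFirstGO sends to this collector first

processors:
  batch:
    send_batch_size: 500
    timeout: 5s
  # Sanitization must run INSIDE the TEE: anything handed to the boundary
  # exporter has already left the trusted side. (Processor name is illustrative.)
  transform/boundary:
    error_mode: ignore
    trace_statements:
      - context: span
        statements:
          - delete_matching_keys(attributes, "^(form[.]field|calculation)[.]")
    log_statements:
      - context: log
        statements:
          - delete_matching_keys(attributes, "^(form[.]field|calculation)[.]")

exporters:
  # 1. Full (unsanitized) signals to in-TEE stores
  loki/internal:
    endpoint: http://loki-internal:3101/loki/api/v1/push
  otlp/tempo-internal:
    endpoint: http://tempo-internal:4318
    tls:
      insecure: true   # plaintext gRPC; this hop stays inside the TEE

  # 2. Sanitized signals to external OTel Collector (crosses TEE boundary)
  # The external collector's processors are only a second line of defense
  otlp/external-boundary:
    endpoint: http://vsock-proxy:4319   # TEE vsock proxy to external collector
    tls:
      insecure: true   # assumes the vsock proxy carries plaintext gRPC

service:
  pipelines:
    logs/internal:
      receivers: [otlp]
      processors: [batch]
      exporters: [loki/internal]
    logs/boundary:
      receivers: [otlp]
      processors: [transform/boundary, batch]
      exporters: [otlp/external-boundary]
    traces/internal:
      receivers: [otlp]
      processors: [batch]
      exporters: [otlp/tempo-internal]
    traces/boundary:
      receivers: [otlp]
      processors: [transform/boundary, batch]
      exporters: [otlp/external-boundary]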

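# ── OUTSIDE TEE: otel-collector-external-config.yaml ─────────────────────
# Receives from TEE boundary, applies sanitization, forwards to external Loki/Tempo
processors:
  redaction:
    allow_all_keys: true   # Keep non-blocked keys; with false and no allowed_keys, every attribute is dropped
    # Mask attribute values that should not cross the boundary (regexes match values)
    blocked_values:
      - ".*income.*"
      - ".*applicant.*"
      - ".*result.*"
  transform/sanitize-spans:
    trace_statements:      # span context belongs under trace_statements, not log_statements
      - context: span
        statements:
          # Remove span attributes; keep only trace/span IDs in correlation logs
          # delete_key takes one exact key; delete_matching_keys takes a regex
          - delete_matching_keys(attributes, "^form[.]field[.]")
          - delete_matching_keys(attributes, "^calculation[.]")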
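
The external config above lists only processors. The fragment below sketches how the rest of that collector might be wired, to be merged with the processors block rather than used on its own. The receiver port 4319 assumes the vsock proxy forwards to the same port it listens on; the hostnames loki-external and tempo-external and their ports are hypothetical placeholders; and whether the redaction processor accepts logs depends on the collector-contrib version, so verify that before relying on the logs pipeline.

```yaml
# Hypothetical remainder of otel-collector-external-config.yaml
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4319   # assumed target of the TEE vsock proxy

exporters:
  loki/external:
    endpoint: http://loki-external:3100/loki/api/v1/push   # placeholder host
  otlp/tempo-external:
    endpoint: tempo-external:4317                          # placeholder host
    tls:
      insecure: true   # assumption: plaintext gRPC on the external network segment

service:
  pipelines:
    logs:
      receivers: [otlp]
      processors: [redaction]
      exporters: [loki/external]
    traces:
      receivers: [otlp]
      processors: [redaction, transform/sanitize-spans]
      exporters: [otlp/tempo-external]
```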

In-TEE Observability Limitations

| Limitation | Why | Workaround |
|---|---|---|
| No live Grafana access from outside TEE | In-TEE Grafana is only accessible via an attested connection; a standard browser cannot connect | Use an attested client (custom CLI tool with TEE attestation verification) for in-TEE Grafana access |
| Constrained memory for observability | TEE memory is encrypted and limited (typically 512MB–4GB depending on platform) | Short retention (72h for traces, 7d for logs), aggressive sampling for in-TEE signals |
| No persistent storage across TEE restarts | TEE memory is cleared on restart; the encrypted filesystem may not persist, depending on config | Flush critical audit logs to a signed external batch before planned restarts |
| No ad-hoc log queries from outside | In-TEE Loki is not accessible from outside the TEE boundary | Define canned queries as part of the TEE deployment; run them via an attested query tool |

In-TEE Observability Is for Post-Incident Analysis, Not Real-Time Monitoring

Use external (sanitized) observability for real-time dashboards and alerting: those signals cross the boundary and reach the standard Grafana directly, with no attestation step in the way. Use in-TEE observability for post-incident analysis, when you need the full signal detail that cannot leave the TEE. Design the workflow accordingly: alert fires on external signals → team accesses in-TEE observability via the attested channel → root cause is identified inside the TEE boundary.
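
The first step of that workflow could be expressed as a rule for the external Loki ruler, as in the sketch below. Everything specific in it is an assumption for illustration: the group and alert names, the service_name label, the "error_code" match string, the threshold, and the annotation text would all need to match what your sanitized external logs actually contain.

```yaml
# Hypothetical rule file for the EXTERNAL Loki ruler (sanitized logs only)
groups:
  - name: tee-boundary-alerts
    rules:
      - alert: ExecutionErrorsElevated
        # Counts sanitized log lines that carry an error code; no sensitive fields are needed
        expr: |
          sum(rate({service_name="bizfirstgo"} |= "error_code" [5m])) > 1
        for: 5m
        labels:
          severity: page
        annotations:
          summary: Elevated execution errors seen in sanitized external logs
          description: Use the execution IDs from these logs to query in-TEE Loki and Tempo over the attested channel.
```

The alert carries only what is already allowed outside the TEE (execution IDs, timing, error codes), and the annotation directs responders to the attested channel for the full detail.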