TEE Boundary Telemetry Flow

# Complete telemetry flow for a TEE-deployed BizFirstGO workflow execution

BizFirstGO Process Engine (inside TEE)
  │
  │  OTLP/gRPC (4317) — full unsanitized signals
  ▼
In-TEE OTel Collector
  │  ├── Full signals → In-TEE Loki (3101) — stays inside TEE
  │  │                  In-TEE Tempo (4318) — stays inside TEE
  │  │
  │  └── Sanitized signals → vsock channel → TEE boundary
  │
TEE BOUNDARY ─────────────────────────────────────────────────────────────
  │
  │  vsock proxy (AWS Nitro) or virtio-vsock (Intel TDX)
  ▼
External OTel Collector (outside TEE — boundary control point)
  │  Applies final sanitization:
  │    - Redaction processor: strip forbidden attribute patterns
  │    - Transform processor: remove span attributes, keep TraceId only
  │    - Filter processor: drop any records tagged as sensitive (not part of the sample config below)
  │
  ├── Sanitized logs → External Loki → Grafana (operations team)
  ├── Aggregated metrics → External Prometheus → Grafana (operations team)
  └── Trace IDs only → External Tempo → Grafana (trace correlation)

vsock Proxy Configuration (AWS Nitro)

# AWS Nitro Enclaves use vsock (virtual socket) for enclave-to-host communication
# The vsock proxy relays OTel traffic from the in-TEE collector to the host

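# On the EC2 host (outside TEE):
# vsock-proxy listens on a host-side vsock port and forwards to one allowlisted endpoint.
# Positional arguments: <local vsock port> <remote host> <remote port>
# The enclave connects to port 8080 on the parent instance (CID 3).
# The remote host must match the allowlist entry exactly.
vsock-proxy 8080 otel-collector.internal.bizfirstai.com 4317 \
  --config vsock-proxy.yaml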

# vsock-proxy.yaml — allowlist (deny all not listed)
allowlist:
  - {address: "otel-collector.internal.bizfirstai.com", port: 4317}
  # Any other outbound connection from TEE is blocked

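# In-TEE OTel Collector exports to the vsock listener on the parent instance
# (illustrative: the exact endpoint syntax depends on the vsock forwarder used inside the enclave):
# otlp/external-boundary:
#   endpoint: <vsock CID 3, port 8080>  (resolves through vsock-proxy)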

# This means the TEE cannot "accidentally" send telemetry to any other endpoint
# — the vsock proxy enforces the single approved egress destination
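
If the collector build running inside the enclave cannot dial a vsock address directly, a small loopback-to-vsock forwarder can sit between the collector and the host-side vsock-proxy. The following is a minimal sketch under that assumption, not part of the BizFirstGO deployment: the loopback address `127.0.0.1:4317` and the function names are hypothetical, the vsock port 8080 is taken from the command above, and CID 3 is the parent instance as seen from a Nitro enclave. `socket.AF_VSOCK` is available on Linux only.

```python
import socket
import threading

PARENT_CID = 3                     # parent EC2 instance as seen from a Nitro enclave
VSOCK_PORT = 8080                  # host-side vsock-proxy port (from the command above)
LISTEN_ADDR = ("127.0.0.1", 4317)  # hypothetical loopback endpoint for the in-TEE collector


def pump(src, dst):
    """Copy bytes from src to dst until src reaches EOF, then half-close dst."""
    while True:
        chunk = src.recv(65536)
        if not chunk:
            break
        dst.sendall(chunk)
    try:
        dst.shutdown(socket.SHUT_WR)
    except OSError:
        pass  # peer already closed


def handle(client):
    """Bridge one loopback TCP connection to the host-side vsock listener."""
    upstream = socket.socket(socket.AF_VSOCK, socket.SOCK_STREAM)  # Linux only
    try:
        upstream.connect((PARENT_CID, VSOCK_PORT))
        back = threading.Thread(target=pump, args=(upstream, client), daemon=True)
        back.start()
        pump(client, upstream)
        back.join()
    finally:
        client.close()
        upstream.close()


def serve():
    """Accept loopback connections and forward each one over vsock."""
    with socket.create_server(LISTEN_ADDR) as server:
        while True:
            client, _ = server.accept()
            threading.Thread(target=handle, args=(client,), daemon=True).start()
```

With a forwarder like this running, the in-TEE collector's boundary exporter would point at the loopback address, and calling `serve()` at enclave start keeps the bridge up. The forwarder adds no new egress path: everything it carries still terminates at the single allowlisted destination enforced by vsock-proxy.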

External OTel Collector Sanitization

# otel-collector-external-config.yaml
# This runs OUTSIDE the TEE — last sanitization layer before signals reach stores

receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317   # Receives from TEE vsock proxy

processors:
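  # Final redaction: mask any attribute that should not appear in external Loki/Tempo
  # Field names follow the contrib redaction processor; verify them against your collector version
  redaction:
    allow_all_keys: true
    # Values matching these regexes are masked
    blocked_values:
      - "(?i)bearer [a-zA-Z0-9._-]{20,}"   # Bearer tokens (case-insensitive)
      - "[0-9]{16}"                         # Card numbers
      - "[0-9]{3}-[0-9]{2}-[0-9]{4}"        # SSN pattern
    # Keys matching these regexes have their values masked
    blocked_key_patterns:
      - "^form.*"           # Any key starting with "form"
      - "^calculation.*"    # Computation keys
      - "^applicant.*"      # Individual identifiers
      - "^result.*"         # Computation results
      - "^amount.*"         # Amount-derived keys and labels (see the boundary table)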

  # Span sanitization: in TEE, spans crossing boundary lose all attributes
  transform/tee-span-sanitize:
    trace_statements:
      - context: span
        statements:
          # Keep: span name (generic), TraceId (implicit), SpanId (implicit)
          # Remove: all span attributes (may contain computation context)
          - limit(attributes, 0, [])   # Remove all span attributes

  # Log sanitization: verify no payload data leaked through
  transform/tee-log-sanitize:
    log_statements:
      - context: log
        statements:
          # Keep known-safe attributes; drop anything not in allowlist
          - keep_keys(attributes, ["executionId", "workflowType", "nodeType",
              "tenantHash", "durationMs", "errorCode", "traceId", "spanId",
              "stepCount", "severity"])

  batch:
    send_batch_size: 1000
    timeout: 10s

exporters:
  loki:
    endpoint: http://loki:3100/loki/api/v1/push
  prometheusremotewrite:
    endpoint: http://prometheus:9090/api/v1/write
  otlp/tempo:
    endpoint: http://tempo:4317   # Receives trace IDs only (spans sanitized above)

service:
  pipelines:
    logs:
      receivers: [otlp]
      processors: [redaction, transform/tee-log-sanitize, batch]
      exporters: [loki]
    metrics:
      receivers: [otlp]
      processors: [redaction, batch]
      exporters: [prometheusremotewrite]
    traces:
      receivers: [otlp]
      processors: [redaction, transform/tee-span-sanitize, batch]
      exporters: [otlp/tempo]
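
Before deploying rules like these, it can help to check sample records against them offline. The sketch below is a simplified Python model of the three sanitization steps, not the collector's implementation: the function names are hypothetical, the case-insensitive bearer match and prefix-based key check are assumptions of the sketch, and it drops blocked keys where the collector may mask their values instead.

```python
import re

# Value patterns modeled on the sample config; (?i) is an assumption of this sketch.
BLOCKED_VALUES = [
    re.compile(r"(?i)bearer [a-zA-Z0-9._-]{20,}"),  # bearer tokens
    re.compile(r"[0-9]{16}"),                        # card numbers
    re.compile(r"[0-9]{3}-[0-9]{2}-[0-9]{4}"),       # SSN pattern
]
# Key prefixes modeled on the blocked key list in the sample config.
BLOCKED_KEY_PREFIXES = ("form", "calculation", "applicant", "result")
# Allowlist copied from transform/tee-log-sanitize.
LOG_ALLOWLIST = {
    "executionId", "workflowType", "nodeType", "tenantHash", "durationMs",
    "errorCode", "traceId", "spanId", "stepCount", "severity",
}
MASK = "****"


def value_is_blocked(value):
    """Return True if the value matches any blocked value pattern."""
    text = str(value)
    return any(pattern.search(text) for pattern in BLOCKED_VALUES)


def sanitize_log_attributes(attrs):
    """Keep allowlisted keys only, and mask values that match a blocked pattern."""
    clean = {}
    for key, value in attrs.items():
        if key not in LOG_ALLOWLIST or key.startswith(BLOCKED_KEY_PREFIXES):
            continue
        clean[key] = MASK if value_is_blocked(value) else value
    return clean


def sanitize_span_attributes(attrs):
    """Model of limit(attributes, 0, []): no span attribute survives."""
    return {}
```

For example, `sanitize_log_attributes({"executionId": "exec-1", "form.income": 95000})` keeps only `executionId`. Note that `value_is_blocked("Calculation result: 1500000")` is false under these patterns: a 7-digit number in free text matches none of the three value rules, so results embedded in message text need their own rule or must be suppressed inside the TEE.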

What Crosses vs. What Is Blocked

| Telemetry Item | Crosses Boundary? | Action at Boundary |
|---|---|---|
| Log record: executionId + workflowType + durationMs | Yes | Passed through after attribute allowlist check |
| Log record: "Calculation result: 1500000" | No | Blocked only if a rule covers numeric values in the message body; the sample value patterns (bearer token, 16-digit number, SSN) do not match a 7-digit value |
| Metric: bizfirst_workflow_executions_total with tenant_hash label | Yes | Passed through; label is a hash, not plaintext tenant name |
| Metric: bizfirst_workflow_executions_total with amount_range label | No | Blocked by a blocked-key pattern matching "amount.*" |
| Trace: TraceId + SpanId (W3C trace context header) | Yes | TraceId in log correlation field; no span content |
| Span attribute: form.income = 95000 | No | Blocked by transform/tee-span-sanitize: limit(attributes, 0, []) |
| Signed log batch (attestation-signed) | Yes | Passed through; signature stored as Loki label for audit queries |
| Exception message with stack trace + data value | No | Blocked in-TEE before reaching boundary; only error code emitted |

Defense in Depth: Sanitize at Both Layers

Do not rely solely on the external OTel Collector for sanitization. The in-TEE code (the BizFirstGO service and the in-TEE OTel Collector) must also sanitize signals before they cross the vsock channel. Defense in depth means the boundary is protected by two independent sanitization layers, one inside the TEE and one outside it: a logging mistake in BizFirstGO should be caught by the in-TEE OTel Collector, and a gap in the in-TEE processors should be caught by the external OTel Collector.
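
As an illustration of the innermost layer, the sketch below shows one way service code could reduce an exception to an error code before anything is logged, matching the table row in which exception messages never reach the boundary. It is a minimal Python sketch, not BizFirstGO code: the logger name, the `ERROR_CODES` mapping, and the class and function names are all hypothetical. It only handles records that carry exception info; ordinary log messages would need their own checks.

```python
import io
import logging

# Hypothetical mapping from exception type names to stable error codes.
ERROR_CODES = {"ValueError": "E_VALIDATION", "TimeoutError": "E_TIMEOUT"}


class ErrorCodeOnlyFilter(logging.Filter):
    """Replace exception messages and stack traces with a generic message and code."""

    def filter(self, record):
        code = "E_NONE"
        if record.exc_info and record.exc_info[0] is not None:
            code = ERROR_CODES.get(record.exc_info[0].__name__, "E_UNKNOWN")
            record.msg = "workflow step failed"  # drop the original message text
            record.args = ()                     # drop interpolated data values
            record.exc_info = None               # drop the stack trace
            record.exc_text = None
        record.errorCode = code
        return True


def build_logger(stream):
    """Create a logger whose only handler applies the error-code-only filter."""
    logger = logging.getLogger("bizfirst.tee.sketch")
    logger.setLevel(logging.INFO)
    logger.propagate = False
    logger.handlers.clear()
    handler = logging.StreamHandler(stream)
    handler.setFormatter(
        logging.Formatter("%(levelname)s %(message)s errorCode=%(errorCode)s")
    )
    handler.addFilter(ErrorCodeOnlyFilter())
    logger.addHandler(handler)
    return logger
```

A call such as `log.exception("step failed for income=%s", 95000)` inside an `except ValueError` block would then emit only the generic message and `errorCode=E_VALIDATION`. The collectors downstream still apply their own rules, so a record that slips past this filter is not automatically exposed.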