Telemetry at the TEE Boundary
The TEE boundary is where security policy meets observability engineering. Telemetry crossing this boundary must be explicitly approved, sanitized by the in-TEE OTel Collector, and delivered through an approved egress channel. The external OTel Collector is the single control point for all outbound telemetry: it applies final sanitization before data reaches the external Loki, Prometheus, and Tempo stores.
TEE Boundary Telemetry Flow
# Complete telemetry flow for a TEE-deployed BizFirstGO workflow execution
BizFirstGO Process Engine (inside TEE)
  │
  │  OTLP/gRPC (4317): full unsanitized signals
  ▼
In-TEE OTel Collector
  │  ├── Full signals      → In-TEE Loki (3101), stays inside TEE
  │  │                       In-TEE Tempo (4318), stays inside TEE
  │  │
  │  └── Sanitized signals → vsock channel → TEE boundary
  │
TEE BOUNDARY ─────────────────────────────────────────────────────────────
  │
  │  vsock proxy (AWS Nitro) or virtio-vsock (Intel TDX)
  ▼
External OTel Collector (outside TEE, boundary control point)
  │  Applies final sanitization:
  │    - Redaction processor: strip forbidden attribute patterns
  │    - Transform processor: remove span attributes, keep TraceId only
  │    - Filter processor: drop any records tagged as sensitive
  │
  ├── Sanitized logs     → External Loki       → Grafana (operations team)
  ├── Aggregated metrics → External Prometheus → Grafana (operations team)
  └── Trace IDs only     → External Tempo      → Grafana (trace correlation)
vsock Proxy Configuration (AWS Nitro)
# AWS Nitro Enclaves use vsock (virtual socket) for enclave-to-host communication.
# The vsock proxy relays OTel traffic from the in-TEE collector to the host.
# On the EC2 host (outside TEE):
# usage: vsock-proxy [--config <file>] <local_port> <remote_addr> <remote_port>
#   8080                                    vsock port the proxy listens on (the enclave connects to it)
#   otel-collector.internal.bizfirstai.com  external OTel Collector host (must match the allowlist)
#   4317                                    OTLP/gRPC port on the external collector
vsock-proxy --config vsock-proxy.yaml 8080 otel-collector.internal.bizfirstai.com 4317
# vsock-proxy.yaml: allowlist (deny all not listed)
allowlist:
- {address: "otel-collector.internal.bizfirstai.com", port: 4317}
# Any other outbound connection from TEE is blocked
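# In-TEE OTel Collector sends to the vsock channel:
#   otlp/external-boundary:
#     endpoint: http://vsock://8080   (illustrative: reaches the host through vsock-proxy on vsock port 8080)
# The stock OTLP exporter dials TCP, so in practice an in-enclave TCP-to-vsock
# forwarder is assumed to sit between the collector and the vsock channel.
# The TEE therefore cannot "accidentally" send telemetry to any other endpoint:
# the vsock proxy enforces the single approved egress destination.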
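A minimal sketch of the in-TEE exporter side, assuming the stock OTLP/gRPC exporter (which dials TCP) and a hypothetical in-enclave forwarder that bridges a local TCP port to vsock port 8080 on the host. The local port 14317 and the socat command in the comment are illustrative assumptions; only the exporter name `otlp/external-boundary` comes from the text above.

```yaml
# In-TEE OTel Collector (fragment). Assumption: a forwarder inside the enclave,
# for example
#   socat TCP-LISTEN:14317,bind=127.0.0.1,fork,reuseaddr VSOCK-CONNECT:3:8080
# bridges local TCP to vsock port 8080 on the parent instance (CID 3 on AWS Nitro).
exporters:
  otlp/external-boundary:
    endpoint: 127.0.0.1:14317   # hypothetical local port served by the forwarder
    tls:
      insecure: true            # matches the plaintext gRPC receiver on the external collector
```

Because the enclave has no network interface other than vsock, the forwarder does not widen the egress surface: everything it accepts still ends at the single destination the vsock proxy allows.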
External OTel Collector Sanitization
# otel-collector-external-config.yaml
# This runs OUTSIDE the TEE — last sanitization layer before signals reach stores
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317   # Receives from TEE vsock proxy
processors:
  # Final redaction: mask any attribute value that should not appear in external Loki/Tempo.
  # Field names follow the collector-contrib redaction processor (blocked_values,
  # blocked_key_patterns); check them against the deployed collector version.
  redaction:
    allow_all_keys: true
    blocked_values:                          # regexes matched against attribute values
      - "(?i)bearer [a-zA-Z0-9._-]{20,}"     # Bearer tokens (case-insensitive)
      - "[0-9]{16}"                          # Card numbers
      - "[0-9]{3}-[0-9]{2}-[0-9]{4}"         # SSN pattern
    blocked_key_patterns:                    # regexes matched against attribute keys
      - "form.*"          # Any key starting with "form"
      - "calculation.*"   # Computation keys
      - "applicant.*"     # Individual identifiers
      - "result.*"        # Computation results
      - "amount.*"        # Amount-derived keys and metric labels
  # Span sanitization: in TEE, spans crossing boundary lose all attributes
  transform/tee-span-sanitize:
    trace_statements:
      - context: span
        statements:
          # Keep: span name (generic), TraceId (implicit), SpanId (implicit)
          # Remove: all span attributes (may contain computation context)
          - limit(attributes, 0, [])   # Remove all span attributes
  # Log sanitization: verify no payload data leaked through
  transform/tee-log-sanitize:
    log_statements:
      - context: log
        statements:
          # Keep known-safe attributes; drop anything not in allowlist
          - keep_keys(attributes, ["executionId", "workflowType", "nodeType", "tenantHash", "durationMs", "errorCode", "traceId", "spanId", "stepCount", "severity"])
  batch:
    send_batch_size: 1000
    timeout: 10s
exporters:
  loki:
    endpoint: http://loki:3100/loki/api/v1/push
  prometheusremotewrite:
    endpoint: http://prometheus:9090/api/v1/write
  otlp/tempo:
    endpoint: http://tempo:4317   # Receives trace IDs only (spans sanitized above)
service:
  pipelines:
    logs:
      receivers: [otlp]
      processors: [redaction, transform/tee-log-sanitize, batch]
      exporters: [loki]
    metrics:
      receivers: [otlp]
      processors: [redaction, batch]
      exporters: [prometheusremotewrite]
    traces:
      receivers: [otlp]
      processors: [redaction, transform/tee-span-sanitize, batch]
      exporters: [otlp/tempo]
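The flow diagram lists a filter processor that drops records tagged as sensitive, but the configuration above does not define one. A minimal sketch of what it could look like, assuming in-TEE code marks such records with a boolean attribute named `sensitive` (a hypothetical tag name; the source does not say how records are tagged):

```yaml
processors:
  # Any record matching a condition below is dropped.
  filter/tee-drop-sensitive:
    error_mode: ignore
    logs:
      log_record:
        - 'attributes["sensitive"] == true'
    traces:
      span:
        - 'attributes["sensitive"] == true'
    metrics:
      datapoint:
        - 'attributes["sensitive"] == true'

service:
  pipelines:
    logs:
      receivers: [otlp]
      # Filter first, so dropped records never reach the later processors.
      processors: [filter/tee-drop-sensitive, redaction, transform/tee-log-sanitize, batch]
      exporters: [loki]
```

The same processor name would be added at the front of the metrics and traces pipelines. The filter has to run before the log allowlist, because `keep_keys` would otherwise remove the `sensitive` tag it depends on.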
What Crosses vs. What Is Blocked
| Telemetry Item | Crosses Boundary? | Action at Boundary |
|---|---|---|
| Log record: executionId + workflowType + durationMs | Yes | Passed through after attribute allowlist check |
| Log record: "Calculation result: 1500000" | No | Not matched by the listed value patterns (bearer token, 16 digits, SSN); must be dropped by an explicit rule on the log body or suppressed in-TEE |
| Metric: bizfirst_workflow_executions_total with tenant_hash label | Yes | Passed through; label is a hash, not plaintext tenant name |
| Metric: bizfirst_workflow_executions_total with amount_range label | No | Blocked by a blocked-key pattern "amount.*" in the redaction processor |
| Trace: TraceId + SpanId (W3C trace context header) | Yes | TraceId in log correlation field; no span content |
| Span attribute: form.income = 95000 | No | Blocked by transform/tee-span-sanitize: limit(attributes, 0, []) |
| Signed log batch (attestation-signed) | Yes | Passed through; signature stored as Loki label for audit queries |
| Exception message with stack trace + data value | No | Blocked in-TEE before reaching boundary; only error code emitted |
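Free-text messages are the hardest case in the table, because `keep_keys` only prunes attributes and leaves the log body untouched. One possible guard, sketched under the assumption that result-bearing messages share a recognizable phrase (the `calculation result` pattern is taken from the table row and is illustrative, not an exhaustive rule):

```yaml
processors:
  filter/tee-drop-result-messages:
    error_mode: ignore
    logs:
      log_record:
        # Drop any log record whose body mentions a calculation result.
        - 'IsMatch(body, "(?i)calculation result")'
```

A pattern like this catches only the phrasing it was written for, so it works as a backstop for code that emits structured fields rather than as a replacement for it.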
Do not rely solely on the external OTel Collector for sanitization. The in-TEE code (BizFirstGO service + in-TEE OTel Collector) must also sanitize before emitting across the vsock channel. Defense in depth means the boundary is protected by two independent sanitization layers: a coding mistake in BizFirstGO logging should be caught by the in-TEE OTel Collector, and a gap in the in-TEE processor should be caught by the external OTel Collector. That only holds if the two collectors are configured and reviewed separately, so a single faulty rule cannot open both layers at once.
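A rough sketch of the in-TEE half of this layering, assuming the in-TEE collector fans each signal out to two pipelines: one that keeps full signals inside the TEE and one that prunes attributes before the boundary exporter. The exporter endpoints, the local forwarder port, and the shortened allowlist are illustrative assumptions; port 3101 comes from the flow diagram.

```yaml
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 127.0.0.1:4317

processors:
  # First sanitization layer: same allowlist idea as the external collector.
  transform/in-tee-log-allowlist:
    log_statements:
      - context: log
        statements:
          - keep_keys(attributes, ["executionId", "workflowType", "durationMs", "errorCode"])
  batch: {}

exporters:
  otlphttp/loki-internal:
    endpoint: http://127.0.0.1:3101/otlp   # assumes Loki's native OTLP ingest path
  otlp/external-boundary:
    endpoint: 127.0.0.1:14317              # hypothetical TCP-to-vsock forwarder port
    tls:
      insecure: true

service:
  pipelines:
    logs/internal:       # full signals, never leave the TEE
      receivers: [otlp]
      processors: [batch]
      exporters: [otlphttp/loki-internal]
    logs/boundary:       # sanitized copy for the vsock channel
      receivers: [otlp]
      processors: [transform/in-tee-log-allowlist, batch]
      exporters: [otlp/external-boundary]
```

Keeping the in-TEE allowlist narrower than the external one is a deliberate choice in this sketch: if either list is loosened by mistake, the other still bounds what reaches the external stores.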