Enterprise Options Overview
The default BizFirst Observe stack (single-node Docker Compose) handles up to ~20 tenants and moderate log and metric volume. Enterprise options add high availability, horizontal scaling, long-term storage, and advanced access control, all of which large-scale BizFirstGO deployments require.
When to Upgrade from the Default Stack
| Signal | Threshold | Recommended Upgrade |
|---|---|---|
| Log ingestion rate | > 10 GB/day | Loki distributed mode (microservices) |
| Metric retention needed | > 90 days | Thanos for long-term storage |
| Prometheus HA requirement | No gaps in metric scraping can be tolerated | Thanos + multiple Prometheus replicas |
| Grafana SSO required | Corporate directory integration needed | Grafana Enterprise (SAML/OIDC) |
| Tenant count | > 50 tenants | Loki microservices + Kubernetes deployment |
| Availability SLA | 99.9% uptime required | Full Kubernetes HA deployment |
| Multi-region | Deployed in > 1 cloud region (AWS/Azure) | Cross-region Prometheus federation |
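The thresholds in the table can be read as a simple checklist. The sketch below encodes them in Python to show that each upgrade is triggered independently by its own signal. The `DeploymentSignals` class, its field names, and the `recommended_upgrades` function are hypothetical and not part of BizFirst Observe; only the thresholds and upgrade names come from the table.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DeploymentSignals:
    """Observed signals for one deployment (hypothetical field names)."""
    log_gb_per_day: float = 0.0
    metric_retention_days: int = 0
    needs_prometheus_ha: bool = False
    needs_grafana_sso: bool = False
    tenant_count: int = 0
    needs_999_uptime: bool = False
    region_count: int = 1


def recommended_upgrades(s: DeploymentSignals) -> List[str]:
    """Return only the upgrades whose table threshold is actually crossed."""
    upgrades = []
    if s.log_gb_per_day > 10:
        upgrades.append("Loki distributed mode (microservices)")
    if s.metric_retention_days > 90:
        upgrades.append("Thanos for long-term storage")
    if s.needs_prometheus_ha:
        upgrades.append("Thanos + multiple Prometheus replicas")
    if s.needs_grafana_sso:
        upgrades.append("Grafana Enterprise (SAML/OIDC)")
    if s.tenant_count > 50:
        upgrades.append("Loki microservices + Kubernetes deployment")
    if s.needs_999_uptime:
        upgrades.append("Full Kubernetes HA deployment")
    if s.region_count > 1:
        upgrades.append("Cross-region Prometheus federation")
    return upgrades


# A small deployment crosses no threshold and stays on the default stack.
print(recommended_upgrades(DeploymentSignals(log_gb_per_day=4, tenant_count=12)))

# High log volume plus long retention triggers two separate upgrades.
print(recommended_upgrades(
    DeploymentSignals(log_gb_per_day=25, metric_retention_days=180)
))
```

An empty result means the default single-node stack is still sufficient for that deployment.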
Enterprise Stack Decision Tree
Loki Distributed
Split Loki into separate querier, ingester, distributor, and compactor services so that each scales independently. Use this mode when log volume exceeds single-node capacity or when you need write HA.
Thanos
Add a Thanos Sidecar to each Prometheus instance to upload metric blocks to S3. Add Thanos Querier to run deduplicated queries across multiple Prometheus replicas. Required for metric retention beyond 90 days or for HA metrics.
Tempo HA
Run multiple Tempo ingesters with an object storage backend. Required when you cannot afford trace ingestion downtime. This adds complexity, so justify it with a real SLA requirement.
Grafana Enterprise
Adds SSO (SAML, OIDC), query audit logging, data source caching, and reporting. Required for corporate identity integration and for a compliance audit trail of who queried what.
Kubernetes Native
Deploy all components via Helm charts on Kubernetes. This enables Horizontal Pod Autoscaling (HPA), PodDisruptionBudgets (PDB), rolling updates, and GitOps. Required for production deployments with SLA commitments.
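To judge whether an SLA commitment really calls for a full HA deployment, it can help to convert the availability percentage into a downtime budget. The sketch below does that arithmetic for the 99.9% threshold from the upgrade table. The function name and the 30-day window are illustrative assumptions, not a BizFirst Observe convention.

```python
def downtime_budget_minutes(sla_percent: float, window_days: int = 30) -> float:
    """Minutes of downtime allowed in the window at the given availability."""
    total_minutes = window_days * 24 * 60
    return total_minutes * (1 - sla_percent / 100)


# 99.9% over an assumed 30-day window allows roughly 43 minutes of downtime.
print(round(downtime_budget_minutes(99.9), 1))
```

If a single-node restart or upgrade regularly consumes a large share of that budget, the case for the Kubernetes HA deployment is concrete rather than speculative.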
Multi-Region
Use Prometheus federation for cross-region dashboards and Loki multi-cluster queries for global log search. Required when BizFirstGO is deployed in multiple AWS/Azure regions.
The default single-node stack handles most BizFirstGO deployments well. Upgrading to enterprise options adds operational complexity: more components to monitor and more configuration to maintain. Upgrade a specific component only when you have a concrete requirement for it (e.g., actual Prometheus downtime incidents), rather than preemptively scaling everything.