Octopus — Server Groups
Server Group Mode
In server group mode, an execution node runs as a persistent HTTP service registered in a Server Group. Workflows, agents, and other services call it over HTTP. The server group handles load balancing, health monitoring, and failover, so callers only need to call the group endpoint.
What Is a Server Group?
A Server Group is a named set of server node instances that all implement the same HTTP API surface. The process engine routes calls to the group by name; the group's load balancer then picks a healthy instance according to the group's load balancing strategy (for example, the least-busy instance under Least Connections).
// Server Group: "inference-cluster"
//   Instance 1: https://ai-node-01.internal:8080
//   Instance 2: https://ai-node-02.internal:8080
//   Instance 3: https://ai-node-03.internal:8080

// Caller (workflow node or Octopus MCP tool) calls the group by name:
var result = await _serverGroupClient.PostAsync(
    groupName: "inference-cluster",
    endpoint: "/infer",
    body: new { prompt = "Summarize the following...", maxTokens = 512 });
Registering a Server Node
// POST /api/server-groups/{groupName}/nodes
{
  "name": "ai-node-01",
  "baseUrl": "https://ai-node-01.internal:8080",
  "healthUrl": "https://ai-node-01.internal:8080/health",
  "weight": 1,
  "metadata": {
    "gpu": "A100",
    "region": "eastus"
  }
}

// Response
{
  "nodeId": "node-abc123",
  "groupName": "inference-cluster",
  "status": "healthy"
}
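As an illustration of the registration call above, the following is a minimal sketch that builds the POST request with HttpClient types from the .NET base class library. The class name NodeRegistration, the method BuildRegisterRequest, and the Octopus host address used in the usage note are assumptions made for this example and are not part of the Octopus API. The sketch also assumes the health endpoint lives at /health under the node's base URL, as in the sample payload. Authentication is omitted because this page does not describe it.

```csharp
using System;
using System.Collections.Generic;
using System.Net.Http;
using System.Net.Http.Json;

// Hypothetical helper: the class and method names are illustrative only.
public static class NodeRegistration
{
    // Builds (but does not send) the registration request described above.
    public static HttpRequestMessage BuildRegisterRequest(
        Uri octopusBaseUri,   // assumed address of the Octopus host
        string groupName,
        string nodeName,
        Uri nodeBaseUrl,
        IDictionary<string, string> metadata,
        int weight = 1)
    {
        var payload = new
        {
            name = nodeName,
            // Uri.ToString() appends a trailing slash to a bare authority; trim it
            // so the value matches the sample payload.
            baseUrl = nodeBaseUrl.ToString().TrimEnd('/'),
            // Assumption: the health endpoint is <baseUrl>/health.
            healthUrl = new Uri(nodeBaseUrl, "/health").ToString(),
            weight,
            metadata
        };

        var path = $"/api/server-groups/{Uri.EscapeDataString(groupName)}/nodes";

        return new HttpRequestMessage(HttpMethod.Post, new Uri(octopusBaseUri, path))
        {
            // JsonContent serializes the payload as application/json.
            Content = JsonContent.Create(payload)
        };
    }
}
```

A caller would pass the result to HttpClient.SendAsync and read nodeId from the JSON response. Building the request separately from sending it keeps the payload easy to inspect in a unit test without a running host.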
Health Monitoring and Failover
The Server Group controller polls each node's healthUrl every 15 seconds. A node that fails 3 consecutive checks is marked unhealthy and removed from the routing pool. It is re-admitted automatically when health checks resume passing.
| Node State | Routing Behaviour | Recovery |
|---|---|---|
| Healthy | Receives new requests according to weight | — |
| Degraded (1-2 consecutive failures) | Still receives requests; alert raised | Auto-recover on next passing check |
| Unhealthy (3+ consecutive failures) | Removed from pool; no new requests routed | Auto-readmit when health checks pass again |
| Draining | No new requests; existing in-flight requests complete | Manual — operator removes node |
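The state transitions in the table can be sketched as a small per-node tracker. This is an illustrative model, not the controller's actual implementation: the names NodeState and NodeHealthTracker are hypothetical, and the sketch assumes that a single passing check is enough to return an unhealthy node to the pool, which the text above does not state explicitly.

```csharp
using System;

// Hypothetical types for illustration; not part of the Octopus API.
public enum NodeState { Healthy, Degraded, Unhealthy, Draining }

public sealed class NodeHealthTracker
{
    // From the text: 3 consecutive failed checks mark a node unhealthy.
    private const int UnhealthyThreshold = 3;

    private int _consecutiveFailures;
    private bool _draining;

    public NodeState State
    {
        get
        {
            if (_draining) return NodeState.Draining;
            if (_consecutiveFailures >= UnhealthyThreshold) return NodeState.Unhealthy;
            return _consecutiveFailures > 0 ? NodeState.Degraded : NodeState.Healthy;
        }
    }

    // Healthy and Degraded nodes stay in the routing pool.
    public bool IsRoutable =>
        State == NodeState.Healthy || State == NodeState.Degraded;

    // Record the result of one health poll.
    // Assumption: one passing check resets the failure count entirely.
    public void RecordCheck(bool passed) =>
        _consecutiveFailures = passed ? 0 : _consecutiveFailures + 1;

    // Draining is started by an operator, never by a health check.
    public void BeginDrain() => _draining = true;
}
```

With the 15-second polling interval described above, a node that stops responding leaves the pool after its third failed poll in a row.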
Load Balancing Strategies
| Strategy | Description | Best For |
|---|---|---|
| Round Robin | Requests distributed evenly in order | Uniform, stateless workloads |
| Weighted Round Robin | Higher-weight nodes receive proportionally more requests | Mixed-capacity nodes (different GPU sizes) |
| Least Connections | Route to the node with fewest active requests | Variable-duration requests (LLM inference) |
| Sticky Session | Route the same tenant/session to the same node | Stateful nodes that cache per-tenant data |
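To make two of the strategies concrete, here is a minimal sketch of Least Connections and Weighted Round Robin selection over a list of healthy nodes. NodeInfo and NodeSelector are hypothetical names introduced for this example. The weighted variant uses a simple slot-based scheme chosen for brevity; the balancer's real algorithm is not documented on this page and may differ.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical snapshot of a node as seen by the balancer.
public sealed record NodeInfo(string Name, int Weight, int ActiveRequests);

public static class NodeSelector
{
    // Least Connections: choose the node with the fewest in-flight requests.
    // Ties go to the node that appears first in the list.
    public static NodeInfo LeastConnections(IReadOnlyList<NodeInfo> healthy)
    {
        if (healthy.Count == 0)
            throw new InvalidOperationException("No healthy nodes available.");
        return healthy.MinBy(n => n.ActiveRequests)!;
    }

    // Weighted Round Robin: out of every totalWeight consecutive requests,
    // each node receives a number of requests equal to its weight.
    // requestNumber is a non-negative, monotonically increasing counter.
    public static NodeInfo WeightedRoundRobin(
        IReadOnlyList<NodeInfo> healthy, long requestNumber)
    {
        int totalWeight = healthy.Sum(n => n.Weight);
        if (totalWeight <= 0)
            throw new InvalidOperationException("No healthy nodes with positive weight.");

        long slot = requestNumber % totalWeight;
        foreach (var node in healthy)
        {
            if (slot < node.Weight) return node;
            slot -= node.Weight;
        }
        throw new InvalidOperationException("Unreachable: slot is always below totalWeight.");
    }
}
```

For nodes with weights 3 and 1, requests 0 through 2 go to the first node and request 3 goes to the second, after which the cycle repeats. This scheme sends a heavy node its requests in a burst; an interleaving variant such as smooth weighted round robin spreads them more evenly.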
Calling a Server Group from a Workflow Node
using System.Threading;
using System.Threading.Tasks;

// Workflow executor that calls a server group
public class InferenceWorkflowNodeExecutor : BaseNodeExecutor
{
    private readonly IServerGroupClient _serverGroup;

    public InferenceWorkflowNodeExecutor(IServerGroupClient serverGroup)
        => _serverGroup = serverGroup;

    public override async Task ExecuteAsync(
        INodeExecutionContext context,
        CancellationToken ct)
    {
        // Read the node's declared inputs.
        var prompt = GetInput<string>(context, "Prompt");
        var maxTokens = GetInput<int>(context, "MaxTokens");

        // Call the group by name; the token lets the workflow cancel the call.
        var response = await _serverGroup.PostAsync<InferenceResponse>(
            groupName: "inference-cluster",
            endpoint: "/infer",
            body: new { prompt, maxTokens },
            ct);

        // Publish the results as node outputs.
        SetOutput(context, "GeneratedText", response.Text);
        SetOutput(context, "TokensUsed", response.TokensUsed);
    }
}
Calling a Server Group from an Octopus MCP Tool
using System.Text.Json;
using System.Threading.Tasks;

// MCP tool handler that calls a server group
public async Task<JsonElement> HandleInferAsync(JsonElement args)
{
    // "prompt" is required: GetProperty throws if it is missing.
    var prompt = args.GetProperty("prompt").GetString()!;

    // "max_tokens" is optional and defaults to 512.
    var maxTokens = args.TryGetProperty("max_tokens", out var mt)
        ? mt.GetInt32() : 512;

    var response = await _serverGroupClient.PostAsync<InferenceResponse>(
        groupName: "inference-cluster",
        endpoint: "/infer",
        body: new { prompt, maxTokens });

    // Return the result in the snake_case shape the tool exposes.
    return JsonSerializer.SerializeToElement(new
    {
        generated_text = response.Text,
        tokens_used = response.TokensUsed,
        model = response.Model
    });
}
When to Use Server Group Mode
| Use Case | Reason Server Group Fits |
|---|---|
| AI inference endpoint | GPU warmup is expensive; keep the model loaded 24/7 |
| High-throughput data processing | Many parallel callers; horizontal scaling via more nodes |
| Shared enterprise service | Multiple workflows and agents call the same service |
| Independent deployment cadence | Update the server node without redeploying the Octopus host |
| Multi-region execution | Register nodes in different regions; route by metadata tag |