Server Group Mode — Workflow vs Server Nodes

What Is a Server Group?

A Server Group is a named set of server node instances that all implement the same HTTP API surface. The process engine routes calls to the group by name, and the group's load balancer picks a healthy instance according to the group's load balancing strategy (for example, the least-busy instance under Least Connections).

// Server Group: "inference-cluster"
//   Instance 1: https://ai-node-01.internal:8080
//   Instance 2: https://ai-node-02.internal:8080
//   Instance 3: https://ai-node-03.internal:8080

// Caller (workflow node or Octopus MCP tool) calls the group by name:
var result = await _serverGroupClient.PostAsync(
    groupName: "inference-cluster",
    endpoint:  "/infer",
    body:      new { prompt = "Summarize the following...", maxTokens = 512 });

Registering a Server Node

// POST /api/server-groups/{groupName}/nodes
{
  "name":      "ai-node-01",
  "baseUrl":   "https://ai-node-01.internal:8080",
  "healthUrl": "https://ai-node-01.internal:8080/health",
  "weight":    1,
  "metadata": {
    "gpu":    "A100",
    "region": "eastus"
  }
}

// Response
{
  "nodeId":    "node-abc123",
  "groupName": "inference-cluster",
  "status":    "healthy"
}
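
As a rough client-side sketch, the registration request above could be assembled in C# along these lines. The NodeRegistration record and the NodeRegistrationRequest.Build helper are hypothetical names introduced for illustration; only the route and the JSON field names are taken from the example above.

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;

// Hypothetical DTO mirroring the fields of the registration payload.
public sealed record NodeRegistration(
    string Name,
    string BaseUrl,
    string HealthUrl,
    int Weight,
    IReadOnlyDictionary<string, string> Metadata);

public static class NodeRegistrationRequest
{
    // camelCase property names to match the JSON shown in the example.
    private static readonly JsonSerializerOptions Json = new()
    {
        PropertyNamingPolicy = JsonNamingPolicy.CamelCase
    };

    // Returns the relative route and the JSON body for the POST.
    // Sending the request (HttpClient, auth, retries) is left to the caller.
    public static (string Path, string Body) Build(string groupName, NodeRegistration node)
    {
        var path = $"/api/server-groups/{Uri.EscapeDataString(groupName)}/nodes";
        var body = JsonSerializer.Serialize(node, Json);
        return (path, body);
    }
}
```

The returned path and body can then be posted with whatever HTTP client the host already uses.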

Health Monitoring and Failover

The Server Group controller polls each node's healthUrl every 15 seconds. A node that fails 3 consecutive checks is marked unhealthy and removed from the routing pool. It is re-admitted automatically when health checks resume passing.

Node State	Routing Behaviour	Recovery
Healthy	Receives new requests according to weight	—
Degraded (1-2 consecutive failures)	Still receives requests; alert raised	Auto-recover on next passing check
Unhealthy (3+ consecutive failures)	Removed from pool; no new requests routed	Auto-readmit when health checks pass again
Draining	No new requests; existing in-flight requests complete	Manual — operator removes node
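
The state transitions in the table can be sketched as a small tracker that counts consecutive failed checks. This is a minimal illustration, not the controller's actual implementation: NodeState, NodeHealthTracker, and their members are hypothetical names, and the only values taken from the text are the 3-failure threshold and the routing behaviour of each state.

```csharp
using System;

// Hypothetical types for illustration; not part of the Server Group controller API.
public enum NodeState { Healthy, Degraded, Unhealthy, Draining }

public sealed class NodeHealthTracker
{
    // From the text: 3 consecutive failed checks mark a node unhealthy.
    private const int UnhealthyThreshold = 3;

    private int _consecutiveFailures;
    private bool _draining;

    public NodeState State =>
        _draining ? NodeState.Draining
        : _consecutiveFailures >= UnhealthyThreshold ? NodeState.Unhealthy
        : _consecutiveFailures > 0 ? NodeState.Degraded
        : NodeState.Healthy;

    // Healthy and Degraded nodes stay in the routing pool.
    public bool AcceptsNewRequests =>
        State is NodeState.Healthy or NodeState.Degraded;

    // Record the result of one poll of the node's healthUrl.
    // A passing check resets the failure count, which models auto-recovery.
    public void RecordCheck(bool passed) =>
        _consecutiveFailures = passed ? 0 : _consecutiveFailures + 1;

    // Draining is operator-initiated and is not cleared by health checks.
    public void BeginDrain() => _draining = true;
}
```

In this sketch a single passing check is enough to re-admit a node; a controller that requires several passes before re-admission would need a second counter.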

Load Balancing Strategies

Strategy	Description	Best For
Round Robin	Requests distributed evenly in order	Uniform, stateless workloads
Weighted Round Robin	Higher-weight nodes receive proportionally more requests	Mixed-capacity nodes (different GPU sizes)
Least Connections	Route to the node with fewest active requests	Variable-duration requests (LLM inference)
Sticky Session	Route the same tenant/session to the same node	Stateful nodes that cache per-tenant data
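
To make two of these strategies concrete, the sketch below shows one simple way Least Connections and Weighted Round Robin selection could work over the current routing pool. PoolNode and Strategies are hypothetical names, and the slot-based weighted rotation is only one of several possible schemes; the load balancer's real algorithm is not described in this document.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical view of a node in the routing pool, for illustration only.
public sealed record PoolNode(string Name, int Weight, int ActiveRequests);

public static class Strategies
{
    // Least Connections: pick the node with the fewest active requests.
    public static PoolNode LeastConnections(IReadOnlyList<PoolNode> pool) =>
        pool.OrderBy(n => n.ActiveRequests).FirstOrDefault()
            ?? throw new InvalidOperationException("No nodes in the routing pool.");

    // Weighted Round Robin: give each node Weight consecutive slots and
    // walk the slots in order. requestNumber is assumed to be a
    // non-negative counter that increases by one per request.
    public static PoolNode WeightedRoundRobin(IReadOnlyList<PoolNode> pool, long requestNumber)
    {
        long totalWeight = pool.Sum(n => (long)n.Weight);
        if (totalWeight <= 0)
            throw new InvalidOperationException("No weighted nodes in the routing pool.");

        long slot = requestNumber % totalWeight;
        foreach (var node in pool)
        {
            if (slot < node.Weight) return node;
            slot -= node.Weight;
        }
        throw new InvalidOperationException("Slot exceeded total weight.");
    }
}
```

With weights 2, 1, 1, requests 0 through 3 go to the first node twice and then to each of the others once, so the heavier node receives half of the traffic.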

Calling a Server Group from a Workflow Node

// Workflow executor that calls a server group
public class InferenceWorkflowNodeExecutor : BaseNodeExecutor
{
    private readonly IServerGroupClient _serverGroup;

    public InferenceWorkflowNodeExecutor(IServerGroupClient serverGroup)
        => _serverGroup = serverGroup;

    public override async Task ExecuteAsync(
        INodeExecutionContext context,
        CancellationToken ct)
    {
        var prompt    = GetInput<string>(context, "Prompt");
        var maxTokens = GetInput<int>(context, "MaxTokens");

        var response  = await _serverGroup.PostAsync<InferenceResponse>(
            groupName: "inference-cluster",
            endpoint:  "/infer",
            body:      new { prompt, maxTokens },
            ct);

        SetOutput(context, "GeneratedText", response.Text);
        SetOutput(context, "TokensUsed",    response.TokensUsed);
    }
}

Calling a Server Group from an Octopus MCP Tool

// MCP tool handler that calls a server group
public async Task<JsonElement> HandleInferAsync(JsonElement args)
{
    var prompt    = args.GetProperty("prompt").GetString()!;
    var maxTokens = args.TryGetProperty("max_tokens", out var mt)
                        ? mt.GetInt32() : 512;

    var response = await _serverGroupClient.PostAsync<InferenceResponse>(
        groupName: "inference-cluster",
        endpoint:  "/infer",
        body:      new { prompt, maxTokens });

    return JsonSerializer.SerializeToElement(new
    {
        generated_text = response.Text,
        tokens_used    = response.TokensUsed,
        model          = response.Model
    });
}

When to Use Server Group Mode

Use Case	Reason Server Group Fits
AI inference endpoint	GPU warmup is expensive; keep the model loaded 24/7
High-throughput data processing	Many parallel callers; horizontal scaling via more nodes
Shared enterprise service	Multiple workflows and agents call the same service
Independent deployment cadence	Update the server node without redeploying the Octopus host
Multi-region execution	Register nodes in different regions; route by metadata tag
