
What Is a Server Group?

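A Server Group is a named set of server node instances that all implement the same HTTP API surface. Callers address the group by name rather than by instance URL. The process engine routes each call to the group, and the group's load balancer forwards it to a healthy instance chosen by the group's load-balancing strategy (for example, the least-busy one).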

// Server Group: "inference-cluster"
//   Instance 1: https://ai-node-01.internal:8080
//   Instance 2: https://ai-node-02.internal:8080
//   Instance 3: https://ai-node-03.internal:8080

// Caller (workflow node or Octopus MCP tool) calls the group by name:
var result = await _serverGroupClient.PostAsync<InferenceResponse>(
    groupName: "inference-cluster",
    endpoint:  "/infer",
    body:      new { prompt = "Summarize the following...", maxTokens = 512 });
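Name-based routing can be sketched as an in-memory lookup from group name to instances, followed by a least-busy pick among the healthy ones. This is a minimal illustration, not the engine's implementation. The registry shape, the `ResolveTarget` helper, and the in-flight counts are hypothetical, and the sketch assumes the group uses least-busy (least-connections) selection.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical instance list for one group: base URL, health flag, in-flight request count.
var inferenceCluster = new List<(string BaseUrl, bool Healthy, int InFlight)>
{
    ("https://ai-node-01.internal:8080", true, 4),
    ("https://ai-node-02.internal:8080", true, 1),
    ("https://ai-node-03.internal:8080", false, 0), // unhealthy: never selected
};

// Hypothetical registry mapping group names to their instances.
var groups = new Dictionary<string, List<(string BaseUrl, bool Healthy, int InFlight)>>
{
    ["inference-cluster"] = inferenceCluster,
};

// Resolve a group name to its least-busy healthy instance and build the full URL.
string ResolveTarget(string groupName, string endpoint)
{
    if (!groups.TryGetValue(groupName, out var members))
        throw new KeyNotFoundException($"Unknown server group '{groupName}'.");

    var healthy = members.Where(m => m.Healthy).ToList();
    if (healthy.Count == 0)
        throw new InvalidOperationException($"Group '{groupName}' has no healthy instances.");

    // Least busy = fewest in-flight requests; OrderBy is stable, so ties keep list order.
    var chosen = healthy.OrderBy(m => m.InFlight).First();
    return chosen.BaseUrl.TrimEnd('/') + endpoint;
}

Console.WriteLine(ResolveTarget("inference-cluster", "/infer"));
// prints https://ai-node-02.internal:8080/infer
```

Callers only ever supply the group name and endpoint. Adding or removing instances changes the registry, not the calling code.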

Registering a Server Node

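Register a node by POSTing its base URL, health-check URL, routing weight, and metadata tags to the group's nodes endpoint. The response returns the assigned node ID and the node's current status.

// Request: POST /api/server-groups/{groupName}/nodes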
{
  "name":      "ai-node-01",
  "baseUrl":   "https://ai-node-01.internal:8080",
  "healthUrl": "https://ai-node-01.internal:8080/health",
  "weight":    1,
  "metadata": {
    "gpu":    "A100",
    "region": "eastus"
  }
}

// Response
{
  "nodeId":    "node-abc123",
  "groupName": "inference-cluster",
  "status":    "healthy"
}
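The following is a client-side sketch of the registration call. It serializes the documented request body with System.Text.Json, builds the path for a given group, and reads `nodeId` and `status` from the documented response. `RegistrationPath` and `ParseRegistrationResponse` are hypothetical helpers. The host that serves `/api/server-groups` is not specified here, so the HTTP send itself is left out.

```csharp
using System;
using System.Text.Json;

// Request body from the registration example above.
var registration = new
{
    name = "ai-node-01",
    baseUrl = "https://ai-node-01.internal:8080",
    healthUrl = "https://ai-node-01.internal:8080/health",
    weight = 1,
    metadata = new { gpu = "A100", region = "eastus" }
};

// Hypothetical helper: group names go into the URL path, so escape them.
string RegistrationPath(string groupName) =>
    "/api/server-groups/" + Uri.EscapeDataString(groupName) + "/nodes";

// Anonymous-type property names are serialized as written (already camelCase).
string requestJson = JsonSerializer.Serialize(registration);

// Hypothetical helper: read the assigned node ID and status from the documented response.
(string NodeId, string Status) ParseRegistrationResponse(string json)
{
    using var doc = JsonDocument.Parse(json);
    JsonElement root = doc.RootElement;
    return (root.GetProperty("nodeId").GetString()!, root.GetProperty("status").GetString()!);
}

Console.WriteLine("POST " + RegistrationPath("inference-cluster"));
Console.WriteLine(requestJson);

var (nodeId, status) = ParseRegistrationResponse(
    "{\"nodeId\":\"node-abc123\",\"groupName\":\"inference-cluster\",\"status\":\"healthy\"}");
Console.WriteLine(nodeId + ": " + status);
```

To send the request, `requestJson` could be wrapped in `new StringContent(requestJson, Encoding.UTF8, "application/json")` and posted with `HttpClient.PostAsync` against whatever base address the management API uses.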

Health Monitoring and Failover

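The Server Group controller polls each node's healthUrl every 15 seconds. A node that fails 3 consecutive checks is marked unhealthy and removed from the routing pool. The first failed check can land anywhere from immediately to 15 seconds after the node goes down, so removal happens roughly 30 to 45 seconds after the failure. The node is re-admitted automatically once its health checks pass again.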

| Node State | Routing Behaviour | Recovery |
|---|---|---|
| Healthy | Receives new requests according to weight | N/A |
| Degraded (1-2 consecutive failed checks) | Still receives requests; alert raised | Auto-recovers on next passing check |
| Unhealthy (3+ consecutive failed checks) | Removed from pool; no new requests routed | Auto-readmitted when health checks pass |
| Draining | No new requests; in-flight requests complete | Manual: operator removes the node |
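The state table can be read as a small state machine driven by a consecutive-failure counter. The sketch below assumes that one passing check is enough to reset the counter, since the text above only says nodes recover when checks pass again. It models draining as an operator-set flag. `RecordCheck`, `State`, and `InRoutingPool` are illustrative names, not the controller's API.

```csharp
using System;

// Documented threshold: 3 consecutive failed checks => Unhealthy.
const int UnhealthyThreshold = 3;

int consecutiveFailures = 0;
bool draining = false; // set by an operator action, never by health checks

// Derive the table's state from the counter and the draining flag.
string State() =>
    draining ? "Draining"
    : consecutiveFailures == 0 ? "Healthy"
    : consecutiveFailures < UnhealthyThreshold ? "Degraded"   // 1-2 failures
    : "Unhealthy";                                            // 3+ failures

// Healthy and Degraded nodes stay in the routing pool; Unhealthy and Draining do not.
bool InRoutingPool() => State() is "Healthy" or "Degraded";

// Called once per poll (every 15 seconds in the described controller).
void RecordCheck(bool passed)
{
    // Assumption: a single passing check resets the counter and re-admits the node.
    consecutiveFailures = passed ? 0 : consecutiveFailures + 1;
}

foreach (var outcome in new[] { false, false, false, true })
{
    RecordCheck(outcome);
    Console.WriteLine($"{(outcome ? "pass" : "fail")} -> {State()} (in pool: {InRoutingPool()})");
}
```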

Load Balancing Strategies

| Strategy | Description | Best For |
|---|---|---|
| Round Robin | Requests distributed evenly in order | Uniform, stateless workloads |
| Weighted Round Robin | Higher-weight nodes receive proportionally more requests | Mixed-capacity nodes (different GPU sizes) |
| Least Connections | Route to the node with the fewest active requests | Variable-duration requests (LLM inference) |
| Sticky Session | Route the same tenant/session to the same node | Stateful nodes that cache per-tenant data |
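The sketches below show the four strategies over a fixed node list. They are illustrations, not the Server Group balancer's code. The weighted variant is the "smooth" weighted round robin used by nginx, because the document does not say which weighted algorithm applies. Sticky sessions use an FNV-1a hash of the session key, because `string.GetHashCode()` is randomized per process in .NET. The node names, weights, and active counts are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical nodes: name, configured weight, current active requests.
var nodes = new List<(string Name, int Weight, int Active)>
{
    ("ai-node-01", 1, 5),
    ("ai-node-02", 2, 2),
    ("ai-node-03", 1, 3),
};

// Round robin: hand out nodes in list order, wrapping around.
int rrNext = 0;
string RoundRobin() => nodes[rrNext++ % nodes.Count].Name;

// Smooth weighted round robin: each pick adds every node's weight to its running
// score, selects the highest score, then subtracts the total weight from the winner.
var scores = new int[nodes.Count];
string WeightedRoundRobin()
{
    int total = nodes.Sum(n => n.Weight);
    int best = 0;
    for (int i = 0; i < nodes.Count; i++)
    {
        scores[i] += nodes[i].Weight;
        if (scores[i] > scores[best]) best = i;
    }
    scores[best] -= total;
    return nodes[best].Name;
}

// Least connections: fewest active requests wins (ties go to the earliest node).
string LeastConnections() => nodes.OrderBy(n => n.Active).First().Name;

// Sticky session: a stable FNV-1a hash of the session key selects the node, so the
// same key maps to the same node while the node list is unchanged.
string Sticky(string sessionKey)
{
    uint hash = 2166136261u;
    foreach (char c in sessionKey)
    {
        hash ^= c;
        hash = unchecked(hash * 16777619u);
    }
    return nodes[(int)(hash % (uint)nodes.Count)].Name;
}

Console.WriteLine(string.Join(", ", Enumerable.Range(0, 4).Select(_ => WeightedRoundRobin())));
// prints ai-node-02, ai-node-01, ai-node-03, ai-node-02
Console.WriteLine(LeastConnections()); // prints ai-node-02
Console.WriteLine(Sticky("tenant-42"));
```

With weights 1, 2, 1, node-02 receives half of every four picks, and the smooth variant interleaves those picks rather than sending two in a row. The modulo-based sticky mapping remaps most keys when the node list changes. This trade-off matters for nodes that cache per-tenant data.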

Calling a Server Group from a Workflow Node

// Workflow executor that calls a server group
public class InferenceWorkflowNodeExecutor : BaseNodeExecutor
{
    private readonly IServerGroupClient _serverGroup;

    public InferenceWorkflowNodeExecutor(IServerGroupClient serverGroup)
        => _serverGroup = serverGroup;

    public override async Task ExecuteAsync(
        INodeExecutionContext context,
        CancellationToken ct)
    {
        var prompt    = GetInput<string>(context, "Prompt");
        var maxTokens = GetInput<int>(context, "MaxTokens");

        var response  = await _serverGroup.PostAsync<InferenceResponse>(
            groupName: "inference-cluster",
            endpoint:  "/infer",
            body:      new { prompt, maxTokens },
            ct);

        SetOutput(context, "GeneratedText", response.Text);
        SetOutput(context, "TokensUsed",    response.TokensUsed);
    }
}

Calling a Server Group from an Octopus MCP Tool

using System.Text.Json;

// MCP tool handler that calls a server group
public async Task<JsonElement> HandleInferAsync(JsonElement args)
{
    var prompt    = args.GetProperty("prompt").GetString()!;
    var maxTokens = args.TryGetProperty("max_tokens", out var mt)
                        ? mt.GetInt32() : 512;

    var response = await _serverGroupClient.PostAsync<InferenceResponse>(
        groupName: "inference-cluster",
        endpoint:  "/infer",
        body:      new { prompt, maxTokens });

    return JsonSerializer.SerializeToElement(new
    {
        generated_text = response.Text,
        tokens_used    = response.TokensUsed,
        model          = response.Model
    });
}
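The handler's argument parsing and result shaping can be pulled into pure functions and tested without a live server group. In this sketch, `ParseInferArgs` and `BuildResult` are hypothetical helpers, and the model name and token count are placeholders. Unlike the handler above, the sketch reports a missing `prompt` as an `ArgumentException` instead of letting `GetProperty` throw. That behaviour is an assumed design choice.

```csharp
using System;
using System.Text.Json;

// Hypothetical helper mirroring HandleInferAsync's argument handling:
// "prompt" is required; "max_tokens" defaults to 512 when absent.
(string Prompt, int MaxTokens) ParseInferArgs(JsonElement toolArgs)
{
    if (!toolArgs.TryGetProperty("prompt", out JsonElement p) || p.ValueKind != JsonValueKind.String)
        throw new ArgumentException("'prompt' is required and must be a string.");

    int limit = toolArgs.TryGetProperty("max_tokens", out JsonElement mt) ? mt.GetInt32() : 512;
    return (p.GetString()!, limit);
}

// Hypothetical helper producing the same snake_case result shape as the handler.
JsonElement BuildResult(string text, int tokensUsed, string model) =>
    JsonSerializer.SerializeToElement(new
    {
        generated_text = text,
        tokens_used = tokensUsed,
        model
    });

using var argsDoc = JsonDocument.Parse("{\"prompt\":\"Summarize the following...\"}");
var (prompt, maxTokens) = ParseInferArgs(argsDoc.RootElement);
Console.WriteLine($"prompt={prompt}, maxTokens={maxTokens}");

// Placeholder values stand in for a real InferenceResponse.
JsonElement result = BuildResult("(generated text)", 42, "example-model");
Console.WriteLine(result.GetRawText());
```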

When to Use Server Group Mode

| Use Case | Why a Server Group Fits |
|---|---|
| AI inference endpoint | GPU warm-up is expensive; keep the model loaded 24/7 |
| High-throughput data processing | Many parallel callers; scale horizontally by adding nodes |
| Shared enterprise service | Multiple workflows and agents call the same service |
| Independent deployment cadence | Update the server node without redeploying the Octopus host |
| Multi-region execution | Register nodes in different regions; route by metadata tag |
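Routing by metadata tag, as in the multi-region case, can be treated as a filter on healthy nodes that runs before the load-balancing strategy picks one. This sketch rests on several assumptions that are not documented behaviour: the node list, the `Candidates` helper, and the policy of falling back to all healthy nodes when no tag matches.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical registered nodes carrying metadata like the registration example.
var nodes = new List<(string Name, bool Healthy, Dictionary<string, string> Metadata)>
{
    ("ai-node-01", true,  new Dictionary<string, string> { ["gpu"] = "A100", ["region"] = "eastus" }),
    ("ai-node-02", true,  new Dictionary<string, string> { ["gpu"] = "A100", ["region"] = "westeurope" }),
    ("ai-node-03", false, new Dictionary<string, string> { ["gpu"] = "A100", ["region"] = "westeurope" }),
};

// Keep healthy nodes whose metadata[key] equals value; if none match, fall back to all
// healthy nodes (assumed policy). A load-balancing strategy then picks from this list.
List<string> Candidates(string key, string value)
{
    var healthy = nodes.Where(n => n.Healthy).ToList();
    var matching = healthy
        .Where(n => n.Metadata.TryGetValue(key, out var tag) && tag == value)
        .ToList();
    return (matching.Count > 0 ? matching : healthy).Select(n => n.Name).ToList();
}

Console.WriteLine(string.Join(", ", Candidates("region", "westeurope")));    // prints ai-node-02
Console.WriteLine(string.Join(", ", Candidates("region", "southeastasia"))); // prints ai-node-01, ai-node-02
```

The unhealthy ai-node-03 is excluded even though its region matches. Health filtering comes first, so a tag can narrow the pool but can never reintroduce a node that failed its checks.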