Server Node as a Service — Workflow vs Server Nodes

What a Server Node Actually Is

A server node is an ordinary ASP.NET Core (or other runtime) web service that:

Implements the Server Group health contract (GET /health returns 200 when ready)
Exposes one or more HTTP endpoints that callers invoke
Registers itself in the central Server Group registry at startup
De-registers itself during graceful shutdown

It carries no other obligation. It can have its own database, its own cache, its own background threads, its own authentication model, its own GPU allocation — anything a normal microservice can have.

Self-Registration at Startup

// In your server node's Program.cs
using Microsoft.Extensions.Options; // IOptions<T> is not covered by the implicit usings
var builder = WebApplication.CreateBuilder(args);
// ... register your services ...

var app = builder.Build();

// Standard health endpoint
app.MapGet("/health", () => Results.Ok(new { status = "ok" }));

// Your business endpoints
app.MapPost("/infer", InferenceEndpoint.Handle);

// Register with the central Server Group on startup
var lifetime = app.Lifetime;
var registry = app.Services.GetRequiredService<IServerGroupRegistrar>();
var config   = app.Services.GetRequiredService<IOptions<ServerNodeConfig>>().Value;

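// CancellationToken.Register takes a synchronous Action, so an async lambda
// here would be fire-and-forget: failures would go unobserved and shutdown
// could finish before de-registration completes. Block on the calls instead.
lifetime.ApplicationStarted.Register(() =>
{
    registry.RegisterAsync(new ServerNodeRegistration
    {
        GroupName  = config.GroupName,
        Name       = config.NodeName,
        BaseUrl    = config.PublicBaseUrl,
        HealthUrl  = $"{config.PublicBaseUrl}/health",
        Weight     = config.Weight,
        Metadata   = config.Metadata
    }).GetAwaiter().GetResult();
});

lifetime.ApplicationStopping.Register(() =>
{
    registry.DeregisterAsync(config.GroupName, config.NodeName)
        .GetAwaiter().GetResult();
});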

app.Run();

Common Server Node Patterns

Warm-Cache Node

Server node pre-loads expensive data at startup (product catalog, ML embeddings) into an in-memory cache. Callers get sub-millisecond responses instead of hitting the database on each request.
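
The warm-cache idea can be sketched as a small holder that loads everything once and then answers reads from memory. This is a minimal sketch, not part of the Server Group API: the WarmCache type, its loader delegate, and the IsReady flag are hypothetical names introduced here for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical warm cache: loads everything once, then serves reads from memory.
public sealed class WarmCache<TKey, TValue> where TKey : notnull
{
    private readonly Func<CancellationToken, Task<IReadOnlyDictionary<TKey, TValue>>> _loader;

    // Replaced as a whole on load, so readers never see a half-filled map.
    private volatile IReadOnlyDictionary<TKey, TValue>? _snapshot;

    public WarmCache(Func<CancellationToken, Task<IReadOnlyDictionary<TKey, TValue>>> loader)
        => _loader = loader;

    // True once the initial load has completed; /health can report on this.
    public bool IsReady => _snapshot is not null;

    // Call once at startup (for example from IHostedService.StartAsync),
    // and again later if the data needs refreshing.
    public async Task LoadAsync(CancellationToken ct = default)
        => _snapshot = await _loader(ct);

    public bool TryGet(TKey key, out TValue? value)
    {
        var snapshot = _snapshot;
        if (snapshot is not null && snapshot.TryGetValue(key, out var found))
        {
            value = found;
            return true;
        }
        value = default;
        return false;
    }
}
```

Swapping the whole dictionary on each load keeps reads lock-free, at the cost of briefly holding two copies in memory during a refresh.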

Stateful Session Node

Server node maintains per-session state (e.g. active browser sessions via Playwright). Sticky routing directs all calls for a given session to the same node instance.
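
To make the sticky-routing idea concrete, here is a hedged sketch of one way a router could map a session ID to a node deterministically. It illustrates the concept only and is not the Server Group's actual routing code. StickyRouter, StableHash, and PickNode are hypothetical names, and FNV-1a is a hash chosen for this example.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Illustrative only: not the Server Group's actual routing implementation.
public static class StickyRouter
{
    // FNV-1a (32-bit) over the UTF-8 bytes. string.GetHashCode() is
    // randomized per process in .NET, so it cannot be used for routing
    // that must agree across processes and restarts.
    public static uint StableHash(string value)
    {
        const uint offsetBasis = 2166136261;
        const uint prime = 16777619;
        uint hash = offsetBasis;
        unchecked
        {
            foreach (byte b in Encoding.UTF8.GetBytes(value))
            {
                hash ^= b;
                hash *= prime; // wraps around on overflow by design
            }
        }
        return hash;
    }

    // Picks the same node for the same session ID as long as the node list
    // (and its order) is unchanged.
    public static string PickNode(string sessionId, IReadOnlyList<string> nodes)
    {
        if (nodes.Count == 0)
            throw new InvalidOperationException("No nodes registered.");
        return nodes[(int)(StableHash(sessionId) % (uint)nodes.Count)];
    }
}
```

A plain modulo mapping moves most sessions to a different node whenever a node joins or leaves. A router that must survive membership changes would typically use consistent hashing or an explicit session-to-node table instead.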

GPU-Resident Inference Node

Server node loads an AI model onto GPU at startup and holds it there, removing cold-start latency from every inference call.
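
A minimal sketch of the load-once, hold-forever pattern follows. ResidentModel and its members are hypothetical names, and the loader delegate stands in for whatever call actually places the model on the GPU.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical holder that loads a model once and keeps it for the process lifetime.
public sealed class ResidentModel<TModel> where TModel : class
{
    private readonly Lazy<Task<TModel>> _load;

    public ResidentModel(Func<Task<TModel>> loader)
    {
        // ExecutionAndPublication guarantees the loader runs at most once,
        // even if several requests arrive while the model is still loading.
        _load = new Lazy<Task<TModel>>(
            loader, LazyThreadSafetyMode.ExecutionAndPublication);
    }

    // Kick off loading at startup so the first request does not pay for it.
    public Task WarmUpAsync() => _load.Value;

    // True only after the load has finished successfully.
    public bool IsLoaded =>
        _load.IsValueCreated && _load.Value.IsCompletedSuccessfully;

    // All requests await the same task; once loaded it completes immediately.
    public Task<TModel> GetAsync() => _load.Value;
}
```

One consequence of this design is that a failed load is cached as well, so every later call sees the same exception. For a node that cannot serve without its model, failing fast and letting the process restart is often acceptable.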

Queue-Draining Worker Node

Server node polls a queue (Service Bus, SQS) in the background and exposes a /status endpoint. The workflow checks status rather than blocking.
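
The sketch below shows how background draining can be kept separate from a cheap status read. It rests on several assumptions: an in-memory ConcurrentQueue stands in for Service Bus or SQS, and QueueDrainingWorker, WorkerStatus, and the fields of the status snapshot are hypothetical names.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

// Snapshot a /status endpoint could return (the shape is an assumption).
public sealed record WorkerStatus(long Processed, long Failed, int Pending);

// Hypothetical worker: ConcurrentQueue stands in for Service Bus or SQS.
public sealed class QueueDrainingWorker
{
    private readonly ConcurrentQueue<string> _queue = new();
    private readonly Func<string, Task> _handler;
    private long _processed;
    private long _failed;

    public QueueDrainingWorker(Func<string, Task> handler) => _handler = handler;

    public void Enqueue(string message) => _queue.Enqueue(message);

    // Cheap to call: reads counters only and never waits on the work itself.
    public WorkerStatus GetStatus() => new(
        Interlocked.Read(ref _processed),
        Interlocked.Read(ref _failed),
        _queue.Count);

    // Processes everything currently queued. A background loop would call
    // this repeatedly with a short delay between polls.
    public async Task DrainOnceAsync(CancellationToken ct = default)
    {
        while (!ct.IsCancellationRequested && _queue.TryDequeue(out var message))
        {
            try
            {
                await _handler(message);
                Interlocked.Increment(ref _processed);
            }
            catch (Exception)
            {
                // Count the failure and keep draining. A real worker would
                // also retry or dead-letter the message.
                Interlocked.Increment(ref _failed);
            }
        }
    }
}
```

In an ASP.NET Core node, the /status endpoint would simply return GetStatus(), so a workflow can poll it without ever waiting on queue processing.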

Server Node vs Microservice

Aspect	Traditional Microservice	BizFirst Server Node
Discovery	Service mesh / DNS	Server Group registry + process engine routing
Health monitoring	K8s liveness/readiness probes	Server Group controller polls `/health`
Load balancing	K8s Service / Ingress	Server Group load balancing strategy
Workflow integration	Custom — caller must know the URL	Built-in — workflow nodes call by group name
Agent tool integration	Custom MCP adapter	Register as MCP server pointing at group endpoint
Telemetry	Custom instrumentation	Server Group adds correlation headers automatically

High-Throughput Design Patterns

Response Caching

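// Cache expensive computation results keyed by a hash of the input text.
// Requires builder.Services.AddMemoryCache() and
// using Microsoft.Extensions.Caching.Memory.
// `classifier` is your classification service, resolved earlier in Program.cs.
app.MapPost("/classify", async (ClassifyRequest req, IMemoryCache cache) =>
{
    // Hash the content itself: a 32-bit GetHashCode() can collide, and for a
    // class it is reference-based, so equal inputs would miss the cache.
    var digest = System.Security.Cryptography.SHA256.HashData(
        System.Text.Encoding.UTF8.GetBytes(req.Text));
    var cacheKey = $"classify:{Convert.ToHexString(digest)}";

    if (cache.TryGetValue(cacheKey, out ClassifyResponse? cached))
        return Results.Ok(cached);

    var result = await classifier.ClassifyAsync(req.Text);
    cache.Set(cacheKey, result, TimeSpan.FromMinutes(10));
    return Results.Ok(result);
});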

Request Batching

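// Buffer individual requests and process in batches
// Requires: using System.Threading.Channels;
public class BatchingInferenceService : BackgroundService
{
    private readonly Channel<InferenceRequest> _queue =
        Channel.CreateBounded<InferenceRequest>(
            new BoundedChannelOptions(1000)
            { FullMode = BoundedChannelFullMode.Wait });

    // IInferenceModel is a placeholder name for your model wrapper type.
    private readonly IInferenceModel _model;

    public BatchingInferenceService(IInferenceModel model) => _model = model;

    public async Task<InferenceResult> EnqueueAsync(
        InferenceRequest request, CancellationToken ct)
    {
        // Run continuations asynchronously so awaiting callers do not
        // resume inline on the batch loop.
        var tcs = new TaskCompletionSource<InferenceResult>(
            TaskCreationOptions.RunContinuationsAsynchronously);
        request.Completion = tcs;
        await _queue.Writer.WriteAsync(request, ct);
        return await tcs.Task;
    }

    protected override async Task ExecuteAsync(CancellationToken ct)
    {
        var batch = new List<InferenceRequest>(32);
        // Sleep until a request arrives instead of polling an empty queue
        while (await _queue.Reader.WaitToReadAsync(ct))
        {
            batch.Clear();
            // Drain up to 32 requests or wait 50ms
            var deadline = DateTime.UtcNow.AddMilliseconds(50);
            while (batch.Count < 32 && DateTime.UtcNow < deadline)
            {
                if (_queue.Reader.TryRead(out var req))
                    batch.Add(req);
                else
                    await Task.Delay(5, ct);
            }

            if (batch.Count == 0) continue;

            try
            {
                var results = await _model.InferBatchAsync(
                    batch.Select(r => r.Prompt).ToArray(), ct);

                for (int i = 0; i < batch.Count; i++)
                    batch[i].Completion!.TrySetResult(results[i]);
            }
            catch (Exception ex)
            {
                // Fail the waiting callers rather than leaving them hanging
                foreach (var r in batch)
                    r.Completion!.TrySetException(ex);
            }
        }
    }
}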
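
For the batching service to work, the endpoint and the background loop must share one instance. The fragment below is a wiring sketch for Program.cs under that assumption. It relies on the BatchingInferenceService, InferenceRequest, and InferenceResult types from the listing above, and the choice of /infer as the route is illustrative.

```csharp
// Program.cs wiring sketch (fragment, not a complete program).
var builder = WebApplication.CreateBuilder(args);

// Register one shared instance, then expose that same instance as the
// hosted service. Calling AddHostedService<BatchingInferenceService>()
// on its own would create a second instance with a separate queue.
builder.Services.AddSingleton<BatchingInferenceService>();
builder.Services.AddHostedService(sp =>
    sp.GetRequiredService<BatchingInferenceService>());

var app = builder.Build();

// Each request waits only for its own result; batching happens behind
// EnqueueAsync.
app.MapPost("/infer", async (
    InferenceRequest request,
    BatchingInferenceService batcher,
    CancellationToken ct) =>
{
    var result = await batcher.EnqueueAsync(request, ct);
    return Results.Ok(result);
});

app.Run();
```

Because the request type carries a TaskCompletionSource, a separate wire-level DTO (or a JSON-ignore attribute on that property) would normally keep it out of the HTTP payload.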

Server node startup time. The Server Group controller does not route requests to a node until its health check passes. For nodes with long startup times (for example, GPU model loading), configure the readiness probe with a generous initialDelaySeconds, and have /health return a non-200 status until loading completes, so that no traffic reaches the node before it is ready.
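
One way to keep /health honest during a long startup is a small readiness flag that startup code flips once loading is done. This is a sketch under assumptions: ReadinessGate is a hypothetical helper, and returning 503 while starting is a common convention rather than something the Server Group contract is known to require.

```csharp
using System.Threading;

// Hypothetical readiness flag shared between startup code and /health.
public sealed class ReadinessGate
{
    private int _ready; // 0 = still starting, 1 = ready

    public bool IsReady => Volatile.Read(ref _ready) == 1;

    // Call once the slow startup work (model load, cache warm-up) has finished.
    public void MarkReady() => Volatile.Write(ref _ready, 1);

    // HTTP status /health should return: 200 when ready, 503 while starting.
    public int HealthStatusCode => IsReady ? 200 : 503;
}

// Wiring sketch for Program.cs (ASP.NET Core minimal API):
//   builder.Services.AddSingleton<ReadinessGate>();
//   app.MapGet("/health", (ReadinessGate gate) =>
//       gate.IsReady
//           ? Results.Ok(new { status = "ok" })
//           : Results.StatusCode(503));
```

With this in place the node can register itself early, and the controller's health polling decides when traffic actually starts.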
