LLM Routing via SK — Semantic Kernel Plugin

SKLLMProvider — The Bridge

The SKLLMProvider class implements Octopus's ILLMProvider interface, delegating to the SK kernel for completions. This means any Octopus agent can be configured to use SK as its LLM provider:

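using Microsoft.SemanticKernel;
using Microsoft.SemanticKernel.ChatCompletion;
using Microsoft.SemanticKernel.Connectors.OpenAI;

// SKLLMProvider — wraps SK kernel as an Octopus ILLMProvider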
public class SKLLMProvider : ILLMProvider
{
    private readonly Kernel     _kernel;
    private readonly string     _serviceId;  // Which SK service to use (e.g., "gpt-4o")

    public SKLLMProvider(ISKKernelRegistry registry, string serviceId = "default")
    {
        _kernel    = registry.Get(serviceId);
        _serviceId = serviceId;
    }

    public async Task<LLMResponse> CompleteAsync(
        IReadOnlyList<LLMMessage> messages,
        IReadOnlyList<ToolDefinition>? tools,
        LLMOptions options,
        CancellationToken ct)
    {
        var chatHistory = BuildSKChatHistory(messages);
        var settings    = new OpenAIPromptExecutionSettings
        {
            MaxTokens   = options.MaxTokens,
            Temperature = options.Temperature,
            ToolCallBehavior = tools?.Any() == true
                ? ToolCallBehavior.AutoInvokeKernelFunctions
                : null
        };

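        // Pass the ChatHistory to the chat completion service directly;
        // ChatHistory.ToString() does not render the messages as a prompt.
        // _serviceId must match a service ID registered in the kernel.
        var chat   = _kernel.GetRequiredService<IChatCompletionService>(_serviceId);
        var result = await chat.GetChatMessageContentAsync(
            chatHistory, settings, _kernel, ct);

        return new LLMResponse
        {
            Content    = result.Content ?? string.Empty,
            ToolCalls  = ExtractToolCalls(result)  // helper receives the ChatMessageContent
        };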
    }

    public IAsyncEnumerable<string> StreamAsync(
        IReadOnlyList<LLMMessage> messages,
        LLMOptions options,
        CancellationToken ct)
    {
        return StreamSKTokensAsync(messages, options, ct);
    }
}

Multi-Model Routing

The SK kernel supports registering multiple AI services with distinct service IDs, enabling per-function model selection:

// OnStartAsync — register multiple models in the same kernel
kernelBuilder
    .AddAzureOpenAIChatCompletion(
        deploymentName: "gpt-4o",
        endpoint:       azureEndpoint,
        apiKey:         apiKey,
        serviceId:      "gpt-4o")           // Expensive, high-quality

    .AddAzureOpenAIChatCompletion(
        deploymentName: "gpt-4o-mini",
        endpoint:       azureEndpoint,
        apiKey:         apiKey,
        serviceId:      "gpt-4o-mini");     // Cheaper, faster

// In an agent's tool handler — choose which model to use for a specific step
var result = await kernel.InvokePromptAsync(
    prompt,
    new KernelArguments(new OpenAIPromptExecutionSettings
    {
        ServiceId = "gpt-4o-mini"            // Override for this specific call
    }));

Routing Strategy Table

Step Type	Recommended Model	Reason
Complex multi-step planning	`gpt-4o`	Requires strong reasoning and instruction following
Simple classification or intent detection	`gpt-4o-mini`	Low cost, fast response for structured tasks
RAG retrieval summarisation	`gpt-4o-mini`	Summarisation is cheaper with a smaller model
Code generation	`gpt-4o`	Code quality is significantly better with the full model
Embeddings	`text-embedding-3-large`	Separate service — routing does not apply
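
The table above can be expressed as a small lookup so that agents do not hard-code service IDs at each call site. This is only a sketch: the StepType enum, the ModelRouter class, and the choice of gpt-4o-mini as the default for unlisted steps are assumptions made for illustration and are not part of Octopus or Semantic Kernel.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical step categories mirroring the chat-model rows of the table.
public enum StepType
{
    Planning,
    Classification,
    RagSummarisation,
    CodeGeneration
}

public static class ModelRouter
{
    // Maps each step type to the SK service ID registered for it.
    private static readonly IReadOnlyDictionary<StepType, string> Routes =
        new Dictionary<StepType, string>
        {
            [StepType.Planning]         = "gpt-4o",
            [StepType.Classification]   = "gpt-4o-mini",
            [StepType.RagSummarisation] = "gpt-4o-mini",
            [StepType.CodeGeneration]   = "gpt-4o",
        };

    // Returns the service ID for a step. Steps without an explicit route
    // fall back to the cheaper model (an assumed default, not a documented one).
    public static string ResolveServiceId(StepType step) =>
        Routes.TryGetValue(step, out var serviceId) ? serviceId : "gpt-4o-mini";
}
```

The returned value can be assigned to ServiceId on OpenAIPromptExecutionSettings, as in the per-call override shown earlier. Embeddings are left out because they use a separate service.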

Configuring an Agent to Use SKLLMProvider

// Configure an agent via API to use the SK LLM provider
PATCH /api/agents/{agentId}
{
  "llmProvider": "SemanticKernel",
  "llmModel":    "gpt-4o",
  "llmSettings": {
    "serviceId": "gpt-4o",
    "plannerEnabled": true
  }
}

Fallback to Octopus native. If the SK kernel call fails (rate limit, service unavailable), the SKLLMProvider does not automatically fall back to the Octopus native LLM provider. Configure retry policies at the SK kernel level for resilience.
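
Where no retry policy is configured on the HTTP client that SK uses, one alternative is to wrap the call in a retry helper. The sketch below is a generic exponential-backoff wrapper; the Resilience class and RetryAsync method are hypothetical names, not part of Semantic Kernel or Octopus, and a production setup would more likely restrict retries to transient errors such as rate limits.

```csharp
#nullable enable
using System;
using System.Threading;
using System.Threading.Tasks;

public static class Resilience
{
    // Retries an async operation with exponential backoff.
    // The delay function is injectable so tests can skip real waiting.
    public static async Task<T> RetryAsync<T>(
        Func<CancellationToken, Task<T>> operation,
        int maxAttempts,
        TimeSpan baseDelay,
        CancellationToken ct = default,
        Func<TimeSpan, CancellationToken, Task>? delay = null)
    {
        if (maxAttempts < 1) throw new ArgumentOutOfRangeException(nameof(maxAttempts));
        if (delay is null) delay = (d, token) => Task.Delay(d, token);

        for (var attempt = 1; ; attempt++)
        {
            try
            {
                return await operation(ct);
            }
            // Cancellation is never retried; the final failure propagates to the caller.
            catch (Exception ex) when (attempt < maxAttempts && !(ex is OperationCanceledException))
            {
                // Wait 1x, 2x, 4x, ... the base delay before the next attempt.
                var wait = TimeSpan.FromTicks(baseDelay.Ticks * (1L << (attempt - 1)));
                await delay(wait, ct);
            }
        }
    }
}
```

A call such as provider.CompleteAsync(messages, tools, options, ct) could be passed as the operation. Once the attempts are exhausted the last exception is rethrown, so any switch to the Octopus native provider would still have to be made by the caller.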
