In-Process Only: By Design

Working memory contains the complete LLM context for the current turn. It is never written to disk, SQL, or a cache store. This is a deliberate architectural choice: the context is assembled, budgeted, and pruned in process on every turn, as the sections below describe.

In-Memory Conversation History

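Between turns within the same HTTP session, the conversation message history is held in an in-memory store keyed by conversation ID. The store is a ConversationHistoryCache resolved through DI. Note that a scoped service in ASP.NET Core lives for a single request only, so for history to carry over between turns the cache must either be registered with a longer lifetime (such as singleton) or be repopulated from SQL at the start of each request: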

// In-memory conversation history cache (per-request scope in ASP.NET Core)
public class ConversationHistoryCache
{
    // Key: ConversationId (Guid)
    // Value: ordered list of messages accumulated this session
    private readonly Dictionary<Guid, List<LLMMessage>> _history = new();

    public IReadOnlyList<LLMMessage> GetHistory(Guid conversationId)
        => _history.TryGetValue(conversationId, out var msgs)
            ? msgs.AsReadOnly()
            : Array.Empty<LLMMessage>();

    public void AppendTurn(Guid conversationId, LLMMessage userMsg, LLMMessage assistantMsg)
    {
        // Single lookup: create the list on first use, then append both messages.
        if (!_history.TryGetValue(conversationId, out var msgs))
        {
            msgs = new List<LLMMessage>();
            _history[conversationId] = msgs;
        }

        msgs.Add(userMsg);
        msgs.Add(assistantMsg);
    }
}
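
The cache above uses a plain Dictionary, which is not safe for concurrent writes. If the cache is shared across requests (for example, registered as a singleton so that it survives between turns), a thread-safe variant is needed. The following is a minimal sketch under that assumption, not the project's actual implementation; the class name ConcurrentConversationHistoryCache and the LLMMessage record shape are hypothetical.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;

// Hypothetical stand-in for the document's LLMMessage type.
public record LLMMessage(string Role, string Content);

// Sketch of a history cache that can be shared by concurrent requests.
public class ConcurrentConversationHistoryCache
{
    private readonly ConcurrentDictionary<Guid, List<LLMMessage>> _history = new();

    public IReadOnlyList<LLMMessage> GetHistory(Guid conversationId)
    {
        if (!_history.TryGetValue(conversationId, out var msgs))
            return Array.Empty<LLMMessage>();

        // Return a snapshot so callers never observe a list mid-mutation.
        lock (msgs)
        {
            return msgs.ToArray();
        }
    }

    public void AppendTurn(Guid conversationId, LLMMessage userMsg, LLMMessage assistantMsg)
    {
        var msgs = _history.GetOrAdd(conversationId, _ => new List<LLMMessage>());

        // Lock the per-conversation list so both messages are appended together.
        lock (msgs)
        {
            msgs.Add(userMsg);
            msgs.Add(assistantMsg);
        }
    }
}
```

Locking on the per-conversation list keeps contention local to one conversation, while ConcurrentDictionary handles concurrent creation of new entries.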

Session Persistence via EpisodicMessages

Although working memory itself is not persisted, each message in the current session is also written to Octopus_EpisodeMessages in SQL. This allows full reconstruction of the conversation if the server restarts mid-session:

// On every turn: write message to SQL for durability
await _episodicStore.AddMessageAsync(episode.EpisodeId, new EpisodeMessage
{
    Role       = "user",
    Content    = userMessage,
    TokenCount = _tokenCounter.Count(userMessage),
    CreatedAt  = DateTime.UtcNow
}, ct);

// If server restarts, reconstruct from SQL on next request:
var messages = await _episodicStore.GetSessionMessagesAsync(conversationId, ct);
// → Re-populate in-memory history from DB
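
The re-population step hinted at in the last comment could look roughly like the sketch below: on a cache miss, load the stored messages and map them back into the in-memory history. This is an illustration under assumptions. The HistoryRehydrator class, the record shapes, and the loader delegate (which stands in for _episodicStore.GetSessionMessagesAsync, whose return type the document does not show) are all hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical stand-ins for the document's types.
public record LLMMessage(string Role, string Content);
public record EpisodeMessage(string Role, string Content);

public class HistoryRehydrator
{
    // Assumed shape of the SQL read; stands in for GetSessionMessagesAsync.
    private readonly Func<Guid, CancellationToken, Task<IReadOnlyList<EpisodeMessage>>> _loadFromSql;
    private readonly Dictionary<Guid, List<LLMMessage>> _history = new();

    public HistoryRehydrator(
        Func<Guid, CancellationToken, Task<IReadOnlyList<EpisodeMessage>>> loadFromSql)
        => _loadFromSql = loadFromSql;

    public async Task<IReadOnlyList<LLMMessage>> GetOrLoadAsync(
        Guid conversationId, CancellationToken ct)
    {
        // Fast path: the history is already in memory.
        if (_history.TryGetValue(conversationId, out var cached))
            return cached;

        // Slow path (for example, the first request after a restart): rebuild from SQL.
        var stored = await _loadFromSql(conversationId, ct);
        var rebuilt = stored.Select(m => new LLMMessage(m.Role, m.Content)).ToList();
        _history[conversationId] = rebuilt;
        return rebuilt;
    }
}
```

With this shape, SQL is read at most once per conversation per process lifetime; every later turn is served from memory.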

Token Budget Management in RAM

The token budget calculation and pruning happen entirely in process, before the messages array is sent to the LLM provider:

public class WorkingMemoryManager
{
    // Dependencies used below. The excerpt omits their declarations; the
    // interface names ITokenCounter and IMessagePruner are assumed here.
    private readonly ITokenCounter _counter;
    private readonly IMessagePruner _pruner;

    public WorkingMemoryManager(ITokenCounter counter, IMessagePruner pruner)
    {
        _counter = counter;
        _pruner = pruner;
    }

    public async Task<IReadOnlyList<LLMMessage>> BuildAsync(
        AgentComposite agent,
        ConversationComposite conversation,
        MemoryAssembly memoryAssembly,
        string userMessage,
        CancellationToken ct)
    {
        // 1. Assemble all messages (system + knowledge + history + current)
        var messages = Assemble(agent, conversation, memoryAssembly, userMessage);

        // 2. Count total tokens
        int totalTokens = _counter.CountMessages(messages);

        // 3. Prune if over budget (modifies only the in-memory list)
        if (totalTokens > agent.MemoryConfig.MaxWorkingMemoryTokens)
        {
            messages = await _pruner.PruneAsync(
                messages,
                agent.MemoryConfig.MaxWorkingMemoryTokens,
                ct);
        }

        return messages;  // Sent directly to LLM provider — never written to disk
    }
}
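
The document does not describe how _pruner decides what to drop. One common strategy, shown here purely as an assumption, is to remove the oldest non-system messages until the total fits the budget, always keeping the current user message. The OldestFirstPruner class, the LLMMessage record shape, and the four-characters-per-token estimate are all hypothetical and stand in for the real pruner and token counter.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the document's LLMMessage type.
public record LLMMessage(string Role, string Content);

public static class OldestFirstPruner
{
    // Crude estimate (about 4 characters per token); an assumption for this sketch only.
    public static int EstimateTokens(LLMMessage m) => (m.Content.Length + 3) / 4;

    public static List<LLMMessage> Prune(IReadOnlyList<LLMMessage> messages, int maxTokens)
    {
        // Work on a copy so the caller's list is left untouched.
        var kept = messages.ToList();
        int total = kept.Sum(m => EstimateTokens(m));

        // Walk from the oldest message; the final message (current turn) is never removed.
        int i = 0;
        while (total > maxTokens && i < kept.Count - 1)
        {
            if (kept[i].Role == "system")
            {
                i++;            // system prompts are preserved
                continue;
            }

            total -= EstimateTokens(kept[i]);
            kept.RemoveAt(i);   // the next-oldest message shifts into index i
        }

        // May still exceed the budget if only system and current messages remain.
        return kept;
    }
}
```

Because pruning operates on a copy of the in-memory list, nothing persisted in SQL is affected; the full history remains available for later turns.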

RAM Footprint Estimate

Component                        | Typical Size              | Notes
---------------------------------|---------------------------|------------------------------------------
System prompt                    | 2–5 KB                    | Text only
Retrieved knowledge (5 chunks)   | 5–20 KB                   | Depends on chunk size
Episodic snippets (3 episodes)   | 2–6 KB                    | Summaries, not full messages
Message history (20 turns)       | 20–100 KB                 | Wide variance based on message verbosity
Total working memory             | 30–130 KB per active turn | Released when request completes

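To turn the per-turn figure into a capacity estimate, multiply it by the number of turns in flight at once. The sketch below does that arithmetic; the FootprintEstimate helper and the concurrency figure in the usage note are illustrative assumptions, and the bounds are simply the 30 KB and 130 KB totals from the table.

```csharp
using System;

public static class FootprintEstimate
{
    // Per-turn bounds in KB, taken from the "Total working memory" row.
    public const int MinKbPerTurn = 30;
    public const int MaxKbPerTurn = 130;

    // Returns the (low, high) aggregate footprint in MB for a given number of turns in flight.
    public static (double LowMb, double HighMb) ForConcurrentTurns(int concurrentTurns)
        => (concurrentTurns * MinKbPerTurn / 1024.0,
            concurrentTurns * MaxKbPerTurn / 1024.0);
}
```

For a hypothetical 1,000 concurrent turns, this gives roughly 29 MB to 127 MB. These are text payload sizes; because .NET strings are stored as UTF-16, the resident size in the process can be noticeably larger.
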
No Cache Warm-Up Needed

Because working memory is rebuilt on every turn, there is no cache warm-up period after a deployment or server restart. The first turn for any conversation is as fast as any subsequent turn.