Working Memory Overview

What Goes Into Working Memory

Working memory is assembled by the WorkingMemoryManager before every LLM call. It contains, in priority order:

Component	Source	Priority	Typical Size
System Prompt	AgentComposite.SystemPrompt	1 (highest)	500–3000 tokens
Procedure	Procedural memory (if matched)	2	100–500 tokens
Retrieved Knowledge	Semantic memory retrieval	3	500–2000 tokens
Episode Snippets	Episodic memory recall	4	100–500 tokens
Message History	Current session messages (pruned)	5	Remainder of budget
Current Message	User's latest input	6 (last added)	20–500 tokens

Token Budget

Working memory cannot exceed the LLM's maximum context length. The MaxWorkingMemoryTokens setting defines the portion of that window available for input, leaving headroom for the model's output:

// Token budget allocation
int totalBudget = agent.MemoryConfig.MaxWorkingMemoryTokens; // e.g., 150,000

int systemPromptTokens  = CountTokens(agent.SystemPrompt);   // ~1,200
int procedureTokens     = matchedProcedure != null ? CountTokens(matchedProcedure) : 0; // ~300
int knowledgeTokens     = CountTokens(retrievedChunks);      // ~1,500
int episodeTokens       = CountTokens(episodeSnippets);       // ~400
int reservedForCurrent  = 500;  // current message + tool call results

int historyBudget = totalBudget
    - systemPromptTokens - procedureTokens
    - knowledgeTokens - episodeTokens
    - reservedForCurrent;
// Remaining budget allocated to message history
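
With the example figures from the comments above, the arithmetic can be checked with a minimal sketch like the one below. The HistoryBudget.Compute helper and its parameter names are hypothetical, and the clamp at zero is an added safeguard for the case where the fixed components alone exceed the budget.

```csharp
using System;

public static class HistoryBudget
{
    // Hypothetical helper: subtracts the fixed components from the total
    // budget and clamps at zero, so an oversized prompt never produces a
    // negative history allowance.
    public static int Compute(
        int totalBudget,
        int systemPromptTokens,
        int procedureTokens,
        int knowledgeTokens,
        int episodeTokens,
        int reservedForCurrent)
    {
        int fixedCost = systemPromptTokens + procedureTokens
                      + knowledgeTokens + episodeTokens
                      + reservedForCurrent;
        return Math.Max(0, totalBudget - fixedCost);
    }
}
```

Calling HistoryBudget.Compute(150_000, 1_200, 300, 1_500, 400, 500) gives 146,100 tokens for message history: the fixed components sum to 3,900, which is subtracted from the 150,000 total.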

WorkingMemoryManager

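public class WorkingMemoryManager
{
    // Dependencies supplied via the constructor.
    // The interface names here are illustrative assumptions.
    private readonly IProceduralStore _proceduralStore;
    private readonly ISemanticMemoryService _semanticService;
    private readonly IEpisodicStore _episodicStore;
    private readonly IMessagePruner _pruner;
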
    public async Task<WorkingMemoryContext> BuildAsync(
        AgentComposite agent,
        ConversationComposite conversation,
        string currentUserMessage,
        CancellationToken ct = default)
    {
        var messages = new List<LLMMessage>();

        // 1. System prompt (always first)
        messages.Add(new LLMMessage(Role.System, agent.SystemPrompt));

        // 2. Procedural memory (if task matches)
        var procedure = await _proceduralStore
            .FindMatchAsync(currentUserMessage, agent.Id, agent.TenantId, ct);
        if (procedure != null)
            messages.Add(new LLMMessage(Role.System,
                FormatProcedure(procedure)));

        // 3. Semantic memory retrieval
        var retrieved = await _semanticService
            .RetrieveAsync(agent, currentUserMessage, ct);
        if (retrieved.HasContent)
            messages.Add(new LLMMessage(Role.Context,
                retrieved.FormattedContext));

        // 4. Episodic memory snippets
        var episodes = await _episodicStore
            .SearchAsync(agent.Id, conversation.UserId, currentUserMessage, topK: 3, ct);
        if (episodes.Any())
            messages.Add(new LLMMessage(Role.Context,
                FormatEpisodes(episodes)));

        // 5. Message history (pruned to remaining budget)
        var historyBudget = CalculateHistoryBudget(agent, messages);
        var history = await _pruner.PruneAsync(
            conversation.Messages, historyBudget, agent.MemoryConfig.PruningStrategy);
        messages.AddRange(history);

        // 6. Current user message
        messages.Add(new LLMMessage(Role.User, currentUserMessage));

        return new WorkingMemoryContext { Messages = messages };
    }
}
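
The listing above delegates step 5 to _pruner.PruneAsync without showing what a pruning strategy does. One plausible strategy is a sliding window that keeps the most recent messages that fit the budget. The sketch below is an assumption, not the actual pruner: ChatMessage is a hypothetical stand-in for LLMMessage, SlidingWindowPruner is an invented name, and the token counter is passed in as a delegate because the real CountTokens implementation is not shown here.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for the document's LLMMessage type.
public sealed class ChatMessage
{
    public string Role { get; }
    public string Content { get; }

    public ChatMessage(string role, string content)
    {
        Role = role;
        Content = content;
    }
}

public static class SlidingWindowPruner
{
    // Walks the history from newest to oldest, keeping messages until the
    // next one would exceed the budget, then restores chronological order.
    public static List<ChatMessage> Prune(
        IReadOnlyList<ChatMessage> history,
        int tokenBudget,
        Func<string, int> countTokens)
    {
        var kept = new List<ChatMessage>();
        int used = 0;

        for (int i = history.Count - 1; i >= 0; i--)
        {
            int cost = countTokens(history[i].Content);
            if (used + cost > tokenBudget)
                break; // stop at the first message that does not fit

            used += cost;
            kept.Add(history[i]);
        }

        kept.Reverse(); // back to oldest-first order for the LLM
        return kept;
    }
}
```

Stopping at the first message that does not fit keeps the retained history contiguous, so the model never sees a reply without the message that preceded it. A strategy that summarizes older messages instead of dropping them would need a different shape.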

Why Working Memory is Not Persisted

Working memory is intentionally ephemeral. The LLM context window is rebuilt fresh on every turn from the persistent stores (episodic, semantic, procedural). This design means:

A crash costs nothing durable: working memory is simply reconstructed from the persistent stores on the next turn.
There is no in-memory state to synchronize across horizontally scaled API servers.
The composition is dynamic: knowledge indexed between turns appears in the next turn's retrieval.
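
These properties follow from treating context assembly as a function of the durable stores plus the current message. The sketch below illustrates the idea under stated assumptions: KnowledgeStore is a hypothetical in-memory stand-in for a persistent semantic store, its substring match stands in for real retrieval, and StatelessBuilder is an invented name.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical in-memory stand-in for a durable semantic store.
public sealed class KnowledgeStore
{
    private readonly List<string> _chunks = new List<string>();

    public void Index(string chunk) => _chunks.Add(chunk);

    // Naive case-insensitive substring match in place of real retrieval.
    public IEnumerable<string> Retrieve(string query) =>
        _chunks.Where(c =>
            c.IndexOf(query, StringComparison.OrdinalIgnoreCase) >= 0);
}

public static class StatelessBuilder
{
    // Depends only on its arguments: nothing is cached between calls,
    // so any server instance can build the context for any turn.
    public static List<string> Build(
        string systemPrompt, KnowledgeStore store, string userMessage)
    {
        var context = new List<string> { systemPrompt };
        context.AddRange(store.Retrieve(userMessage));
        context.Add(userMessage);
        return context;
    }
}
```

If Build is called once, a chunk mentioning "refund" is then indexed, and Build is called again with the message "refund", the second result contains the new chunk even though the builder itself holds no state between the two calls.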
