
What Goes Into Working Memory

Working memory is assembled by the WorkingMemoryManager before every LLM call. It contains, in priority order:

| Component | Source | Priority | Typical Size |
|---|---|---|---|
| System Prompt | AgentComposite.SystemPrompt | 1 (highest) | 500–3000 tokens |
| Procedure | Procedural memory (if matched) | 2 | 100–500 tokens |
| Retrieved Knowledge | Semantic memory retrieval | 3 | 500–2000 tokens |
| Episode Snippets | Episodic memory recall | 4 | 100–500 tokens |
| Message History | Current session messages (pruned) | 5 | Remainder of budget |
| Current Message | User's latest input | 6 (last added) | 20–500 tokens |

Token Budget

The context window is bounded by the LLM's maximum context length. The MaxWorkingMemoryTokens setting defines the portion of that window available for input, leaving headroom for the model's output:

// Token budget allocation
int totalBudget = agent.MemoryConfig.MaxWorkingMemoryTokens; // e.g., 150,000

int systemPromptTokens  = CountTokens(agent.SystemPrompt);   // ~1,200
int procedureTokens     = matchedProcedure != null ? CountTokens(matchedProcedure) : 0; // ~300
int knowledgeTokens     = CountTokens(retrievedChunks);      // ~1,500
int episodeTokens       = CountTokens(episodeSnippets);       // ~400
int reservedForCurrent  = 500;  // current message + tool call results

int historyBudget = totalBudget
    - systemPromptTokens - procedureTokens
    - knowledgeTokens - episodeTokens
    - reservedForCurrent;
// Remaining budget allocated to message history
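
The allocation above can be wrapped in a small helper that also guards against a negative result when the fixed components alone exceed the total. This is a minimal sketch: HistoryBudgetSketch and its Calculate method are hypothetical names used for illustration, and the clamp to zero is an assumption rather than documented behavior.

```csharp
using System;

// Hypothetical helper for illustration; not part of the documented API.
public static class HistoryBudgetSketch
{
    // Subtracts every fixed component from the total budget and clamps the
    // result at zero, so an oversized system prompt or retrieval result
    // never produces a negative history budget (the clamp is an assumption).
    public static int Calculate(
        int totalBudget,
        int systemPromptTokens,
        int procedureTokens,
        int knowledgeTokens,
        int episodeTokens,
        int reservedForCurrent)
    {
        int fixedCost = systemPromptTokens + procedureTokens
            + knowledgeTokens + episodeTokens + reservedForCurrent;
        return Math.Max(0, totalBudget - fixedCost);
    }
}
```

With the example figures from the comments above, HistoryBudgetSketch.Calculate(150_000, 1_200, 300, 1_500, 400, 500) returns 146,100, so nearly all of the budget remains for message history.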

WorkingMemoryManager

public class WorkingMemoryManager
{
    public async Task<WorkingMemoryContext> BuildAsync(
        AgentComposite agent,
        ConversationComposite conversation,
        string currentUserMessage,
        CancellationToken ct = default)
    {
        var messages = new List<LLMMessage>();

        // 1. System prompt (always first)
        messages.Add(new LLMMessage(Role.System, agent.SystemPrompt));

        // 2. Procedural memory (if task matches)
        var procedure = await _proceduralStore
            .FindMatchAsync(currentUserMessage, agent.Id, agent.TenantId, ct);
        if (procedure != null)
            messages.Add(new LLMMessage(Role.System,
                FormatProcedure(procedure)));

        // 3. Semantic memory retrieval
        var retrieved = await _semanticService
            .RetrieveAsync(agent, currentUserMessage, ct);
        if (retrieved.HasContent)
            messages.Add(new LLMMessage(Role.Context,
                retrieved.FormattedContext));

        // 4. Episodic memory snippets
        var episodes = await _episodicStore
            .SearchAsync(agent.Id, conversation.UserId, currentUserMessage, topK: 3, ct: ct);
        if (episodes.Any())
            messages.Add(new LLMMessage(Role.Context,
                FormatEpisodes(episodes)));

        // 5. Message history (pruned to remaining budget)
        var historyBudget = CalculateHistoryBudget(agent, messages);
        var history = await _pruner.PruneAsync(
            conversation.Messages, historyBudget, agent.MemoryConfig.PruningStrategy);
        messages.AddRange(history);

        // 6. Current user message
        messages.Add(new LLMMessage(Role.User, currentUserMessage));

        return new WorkingMemoryContext { Messages = messages };
    }
}
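
Step 5 relies on a pruner whose implementation is not shown here. One plausible strategy is a sliding window that keeps the most recent messages that fit in the history budget. The sketch below illustrates that idea only: SketchMessage, SlidingWindowPrunerSketch, and the four-characters-per-token estimate are all assumptions made for this example, not the real LLMMessage type, pruner, or CountTokens behavior.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical message type for illustration; the real LLMMessage may differ.
public sealed record SketchMessage(string Role, string Content);

public static class SlidingWindowPrunerSketch
{
    // Crude estimate of roughly four characters per token. This is an
    // assumption, not the tokenizer the real CountTokens would use.
    public static int EstimateTokens(string text) =>
        Math.Max(1, (text.Length + 3) / 4);

    // Walks the history from newest to oldest, keeps messages until the
    // budget is exhausted, then restores chronological order.
    public static List<SketchMessage> Prune(
        IReadOnlyList<SketchMessage> history, int tokenBudget)
    {
        var kept = new List<SketchMessage>();
        int used = 0;
        for (int i = history.Count - 1; i >= 0; i--)
        {
            int cost = EstimateTokens(history[i].Content);
            if (used + cost > tokenBudget)
                break; // stop at the first message that does not fit
            used += cost;
            kept.Add(history[i]);
        }
        kept.Reverse();
        return kept;
    }
}
```

Stopping at the first message that does not fit keeps the retained history contiguous, which avoids handing the model a conversation with gaps in the middle. Other strategies, such as summarizing older turns, would trade that simplicity for better recall.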

Why Working Memory is Not Persisted

Working memory is intentionally ephemeral. The LLM context window is rebuilt fresh on every turn from the persistent stores (episodic, semantic, procedural). This design means: