Working Memory
Working memory is the content of the LLM context window at the moment of generation. It is the agent's "consciousness": everything it is aware of right now. Unlike the other memory types, working memory is transient; it exists only for the duration of a single LLM call.
What Goes Into Working Memory
Working memory is assembled by the WorkingMemoryManager before every LLM call. It contains, in priority order:
| Component | Source | Priority | Typical Size |
|---|---|---|---|
| System Prompt | AgentComposite.SystemPrompt | 1 (highest) | 500–3000 tokens |
| Procedure | Procedural memory (if matched) | 2 | 100–500 tokens |
| Retrieved Knowledge | Semantic memory retrieval | 3 | 500–2000 tokens |
| Episode Snippets | Episodic memory recall | 4 | 100–500 tokens |
| Message History | Current session messages (pruned) | 5 | Remainder of budget |
| Current Message | User's latest input | 6 (last added) | 20–500 tokens |
Token Budget
The context window is bounded by the LLM's maximum context length. The MaxWorkingMemoryTokens setting defines the portion of that window available for input, leaving headroom for the model's output:
// Token budget allocation
int totalBudget = agent.MemoryConfig.MaxWorkingMemoryTokens; // e.g., 150,000
int systemPromptTokens = CountTokens(agent.SystemPrompt); // ~1,200
int procedureTokens = matchedProcedure != null ? CountTokens(matchedProcedure) : 0; // ~300
int knowledgeTokens = CountTokens(retrievedChunks); // ~1,500
int episodeTokens = CountTokens(episodeSnippets); // ~400
int reservedForCurrent = 500; // current message + tool call results

int historyBudget = totalBudget
    - systemPromptTokens - procedureTokens
    - knowledgeTokens - episodeTokens
    - reservedForCurrent;
// The remaining budget is allocated to message history
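The allocation above can be wrapped in a small helper to make the arithmetic easy to check. The following is a minimal sketch, not the project's actual code: the `TokenBudget` class and `HistoryBudget` method are hypothetical names, and the clamp at zero is an added assumption to cover the case where the fixed components alone exceed the budget.

```csharp
using System;

public static class TokenBudget
{
    // Hypothetical helper: returns the tokens left for message history.
    // Assumption: the result is clamped at zero so that an oversized
    // prompt never produces a negative history budget.
    public static int HistoryBudget(
        int totalBudget,
        int systemPromptTokens,
        int procedureTokens,
        int knowledgeTokens,
        int episodeTokens,
        int reservedForCurrent)
    {
        // Everything that is placed in the context before history is considered.
        int fixedCost = systemPromptTokens + procedureTokens
                      + knowledgeTokens + episodeTokens
                      + reservedForCurrent;

        return Math.Max(0, totalBudget - fixedCost);
    }
}
```

With the sample figures from the comments above, `TokenBudget.HistoryBudget(150_000, 1_200, 300, 1_500, 400, 500)` evaluates to 146,100: the fixed components cost 3,900 tokens, and the rest goes to message history.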
WorkingMemoryManager
public class WorkingMemoryManager
{
    // The dependencies used below (_proceduralStore, _semanticService,
    // _episodicStore, _pruner) are assumed to be injected through the
    // constructor; their declarations are omitted here.

    public async Task<WorkingMemoryContext> BuildAsync(
        AgentComposite agent,
        ConversationComposite conversation,
        string currentUserMessage,
        CancellationToken ct = default)
    {
        var messages = new List<LLMMessage>();

        // 1. System prompt (always first)
        messages.Add(new LLMMessage(Role.System, agent.SystemPrompt));

        // 2. Procedural memory (if task matches)
        var procedure = await _proceduralStore
            .FindMatchAsync(currentUserMessage, agent.Id, agent.TenantId, ct);
        if (procedure != null)
            messages.Add(new LLMMessage(Role.System,
                FormatProcedure(procedure)));

        // 3. Semantic memory retrieval
        var retrieved = await _semanticService
            .RetrieveAsync(agent, currentUserMessage, ct);
        if (retrieved.HasContent)
            messages.Add(new LLMMessage(Role.Context,
                retrieved.FormattedContext));

        // 4. Episodic memory snippets
        var episodes = await _episodicStore
            .SearchAsync(agent.Id, conversation.UserId, currentUserMessage, topK: 3, ct);
        if (episodes.Any())
            messages.Add(new LLMMessage(Role.Context,
                FormatEpisodes(episodes)));

        // 5. Message history (pruned to remaining budget)
        var historyBudget = CalculateHistoryBudget(agent, messages);
        var history = await _pruner.PruneAsync(
            conversation.Messages, historyBudget, agent.MemoryConfig.PruningStrategy);
        messages.AddRange(history);

        // 6. Current user message
        messages.Add(new LLMMessage(Role.User, currentUserMessage));

        return new WorkingMemoryContext { Messages = messages };
    }
}
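Step 5 delegates to a pruner whose behavior depends on the configured PruningStrategy, which is not shown here. As an illustration only, the sketch below implements one plausible strategy: keep the most recent messages that fit the budget. `ChatMessage`, `RecencyPruner`, and the four-characters-per-token estimate are all hypothetical stand-ins, not the project's types or tokenizer.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical message shape; the real LLMMessage type may differ.
public record ChatMessage(string Role, string Content);

public static class RecencyPruner
{
    // Rough stand-in for a tokenizer: about four characters per token,
    // rounded up, with a minimum of one token per message.
    public static int EstimateTokens(string text) =>
        Math.Max(1, (text.Length + 3) / 4);

    // Keeps the newest messages whose combined estimate fits the budget
    // and returns them in their original chronological order.
    public static List<ChatMessage> Prune(
        IReadOnlyList<ChatMessage> history, int tokenBudget)
    {
        var kept = new List<ChatMessage>();
        int used = 0;

        // Walk from newest to oldest; stop at the first message that does not fit.
        for (int i = history.Count - 1; i >= 0; i--)
        {
            int cost = EstimateTokens(history[i].Content);
            if (used + cost > tokenBudget) break;
            kept.Add(history[i]);
            used += cost;
        }

        kept.Reverse(); // restore oldest-to-newest order
        return kept;
    }
}
```

Stopping at the first message that does not fit keeps the retained history contiguous, so the model never sees a conversation with a gap in the middle. Other strategies, such as summarizing older turns, would trade that simplicity for longer effective recall.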
Why Working Memory is Not Persisted
Working memory is intentionally ephemeral. The LLM context window is rebuilt fresh on every turn from the persistent stores (episodic, semantic, procedural). This design means:
- A crash cannot lose working memory, because it is reconstructed from the durable stores on the next turn
- There is no in-memory state to synchronize across horizontally scaled API servers
- The composition is dynamic: if new knowledge is indexed between turns, it appears in the next turn's retrieval
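The last point can be illustrated with a toy model. The sketch below is an assumption-laden simplification: `InMemoryKnowledgeStore` and `ContextBuilder` are hypothetical names, and retrieval is a naive substring match standing in for semantic search. What it shows is that the context is a function of the stores and the current message, so nothing has to be carried over between turns.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the semantic store: a list of text chunks.
public class InMemoryKnowledgeStore
{
    private readonly List<string> _chunks = new();

    public void Index(string chunk) => _chunks.Add(chunk);

    // Naive retrieval: chunks that contain the query, ignoring case.
    public List<string> Retrieve(string query) =>
        _chunks.Where(c => c.Contains(query, StringComparison.OrdinalIgnoreCase))
               .ToList();
}

public static class ContextBuilder
{
    // Builds the context from scratch on every call; nothing is cached between turns.
    public static List<string> Build(
        string systemPrompt, InMemoryKnowledgeStore store, string userMessage)
    {
        var context = new List<string> { systemPrompt };
        context.AddRange(store.Retrieve(userMessage));
        context.Add(userMessage);
        return context;
    }
}
```

Calling `ContextBuilder.Build` with an empty store yields only the system prompt and the user message. If a chunk mentioning the query is indexed afterwards, the next call with the same arguments includes it, without any cache invalidation or cross-server coordination.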