
What Is Working Memory?

Every call to an LLM is stateless. The model has no persistent memory between calls — it only knows what appears in the messages array you send it. Working memory is the name Octopus gives to the assembled messages array that is constructed before each LLM call:

// Working memory is the messages[] sent to the LLM on every turn:
[
  { "role": "system",    "content": "You are Aria, the HR specialist..." },
  { "role": "user",      "content": "[Retrieved Knowledge]\nSource: HR Policy..." },
  { "role": "user",      "content": "Hi, how many leave days do I have?" },
  { "role": "assistant", "content": "You have 12 annual leave days remaining." },
  { "role": "user",      "content": "Can I take 5 days in June?" }
]
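
To make the statelessness concrete, here is a minimal Python sketch. The names `echo_model` and `run_turn` are hypothetical and not part of Octopus; `echo_model` only stands in for a real LLM call, which would likewise see nothing beyond the list it receives.

```python
from typing import Callable

Message = dict[str, str]


def echo_model(messages: list[Message]) -> str:
    """Hypothetical stand-in for an LLM: it can only see what is in `messages`."""
    user_turns = [m for m in messages if m["role"] == "user"]
    return f"I can see {len(user_turns)} user message(s)."


def run_turn(history: list[Message], user_text: str,
             model: Callable[[list[Message]], str]) -> str:
    """Append the user's message, send the whole list, then store the reply."""
    history.append({"role": "user", "content": user_text})
    reply = model(history)
    history.append({"role": "assistant", "content": reply})
    return reply


history: list[Message] = [
    {"role": "system", "content": "You are Aria, the HR specialist..."}
]
first = run_turn(history, "Hi, how many leave days do I have?", echo_model)
second = run_turn(history, "Can I take 5 days in June?", echo_model)

# A call that omits the history only sees the latest message.
isolated = echo_model([{"role": "user", "content": "Can I take 5 days in June?"}])
```

The second turn only "remembers" the first because the caller resent it; the isolated call has no way to know an earlier question was asked.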

Key Characteristics

| Property | Value | Implication |
|---|---|---|
| Storage location | In-process RAM only | Nothing persists; lost when the request completes |
| Scope | Single conversation turn | Rebuilt from scratch on every LLM call |
| Size limit | Model context window (e.g. 200K tokens) | Must stay within budget; pruning kicks in when exceeded |
| Content sources | System prompt + injected knowledge + message history + current message | All four memory types contribute to working memory |
| Managed by | WorkingMemoryManager | Assembles, injects, prunes, and validates context |

What Goes Into Working Memory

The WorkingMemoryManager assembles working memory from five sources, in this order:

1. System Prompt: Agent persona, goals, instructions. Always first (highest model attention position).
2. Matched Procedure: If procedural memory matched a skill, its step-by-step instructions appear here.
3. Retrieved Knowledge: Semantic memory chunks retrieved by vector similarity search on the current message.
4. Episodic Snippets: Past conversation summaries recalled from SQL (cross-session user context).
5. Message History + Current Message: Current session conversation turns, pruned to fit the token budget.
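
The assembly order above can be sketched in Python as follows. This is an illustration under assumptions, not the WorkingMemoryManager implementation: `ContextSources` and `assemble` are hypothetical names, and the `[Matched Procedure]` and `[Episodic Snippets]` labels are invented by analogy with the `[Retrieved Knowledge]` label shown in the example earlier on this page.

```python
from dataclasses import dataclass, field
from typing import Optional

Message = dict[str, str]


@dataclass
class ContextSources:
    """Hypothetical container for the five inputs (not an Octopus type)."""
    system_prompt: str
    current_message: str
    matched_procedure: Optional[str] = None
    retrieved_knowledge: list[str] = field(default_factory=list)
    episodic_snippets: list[str] = field(default_factory=list)
    history: list[Message] = field(default_factory=list)


def assemble(src: ContextSources) -> list[Message]:
    """Build the messages list in the documented order, skipping empty sources."""
    # 1. System prompt always comes first.
    messages: list[Message] = [{"role": "system", "content": src.system_prompt}]
    # 2. Matched procedure (label is an assumption).
    if src.matched_procedure:
        messages.append({"role": "user",
                         "content": "[Matched Procedure]\n" + src.matched_procedure})
    # 3. Retrieved knowledge chunks.
    if src.retrieved_knowledge:
        messages.append({"role": "user",
                         "content": "[Retrieved Knowledge]\n" + "\n".join(src.retrieved_knowledge)})
    # 4. Episodic snippets (label is an assumption).
    if src.episodic_snippets:
        messages.append({"role": "user",
                         "content": "[Episodic Snippets]\n" + "\n".join(src.episodic_snippets)})
    # 5. Session history, then the current user message last.
    messages.extend(src.history)
    messages.append({"role": "user", "content": src.current_message})
    return messages


messages = assemble(ContextSources(
    system_prompt="You are Aria, the HR specialist...",
    current_message="Can I take 5 days in June?",
    retrieved_knowledge=["Source: HR Policy..."],
    history=[
        {"role": "user", "content": "Hi, how many leave days do I have?"},
        {"role": "assistant", "content": "You have 12 annual leave days remaining."},
    ],
))
```

With no matched procedure and no episodic snippets, the result has the same shape as the five-message example at the top of this page.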

Token Budget and Pruning

Every agent has a MaxWorkingMemoryTokens budget. When the assembled context exceeds the budget, the pruner removes the oldest message history turns until it fits. Three pruning strategies are available:

| Strategy | How It Works | Cost |
|---|---|---|
| FIFO | Drops the oldest turns first | Zero |
| Summarize | Condenses old turns via a secondary LLM call before dropping them | Extra LLM call |
| SlidingWindow | Always keeps only the last N turn pairs | Zero |
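
The three strategies can be sketched in Python as below. All function names are hypothetical and this is not the Octopus pruner: token counting is approximated by a word count (a real system would use the model's tokenizer), a turn pair is assumed to be one user message plus one assistant message, and the summarizer is passed in as a plain callable standing in for the secondary LLM call.

```python
from typing import Callable

Message = dict[str, str]


def estimate_tokens(messages: list[Message]) -> int:
    """Crude stand-in for a tokenizer: counts whitespace-separated words."""
    return sum(len(m["content"].split()) for m in messages)


def prune_fifo(fixed: list[Message], history: list[Message],
               max_tokens: int) -> list[Message]:
    """Drop the oldest history messages until fixed + history fits the budget."""
    kept = list(history)
    while kept and estimate_tokens(fixed + kept) > max_tokens:
        kept.pop(0)
    return kept


def prune_sliding_window(history: list[Message], n_pairs: int) -> list[Message]:
    """Keep only the last n_pairs turn pairs (two messages per pair)."""
    return history[-2 * n_pairs:] if n_pairs > 0 else []


def prune_summarize(fixed: list[Message], history: list[Message], max_tokens: int,
                    summarize: Callable[[list[Message]], str]) -> list[Message]:
    """Like FIFO, but replace the dropped messages with one summary message.

    Note: the summary itself consumes tokens; this sketch does not re-check
    the budget after adding it.
    """
    kept = prune_fifo(fixed, history, max_tokens)
    dropped = history[: len(history) - len(kept)]
    if not dropped:
        return kept
    summary = {"role": "user", "content": "[Summary] " + summarize(dropped)}
    return [summary] + kept
```

Here `fixed` holds the messages that are never pruned (system prompt and injected context), so only session history is trimmed.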

Why Working Memory Is Transient

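Working memory intentionally does not persist between turns. It is held in process RAM only and rebuilt from scratch on every LLM call, so nothing carries over once the request completes.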

Full Guide

This is a summary page. The Working Memory full guide covers context window mechanics, token budgeting, all three pruning strategies with code, knowledge injection position, tool call history management, and the Context Inspector debugging panel.