Portal Community

The Read Phase: Pre-LLM Retrieval

Before every LLM call, three of the four memory types are queried in parallel. Working memory is always built; it receives the results from the others:

1
User Message Arrives The chat-app sends the user's message to the Octopus conversation API endpoint.
2
MemoryOrchestrator.AssembleAsync (parallel) Three retrievals run simultaneously: episodic search (SQL), semantic search (vector DB), procedural match (regex + embedding). No dependency between them — Task.WhenAll.
3
WorkingMemoryManager.BuildAsync Receives the MemoryAssembly result. Assembles system prompt + procedure + knowledge + episodes + history + current message into the LLM messages array.
4
Token Budget Check + Pruning If the assembled context exceeds MaxWorkingMemoryTokens, the pruner removes the oldest message history turns until within budget.
5
LLM Call The assembled messages array is sent to the configured LLM provider. Streaming response flows back to the user.

The Write Phase: Post-Response Storage

After the LLM response is received, two write-back operations may occur:

TriggerMemory WrittenWhat Is Stored
Every turnWorking memory (next turn)The user message + assistant response are appended to the in-memory conversation history for the next turn's context
Session endEpisodic memory (SQL)Conversation summary + key facts + embedding written as an episode row
Multi-step task completion (optional)Procedural memory (SQL, pending)If ProcedureCaptureService detects a completed multi-step task, it stores a new procedure in pending-approval state
Document ingestion (manual)Semantic memory (vector DB)Indexed chunks added to the agent's vector collection

Context Assembly Order

All four memory types contribute to working memory, assembled in this fixed order (highest-to-lowest model attention):

┌─────────────────────────────────────────────────┐
│ 1. System Prompt                                │  ← AgentComposite.SystemPrompt
├─────────────────────────────────────────────────┤
│ 2. Matched Procedure                            │  ← ProceduralMemory result
│    Step 1: Ask for vendor name...               │
│    Step 2: Call vendor_lookup...                │
├─────────────────────────────────────────────────┤
│ 3. Retrieved Knowledge                          │  ← SemanticMemory result
│    Source: HR Policy 2025.pdf                   │
│    Content: "Annual leave entitlement is..."    │
├─────────────────────────────────────────────────┤
│ 4. Past Conversation Snippets                   │  ← EpisodicMemory result
│    2025-03-14: User asked about leave balance   │
├─────────────────────────────────────────────────┤
│ 5. Message History (pruned to budget)           │  ← WorkingMemory (turns 1..n-1)
│    User: Hi, how many days do I have?           │
│    Aria: You have 12 remaining.                 │
├─────────────────────────────────────────────────┤
│ 6. Current User Message                         │  ← Highest model attention
│    User: Can I take 5 days in June?             │
└─────────────────────────────────────────────────┘

Priority Conflicts: When Memory Types Contradict

Because all memory types contribute to the same context window, conflicts can arise. The LLM resolves conflicts according to attention position — later content overrides earlier content for instruction-following. Design guidelines to avoid conflicts:

Conflict ScenarioResultRecommended Fix
Procedure says "Step 1: ask for name" but semantic knowledge says "skip if internal user"LLM may be confusedAdd conditional logic in the procedure step definition
Episodic snippet says user has 12 days remaining; database now shows 8LLM may quote stale dataEnsure live data queries (tools) override injected context — instruct in system prompt
Too many memory sources injected; message history pruned severelyAgent loses short-term contextReduce SemanticTopK and EpisodicTopK to preserve more message history budget
Monitoring Memory Interactions

Use the Context Inspector (available to the OctopusDebug role) to inspect exactly what each memory type contributed to a specific turn. This is the fastest way to diagnose unexpected agent responses caused by conflicting memory content.