Memory Configuration — Memory Overview

AgentMemoryConfig

Every AgentComposite carries an AgentMemoryConfig that controls all memory behaviour:

public class AgentMemoryConfig
{
    // ── Working Memory ──────────────────────────────────────────
    public int    MaxWorkingMemoryTokens { get; set; } = 100_000;
    public string PruningStrategy        { get; set; } = "FIFO";   // FIFO | Summarize | SlidingWindow
    public int    SlidingWindowTurns     { get; set; } = 10;
    public int    MinHistoryMessages     { get; set; } = 2;         // Never prune below this count

    // ── Episodic Memory ─────────────────────────────────────────
    public bool   EpisodicEnabled        { get; set; } = true;
    public string EpisodicRecallMode     { get; set; } = "Semantic"; // Recency | Semantic
    public int    EpisodicTopK           { get; set; } = 3;
    public int    EpisodicRetentionDays  { get; set; } = 90;
    public bool   EpisodicPIIDetection   { get; set; } = false;

    // ── Semantic Memory ─────────────────────────────────────────
    public bool   SemanticEnabled        { get; set; } = true;
    public int    SemanticTopK           { get; set; } = 5;
    public float  SemanticMinScore       { get; set; } = 0.75f;
    public bool   HybridSearchEnabled    { get; set; } = false;
    public bool   RerankerEnabled        { get; set; } = false;

    // ── Procedural Memory ───────────────────────────────────────
    public bool   ProceduralEnabled      { get; set; } = true;
    public bool   ProceduralLearning     { get; set; } = false;     // Agent captures new procedures
    public float  ProceduralMinScore     { get; set; } = 0.80f;     // Min confidence for embedding match
}
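The class only names the pruning strategies. As a rough illustration, the sketch below shows one way FIFO and SlidingWindow trimming might honour MaxWorkingMemoryTokens, SlidingWindowTurns, and MinHistoryMessages. The `Message` record, the `Pruner` class, the precomputed per-message token counts, and the reading of one "turn" as a user message plus its reply are all assumptions made for the example, not the product's implementation.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Usage: three 40-token messages against a 100-token budget.
var history = new List<Message>
{
    new Message("user", "first question", 40),
    new Message("assistant", "first answer", 40),
    new Message("user", "second question", 40),
};
var kept = Pruner.Fifo(history, maxTokens: 100, minMessages: 2);
Console.WriteLine(string.Join(" | ", kept.Select(m => m.Text)));
// prints "first answer | second question"

// Hypothetical message type; Tokens is a precomputed token count.
public record Message(string Role, string Text, int Tokens);

public static class Pruner
{
    // FIFO: drop the oldest messages until the history fits the budget,
    // but never go below minMessages (mirrors MinHistoryMessages).
    public static List<Message> Fifo(IReadOnlyList<Message> history, int maxTokens, int minMessages)
    {
        var kept = history.ToList();
        while (kept.Count > minMessages && kept.Sum(m => m.Tokens) > maxTokens)
            kept.RemoveAt(0);
        return kept;
    }

    // SlidingWindow: keep only the most recent `turns` turns, assuming
    // one turn = one user message plus one assistant reply (2 messages).
    public static List<Message> SlidingWindow(IReadOnlyList<Message> history, int turns, int minMessages)
    {
        int keep = Math.Max(turns * 2, minMessages);
        return history.Skip(Math.Max(0, history.Count - keep)).ToList();
    }
}
```

The Summarize strategy is omitted because it would need an LLM call to condense the dropped messages.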

Configuration Properties Reference

Property	Default	Effect
`MaxWorkingMemoryTokens`	100,000	Maximum tokens in the assembled context. Set to ~80% of the model's context window.
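`PruningStrategy`	FIFO	How message history is trimmed when over budget: `FIFO`, `Summarize`, or `SlidingWindow`.
`SlidingWindowTurns`	10	Number of recent turns kept when `PruningStrategy` is `SlidingWindow`.
`MinHistoryMessages`	2	Pruning never reduces the history below this message count.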
`EpisodicEnabled`	true	Whether past conversation sessions are recalled. Disable for stateless bots.
`EpisodicRecallMode`	Semantic	How past episodes are selected for recall: `Recency` or `Semantic`.
`EpisodicTopK`	3	Number of past episode snippets injected per turn. Higher = more context, more tokens.
`EpisodicRetentionDays`	90	How long episodes are kept before soft-delete and purge.
`SemanticEnabled`	true	Whether the knowledge base is queried. Disable if the agent has no indexed documents.
`SemanticTopK`	5	Number of knowledge chunks retrieved per turn. Tune for answer quality vs. token cost.
`SemanticMinScore`	0.75	Minimum cosine similarity to include a chunk. Lower = more results, more noise.
`HybridSearchEnabled`	false	Enable RRF hybrid (vector + keyword) retrieval. Better for precise term matching.
`RerankerEnabled`	false	Cross-encoder reranking of retrieved chunks. Improves precision at cost of latency.
`ProceduralEnabled`	true	Whether procedures are matched before each LLM call.
`ProceduralLearning`	false	Whether the agent captures new procedures from successful task completions.
`ProceduralMinScore`	0.80	Minimum confidence for an embedding match when matching procedures.
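Two rows in the table lend themselves to a small worked example: how `SemanticMinScore` and `SemanticTopK` might combine, and the "~80% of the context window" guidance for `MaxWorkingMemoryTokens`. The sketch assumes the score filter is applied before the top-K cut and uses a hypothetical 128,000-token model. The `Chunk` record and the `Retrieval` and `WorkingMemory` helpers are illustrative names, not part of the product.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var candidates = new List<Chunk>
{
    new Chunk("leave-policy", 0.91f),
    new Chunk("expenses", 0.78f),
    new Chunk("office-map", 0.60f),   // below the 0.75 threshold, dropped
};
var selected = Retrieval.Select(candidates, topK: 5, minScore: 0.75f);
Console.WriteLine(string.Join(", ", selected.Select(c => c.Id)));
// prints "leave-policy, expenses"

Console.WriteLine(WorkingMemory.BudgetFor(128_000));
// prints "102400"

// Hypothetical retrieved chunk with its cosine similarity score.
public record Chunk(string Id, float Score);

public static class Retrieval
{
    // Keep chunks at or above minScore, best first, at most topK of them.
    public static List<Chunk> Select(IEnumerable<Chunk> candidates, int topK, float minScore) =>
        candidates.Where(c => c.Score >= minScore)
                  .OrderByDescending(c => c.Score)
                  .Take(topK)
                  .ToList();
}

public static class WorkingMemory
{
    // 80% of the context window, in integer arithmetic (4/5).
    public static int BudgetFor(int contextWindowTokens) => contextWindowTokens / 5 * 4;
}
```

Lowering `minScore` to 0.60 in this example would let the third chunk through, which is the "more results, more noise" trade-off the table describes.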

Recommended Configurations by Agent Type

Agent Type	Episodic	Semantic	Procedural	Pruning
Knowledge Q&A bot (no memory)	Disabled	Enabled, TopK 5	Disabled	FIFO
Customer service (returning users)	Enabled, TopK 3	Enabled, TopK 5	Optional	FIFO
Task automation agent	Disabled	Optional	Enabled	SlidingWindow (10)
Full enterprise assistant	Enabled, TopK 3	Enabled, TopK 5, Hybrid	Enabled	Summarize
High-throughput batch agent	Disabled	Disabled	Disabled	FIFO (minimal history)
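Three rows of the table can be written down as preset factories, as a sketch. `MemoryPresets` is a hypothetical helper, and the `AgentMemoryConfig` shown here is a trimmed copy containing only the properties the presets touch. Where the table says "Optional", the example leaves the choice to the caller.

```csharp
using System;

var cfg = MemoryPresets.EnterpriseAssistant();
Console.WriteLine($"{cfg.PruningStrategy}, hybrid={cfg.HybridSearchEnabled}");
// prints "Summarize, hybrid=True"

// Trimmed copy of AgentMemoryConfig (same names and defaults as the full class).
public class AgentMemoryConfig
{
    public string PruningStrategy     { get; set; } = "FIFO";
    public int    SlidingWindowTurns  { get; set; } = 10;
    public bool   EpisodicEnabled     { get; set; } = true;
    public int    EpisodicTopK        { get; set; } = 3;
    public bool   SemanticEnabled     { get; set; } = true;
    public int    SemanticTopK        { get; set; } = 5;
    public bool   HybridSearchEnabled { get; set; } = false;
    public bool   ProceduralEnabled   { get; set; } = true;
}

// Hypothetical helper: one factory per row of the recommendations table.
public static class MemoryPresets
{
    // Knowledge Q&A bot (no memory): semantic retrieval only.
    public static AgentMemoryConfig KnowledgeQa() => new AgentMemoryConfig
    {
        EpisodicEnabled = false,
        SemanticEnabled = true, SemanticTopK = 5,
        ProceduralEnabled = false,
        PruningStrategy = "FIFO",
    };

    // Task automation agent: Semantic is "Optional" in the table, so the caller decides.
    public static AgentMemoryConfig TaskAutomation(bool semanticEnabled = false) => new AgentMemoryConfig
    {
        EpisodicEnabled = false,
        SemanticEnabled = semanticEnabled,
        ProceduralEnabled = true,
        PruningStrategy = "SlidingWindow", SlidingWindowTurns = 10,
    };

    // Full enterprise assistant: every memory type on, hybrid search, Summarize pruning.
    public static AgentMemoryConfig EnterpriseAssistant() => new AgentMemoryConfig
    {
        EpisodicEnabled = true, EpisodicTopK = 3,
        SemanticEnabled = true, SemanticTopK = 5, HybridSearchEnabled = true,
        ProceduralEnabled = true,
        PruningStrategy = "Summarize",
    };
}
```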

Configuring via API

// PATCH /api/octopus/agents/{agentId}/memory-config
PATCH /api/octopus/agents/agent_hr_01/memory-config
Authorization: Bearer {adminToken}
Content-Type: application/json

{
  "episodicEnabled":       true,
  "episodicTopK":          3,
  "episodicRetentionDays": 90,
  "semanticEnabled":       true,
  "semanticTopK":          5,
  "semanticMinScore":      0.75,
  "hybridSearchEnabled":   false,
  "proceduralEnabled":     true,
  "proceduralLearning":    false,
  "maxWorkingMemoryTokens": 100000,
  "pruningStrategy":       "FIFO"
}

// Response: 200 OK — agent uses new config on next turn
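For callers working from .NET, the request above could be assembled roughly as follows. This is a sketch under stated assumptions: the host `octopus.example.com` and the `MemoryConfigClient` helper are hypothetical, the token is a placeholder, and sending only the changed fields assumes the endpoint accepts partial bodies, which the PATCH verb suggests but this page does not state.

```csharp
using System;
using System.Net.Http;
using System.Net.Http.Headers;
using System.Text;
using System.Text.Json;

var request = MemoryConfigClient.BuildPatch(
    new Uri("https://octopus.example.com"),      // hypothetical host
    "agent_hr_01",
    "ADMIN_TOKEN",                               // placeholder admin token
    new { semanticTopK = 3, episodicTopK = 1 }); // assumed partial update
Console.WriteLine($"{request.Method} {request.RequestUri}");

// To send it:
//   using var http = new HttpClient();
//   var response = await http.SendAsync(request);
//   response.EnsureSuccessStatusCode();         // the page documents 200 OK

public static class MemoryConfigClient
{
    // Builds (but does not send) PATCH /api/octopus/agents/{agentId}/memory-config.
    public static HttpRequestMessage BuildPatch(Uri baseUri, string agentId, string adminToken, object changes)
    {
        var uri = new Uri(baseUri, $"/api/octopus/agents/{Uri.EscapeDataString(agentId)}/memory-config");
        var request = new HttpRequestMessage(HttpMethod.Patch, uri)
        {
            // Property names are passed through as written, matching the camelCase JSON keys.
            Content = new StringContent(JsonSerializer.Serialize(changes), Encoding.UTF8, "application/json"),
        };
        request.Headers.Authorization = new AuthenticationHeaderValue("Bearer", adminToken);
        return request;
    }
}
```

Building the request separately from sending it keeps the example runnable without a live server and makes the URL, headers, and body easy to inspect.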

Tuning for Cost Reduction

Memory retrieval contributes tokens to every LLM call. To reduce token costs:

Reduce SemanticTopK from 5 to 3 — cuts 30–40% of injected knowledge tokens
Reduce EpisodicTopK from 3 to 1 — one past episode snippet is usually sufficient
Disable EpisodicEnabled if the agent does not need cross-session memory
Set PruningStrategy = SlidingWindow with a small window (6–8 turns) for task-focused sessions
Raise SemanticMinScore to 0.80–0.85 to inject only high-confidence knowledge chunks
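To see how the first two adjustments add up, a back-of-the-envelope estimate can help. The average sizes used below (400 tokens per knowledge chunk, 200 per episode snippet) are illustrative assumptions, not measured values, and `TokenEstimate` is a hypothetical helper. Real savings depend on your chunking and on how many chunks clear `SemanticMinScore`.

```csharp
using System;

int before = TokenEstimate.PerTurn(semanticTopK: 5, episodicTopK: 3);
int after  = TokenEstimate.PerTurn(semanticTopK: 3, episodicTopK: 1);
Console.WriteLine($"before={before} after={after} saved={before - after}");
// prints "before=2600 after=1400 saved=1200"

public static class TokenEstimate
{
    // Upper-bound estimate of memory tokens injected per LLM call:
    // every retrieval slot is assumed to be filled with an average-sized item.
    public static int PerTurn(int semanticTopK, int episodicTopK,
                              int avgChunkTokens = 400, int avgSnippetTokens = 200)
        => semanticTopK * avgChunkTokens + episodicTopK * avgSnippetTokens;
}
```

With equally sized chunks, going from 5 to 3 chunks removes 2 of 5, or 40% of the knowledge tokens, which is the top of the 30–40% range quoted above.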

Changes Take Effect Immediately

Memory configuration changes made through the API or admin UI take effect on the next conversation turn. No server restart or session reset is required: conversations already in progress pick up the new configuration automatically.
