Token Budget — Working Memory

Budget Formula

// Token budget calculation
HistoryBudget = MaxWorkingMemoryTokens
              - SystemPromptTokens
              - ProcedureTokens       // 0 unless a procedure matched
              - RetrievedKnowledgeTokens
              - EpisodeSnippetTokens
              - CurrentMessageTokens
              - OutputReserve         // tokens reserved for agent response

// Example with Claude and a large knowledge base:
// MaxWorkingMemoryTokens:   150,000
// SystemPromptTokens:        -1,200
// ProcedureTokens:             -300
// RetrievedKnowledgeTokens:  -1,500
// EpisodeSnippetTokens:        -400
// CurrentMessageTokens:        -100
// OutputReserve:             -8,192
// ─────────────────────────────────
// HistoryBudget:            138,308  tokens available for message history
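A minimal C# sketch of the formula. It assumes a hypothetical `BudgetInputs` record whose fields mirror the terms above; it is an illustration, not the library's API. Passing 0 for `ProcedureTokens` models the case where no procedure matched.

```csharp
// Hypothetical input record: one field per term in the budget formula.
public record BudgetInputs(
    int MaxWorkingMemoryTokens,
    int SystemPromptTokens,
    int ProcedureTokens,          // 0 when no procedure matched
    int RetrievedKnowledgeTokens,
    int EpisodeSnippetTokens,
    int CurrentMessageTokens,
    int OutputReserve);

public static class TokenBudget
{
    // Subtracts every fixed section and the output reserve from the
    // working-memory limit; whatever remains is available for history.
    // A negative result means the fixed sections alone overflow the limit.
    public static int HistoryBudget(BudgetInputs b) =>
        b.MaxWorkingMemoryTokens
        - b.SystemPromptTokens
        - b.ProcedureTokens
        - b.RetrievedKnowledgeTokens
        - b.EpisodeSnippetTokens
        - b.CurrentMessageTokens
        - b.OutputReserve;
}
```

With the example values (150,000; 1,200; 300; 1,500; 400; 100; 8,192), `HistoryBudget` returns 138,308, which matches the worked example.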

Token Counting Approaches

Approach	Accuracy	Cost	Use When
Approximate (character-based)	±15% (English text)	Near-zero	Default; fast enough for most agents
Provider tokenizer (tiktoken, etc.)	Exact, if it matches the target model's tokenizer	Low (runs locally)	Agents with very tight context budgets
Provider API count endpoint	Exact	One network call per count	Occasional checks; avoid in the hot path (adds a round trip)

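// Default approximate token counter
using System;
using System.Collections.Generic;
using System.Linq;

public class ApproximateTokenCounter : ITokenCounter
{
    // Rough approximation: ~4 characters per token for English text.
    // Null or empty text counts as 0; any other text counts as at least 1.
    public int Count(string text) =>
        string.IsNullOrEmpty(text) ? 0 : Math.Max(1, text.Length / 4);

    public int CountMessages(IReadOnlyList<LLMMessage> messages) =>
        messages.Sum(m => Count(m.Content) + 4);  // +4 per message for role overhead
}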
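The table puts the character-based estimate at roughly ±15%. One conservative option, sketched here under that assumption, is to pad each estimate by the upper end of that band before checking it against the budget. `PaddedEstimate` is a hypothetical helper that reuses the same ~4 characters per token rule as the counter above. It is not part of the documented API.

```csharp
using System;

// Hypothetical helper: pads the character-based estimate so budget
// checks err toward over-counting.
public static class PaddedEstimate
{
    // Upper end of the estimate's error band, in percent (rough figure
    // from the table, not a guarantee).
    public const int ErrorMarginPercent = 15;

    // ~4 characters per token; 0 for null or empty text, else at least 1.
    public static int Approximate(string text) =>
        string.IsNullOrEmpty(text) ? 0 : Math.Max(1, text.Length / 4);

    // Inflates the estimate by the margin using integer ceiling division,
    // i.e. ceil(estimate * 1.15) without floating-point rounding.
    public static int Conservative(string text)
    {
        int estimate = Approximate(text);
        return (estimate * (100 + ErrorMarginPercent) + 99) / 100;
    }
}
```

For a 400-character string, `Approximate` returns 100 and `Conservative` returns 115.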

Budget Configuration

// MemoryConfig budget settings
public class MemoryConfig
{
    // Context window settings
    public int MaxWorkingMemoryTokens { get; set; } = 100_000;  // total budget, including OutputReserve
    public int OutputReserve { get; set; } = 4_096;      // reserved for the agent's response
    public int MinHistoryTokens { get; set; } = 4_000;   // always keep at least this much message history

    // Section limits
    public int MaxKnowledgeTokens { get; set; } = 3_000;  // cap on retrieved-knowledge (semantic memory) tokens
    public int MaxEpisodeTokens { get; set; } = 1_000;    // cap on episodic snippet tokens
}
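A sketch of how these settings could feed the budget formula. It assumes that retrieved knowledge and episode snippets are truncated to `MaxKnowledgeTokens` and `MaxEpisodeTokens` before budgeting. It only reports whether the result reaches `MinHistoryTokens`, without guessing how that conflict is resolved. `BudgetPlanner.Plan` is hypothetical, and `MemoryConfig` is copied from the settings above.

```csharp
using System;

// Copy of the MemoryConfig budget settings shown above.
public class MemoryConfig
{
    public int MaxWorkingMemoryTokens { get; set; } = 100_000;
    public int OutputReserve { get; set; } = 4_096;
    public int MinHistoryTokens { get; set; } = 4_000;
    public int MaxKnowledgeTokens { get; set; } = 3_000;
    public int MaxEpisodeTokens { get; set; } = 1_000;
}

// Hypothetical helper, not part of the documented API.
public static class BudgetPlanner
{
    // Caps the knowledge and episode sections at their configured limits
    // (assumed truncation behavior), then applies the budget formula.
    public static (int HistoryBudget, bool MeetsMinimum) Plan(
        MemoryConfig config,
        int systemPromptTokens,
        int procedureTokens,
        int retrievedKnowledgeTokens,
        int episodeSnippetTokens,
        int currentMessageTokens)
    {
        int knowledge = Math.Min(retrievedKnowledgeTokens, config.MaxKnowledgeTokens);
        int episodes = Math.Min(episodeSnippetTokens, config.MaxEpisodeTokens);

        int history = config.MaxWorkingMemoryTokens
            - systemPromptTokens
            - procedureTokens
            - knowledge
            - episodes
            - currentMessageTokens
            - config.OutputReserve;

        // Flags configurations where the fixed sections leave less room
        // for history than MinHistoryTokens.
        return (history, history >= config.MinHistoryTokens);
    }
}
```

Consider the defaults with a 1,200-token system prompt, a 300-token procedure, 5,000 tokens of retrieved knowledge (capped to 3,000), 400 tokens of episode snippets and a 100-token message. That combination leaves 90,904 tokens for history.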

When the Budget is Exhausted

If the sections combined, including OutputReserve, exceed MaxWorkingMemoryTokens (equivalently, if message history is larger than HistoryBudget), the pruning strategy reduces message history until the total fits. The system prompt, retrieved knowledge, and current message are never pruned; only message history is eligible for reduction.
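As a minimal sketch, assume the simplest policy (drop the oldest messages first) and a caller-supplied token count per message. Under those assumptions, pruning history until it fits `HistoryBudget` could look like this. `HistoryPruner.PruneOldestFirst` is a hypothetical name. Real strategies may summarize rather than drop, and this sketch does not enforce `MinHistoryTokens`.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class HistoryPruner
{
    // Removes messages from the front (oldest first) until the remaining
    // history fits within historyBudget. Only history is passed in, so
    // the system prompt, retrieved knowledge and current message are
    // never candidates for removal.
    public static List<T> PruneOldestFirst<T>(
        IReadOnlyList<T> history, Func<T, int> countTokens, int historyBudget)
    {
        var kept = new List<T>(history);
        int total = kept.Sum(countTokens);

        while (kept.Count > 0 && total > historyBudget)
        {
            total -= countTokens(kept[0]);
            kept.RemoveAt(0);  // O(n), acceptable for conversation-sized lists
        }
        return kept;
    }
}
```

Because only the history list is passed in, the protected sections cannot be removed by construction.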
