Octopus
Token Budget
The token budget is the allocation of the available context window across the different sections of working memory. Managing it well preserves as much message history as possible while keeping mandatory content (system prompt, retrieved knowledge) intact.
Budget Formula
// Token budget calculation
HistoryBudget = MaxWorkingMemoryTokens
              - SystemPromptTokens
              - ProcedureTokens            (if matched)
              - RetrievedKnowledgeTokens
              - EpisodeSnippetTokens
              - CurrentMessageTokens
              - OutputReserve              // tokens reserved for agent response

// Example with Claude and a large knowledge base:
//   MaxWorkingMemoryTokens:    150,000
//   SystemPromptTokens:         -1,200
//   ProcedureTokens:              -300
//   RetrievedKnowledgeTokens:   -1,500
//   EpisodeSnippetTokens:         -400
//   CurrentMessageTokens:         -100
//   OutputReserve:              -8,192
//   ─────────────────────────────────
//   HistoryBudget:             138,308 tokens available for message history
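The formula above can be sketched in C# as follows. `BudgetInputs` and `HistoryBudgetCalculator` are hypothetical names introduced for illustration and are not part of the library; clamping a negative result to zero is also an assumption, not documented behavior.

```csharp
using System;

// Hypothetical container for the per-section token counts used in the formula.
public record BudgetInputs(
    int MaxWorkingMemoryTokens,
    int SystemPromptTokens,
    int ProcedureTokens,          // pass 0 when no procedure matched
    int RetrievedKnowledgeTokens,
    int EpisodeSnippetTokens,
    int CurrentMessageTokens,
    int OutputReserve);

public static class HistoryBudgetCalculator
{
    // Subtracts every non-history section from the context window.
    // Assumption: a negative result is clamped to 0 (no room left for history).
    public static int Compute(BudgetInputs b)
    {
        int reserved = b.SystemPromptTokens
                     + b.ProcedureTokens
                     + b.RetrievedKnowledgeTokens
                     + b.EpisodeSnippetTokens
                     + b.CurrentMessageTokens
                     + b.OutputReserve;
        return Math.Max(0, b.MaxWorkingMemoryTokens - reserved);
    }
}
```

With the example values (150,000 total and 11,692 tokens reserved across the other sections), `Compute` returns 138,308.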
Token Counting Approaches
| Approach | Accuracy | Cost | Use When |
|---|---|---|---|
| Approximate (character-based) | ±15% | Near-zero | Default — fast enough for most agents |
| Provider tokenizer (tiktoken, etc.) | Exact | Low (local) | Agents with very tight context budgets |
| Provider API count endpoint | Exact | High (network API call) | Occasional exact checks outside the hot path |
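// Default approximate token counter
using System;
using System.Collections.Generic;
using System.Linq;

public class ApproximateTokenCounter : ITokenCounter
{
    // Rough approximation: ~4 characters per token for English text.
    // Null text is treated as empty; the result is never less than 1.
    public int Count(string text) => Math.Max(1, (text?.Length ?? 0) / 4);

    // Sums the per-message estimates, adding 4 tokens per message for role overhead.
    public int CountMessages(IReadOnlyList<LLMMessage> messages) =>
        messages.Sum(m => Count(m.Content) + 4);
}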
Budget Configuration
// MemoryConfig budget settings
public class MemoryConfig
{
    // Context window settings
    public int MaxWorkingMemoryTokens { get; set; } = 100_000;
    public int OutputReserve { get; set; } = 4_096;       // reserved for response
    public int MinHistoryTokens { get; set; } = 4_000;    // always keep at least this

    // Section limits
    public int MaxKnowledgeTokens { get; set; } = 3_000;  // cap semantic memory size
    public int MaxEpisodeTokens { get; set; } = 1_000;    // cap episodic snippet size
}
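One way to sanity-check these settings at startup is to confirm that the fixed reservations leave room for everything else. The `MemoryConfigValidator` below is a hypothetical helper, not part of the library. It repeats a minimal copy of `MemoryConfig` so the sketch compiles on its own, and the rule it enforces is an assumption about what a usable configuration looks like.

```csharp
using System;

// Minimal copy of the MemoryConfig settings, repeated so this sketch compiles alone.
public class MemoryConfig
{
    public int MaxWorkingMemoryTokens { get; set; } = 100_000;
    public int OutputReserve { get; set; } = 4_096;
    public int MinHistoryTokens { get; set; } = 4_000;
    public int MaxKnowledgeTokens { get; set; } = 3_000;
    public int MaxEpisodeTokens { get; set; } = 1_000;
}

// Hypothetical helper for illustration only.
public static class MemoryConfigValidator
{
    // Tokens left for the system prompt, procedure, and current message
    // once the output reserve, history floor, and section caps are set aside.
    public static int RemainingForPrompts(MemoryConfig c) =>
        c.MaxWorkingMemoryTokens
        - c.OutputReserve
        - c.MinHistoryTokens
        - c.MaxKnowledgeTokens
        - c.MaxEpisodeTokens;

    // Assumption: a configuration is usable only if something is left over.
    public static void Validate(MemoryConfig c)
    {
        if (RemainingForPrompts(c) <= 0)
            throw new ArgumentException(
                "Reserved sections meet or exceed MaxWorkingMemoryTokens.");
    }
}
```

With the default values, 100,000 - 4,096 - 4,000 - 3,000 - 1,000 leaves 87,904 tokens for the system prompt, procedure, and current message.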
When the Budget is Exhausted
If all sections combined exceed MaxWorkingMemoryTokens, the pruning strategy reduces message history until the total fits within the budget. The system prompt, retrieved knowledge, and current message are never pruned; only message history is eligible for reduction. MemoryConfig describes MinHistoryTokens as the amount of history to always keep, so it acts as the floor for that reduction.
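The text does not say which pruning strategy is used, so the following is only a sketch of one common choice: dropping whole messages, oldest first, until the remaining history fits. `ChatMessage` and `OldestFirstPruner` are hypothetical names for illustration (the library's own message type is `LLMMessage`), and the cost estimate is the rough 4-characters-per-token rule plus 4 tokens of per-message overhead.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical message type so the sketch compiles on its own.
public record ChatMessage(string Role, string Content);

public static class OldestFirstPruner
{
    // Rough estimate: ~4 characters per token, plus 4 tokens of role overhead.
    private static int Cost(ChatMessage m) =>
        Math.Max(1, (m.Content?.Length ?? 0) / 4) + 4;

    // Keeps the newest messages whose combined cost fits in historyBudget.
    // Assumption: pruning removes whole messages, oldest first.
    public static List<ChatMessage> Prune(
        IReadOnlyList<ChatMessage> history, int historyBudget)
    {
        var kept = new List<ChatMessage>();
        int used = 0;

        // Walk from newest to oldest and stop at the first message that does not fit.
        for (int i = history.Count - 1; i >= 0; i--)
        {
            int cost = Cost(history[i]);
            if (used + cost > historyBudget) break;
            kept.Add(history[i]);
            used += cost;
        }

        kept.Reverse(); // restore chronological order
        return kept;
    }
}
```

Stopping at the first message that does not fit keeps the retained history contiguous, which avoids handing the model a conversation with gaps in the middle. A summarizing strategy would be a different trade-off and is not shown here.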