History Management
Message history management controls how the growing list of conversation messages is handled as the session gets longer. The goal is to keep the most relevant messages within the token budget while preserving enough context for coherent responses.
History Growth Problem
In a long conversation, the message history grows without bound. After 50 turns, a typical conversation might use 20,000+ tokens for history alone. With a token budget of 150,000 tokens, there is plenty of room, but with an 8,000-token budget (for example, a local Ollama model), the history must be managed aggressively.
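To make the budget pressure concrete, the following sketch works through the arithmetic for the two budgets mentioned above. The `HistoryBudget` helper and the amounts reserved for the system prompt and the response are illustrative assumptions, not Octopus defaults or part of its API.

```csharp
using System;

// Hypothetical helper for illustration only; not part of the Octopus API.
public static class HistoryBudget
{
    // Tokens left for history once fixed reservations are subtracted
    // from the model's total budget (never negative).
    public static int Available(int totalBudget, int systemTokens, int responseReserve)
        => Math.Max(0, totalBudget - systemTokens - responseReserve);

    // Tokens the pruner would have to remove; 0 means the history already fits.
    public static int Overflow(int historyTokens, int availableTokens)
        => Math.Max(0, historyTokens - availableTokens);
}
```

Assuming 1,500 tokens for system context and 1,000 reserved for the response, a 150,000-token budget leaves 147,500 tokens for history, so a 20,000-token history fits with no pruning. With an 8,000-token budget only 5,500 tokens remain, and the same history overflows by 14,500 tokens.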
The ContextPruner is called when the assembled context exceeds the available history budget:
public abstract class ContextPruner
{
    public abstract Task<IReadOnlyList<LLMMessage>> PruneAsync(
        IReadOnlyList<LLMMessage> history,
        int targetTokens,
        CancellationToken ct = default);
}
// The three built-in pruning strategies:
// 1. FIFOPruner — drop oldest messages
// 2. SummarizePruner — compress old messages via LLM
// 3. SlidingWindowPruner — keep only the last N turns
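
As an illustration of the simplest strategy listed above, here is a minimal sketch of a FIFO pruner. It is not the built-in FIFOPruner: the `LLMMessage` record is a stand-in with assumed fields, `TokenEstimator` uses a rough four-characters-per-token heuristic in place of a real tokenizer, and `FifoPrunerSketch` is a hypothetical name. The abstract class is repeated only so the block compiles on its own.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

// Stand-in message type (assumed shape); the real LLMMessage may differ.
public sealed record LLMMessage(string Id, string Role, string Content);

// Same signature as the abstract class shown above.
public abstract class ContextPruner
{
    public abstract Task<IReadOnlyList<LLMMessage>> PruneAsync(
        IReadOnlyList<LLMMessage> history,
        int targetTokens,
        CancellationToken ct = default);
}

// Assumption: roughly 4 characters per token, rounded up.
public static class TokenEstimator
{
    public static int Estimate(LLMMessage m) => (m.Content.Length + 3) / 4;
}

// Hypothetical FIFO strategy: drop the oldest messages until the rest fits.
public sealed class FifoPrunerSketch : ContextPruner
{
    public override Task<IReadOnlyList<LLMMessage>> PruneAsync(
        IReadOnlyList<LLMMessage> history,
        int targetTokens,
        CancellationToken ct = default)
    {
        ct.ThrowIfCancellationRequested();

        int total = history.Sum(m => TokenEstimator.Estimate(m));
        int start = 0; // index of the first message that is kept

        // Advance past the oldest messages while the estimate is over target.
        while (start < history.Count && total > targetTokens)
        {
            total -= TokenEstimator.Estimate(history[start]);
            start++;
        }

        IReadOnlyList<LLMMessage> kept = history.Skip(start).ToList();
        return Task.FromResult(kept);
    }
}
```

This sketch drops single messages and can therefore split a user message from its reply. A production strategy would more likely remove whole turns and rely on the minimum history guarantee to protect the most recent ones.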
Minimum History Guarantee
Regardless of budget pressure, Octopus always includes at least the last MinHistoryTurns complete user-assistant pairs in context (default: 3 turns). This keeps the LLM from losing track of the most recent exchanges, even under extreme budget pressure:
private IReadOnlyList<LLMMessage> EnsureMinHistory(
    IReadOnlyList<LLMMessage> pruned,
    IReadOnlyList<LLMMessage> original)
{
    int minTurns = _config.MinHistoryTurns; // default 3
    var lastTurns = GetLastNTurns(original, minTurns);
    var lastIds = lastTurns.Select(m => m.Id).ToHashSet();

    // Keep whatever survived pruning from earlier in the conversation, then
    // append the guaranteed turns in chronological order (assuming
    // GetLastNTurns returns them oldest first).
    // The result may slightly exceed the budget: min history is non-negotiable.
    return pruned
        .Where(m => !lastIds.Contains(m.Id))
        .Concat(lastTurns)
        .ToList();
}
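
The GetLastNTurns helper is not shown in this section. One way it might work is sketched below, under the assumptions that a turn begins at a message whose role is "user" and extends to the next user message, and that `LLMMessage` exposes a `Role` string. Both the `TurnHelpers` class and the record are hypothetical stand-ins. The sketch does not verify that each turn already has an assistant reply, so an in-progress turn counts as a turn.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Stand-in message type (assumed shape); the real LLMMessage may differ.
public sealed record LLMMessage(string Id, string Role, string Content);

public static class TurnHelpers
{
    // Returns the messages belonging to the last n turns, oldest first.
    // If the history holds fewer than n turns, every turn is returned.
    public static IReadOnlyList<LLMMessage> GetLastNTurns(
        IReadOnlyList<LLMMessage> history, int n)
    {
        if (n <= 0) return Array.Empty<LLMMessage>();

        int turnsSeen = 0;
        int start = history.Count; // index of the first message to keep

        // Walk backwards; each user message marks the start of a turn.
        for (int i = history.Count - 1; i >= 0; i--)
        {
            if (history[i].Role == "user")
            {
                start = i;
                turnsSeen++;
                if (turnsSeen == n) break;
            }
        }

        return history.Skip(start).ToList();
    }
}
```

The same backward walk could serve as the core of a sliding-window strategy, since keeping only the last N turns is exactly what this helper returns.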
History per Turn Type
Not all history items are equally important. The pruner gives different priority to different message types:
| Message Type | Pruning Behavior | Notes |
|---|---|---|
| System (injected context) | Never pruned | Knowledge and episodes are rebuilt per turn anyway |
| User messages | FIFO (oldest first) | Oldest user questions dropped first |
| Assistant responses | FIFO (oldest first) | Paired with user messages — removed together |
| Tool call + result | FIFO (oldest first) | Removed as a pair |
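
The rules in the table can be sketched as turn-grouped pruning: system messages are skipped, and everything else is removed one whole turn at a time, oldest first, so a user message, its tool call and result, and the assistant reply leave together. The role strings, the `LLMMessage` shape, the four-characters-per-token estimate, and the `TurnGroupedPruning` class are all assumptions made for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Stand-in message type (assumed shape); the real LLMMessage may differ.
public sealed record LLMMessage(string Id, string Role, string Content);

public static class TurnGroupedPruning
{
    // Assumption: roughly 4 characters per token, rounded up.
    private static int EstimateTokens(LLMMessage m) => (m.Content.Length + 3) / 4;

    public static IReadOnlyList<LLMMessage> Prune(
        IReadOnlyList<LLMMessage> history, int targetTokens)
    {
        // Group non-system messages into turns. A turn starts at a user
        // message, so replies and tool messages stay with their question.
        var turns = new List<List<LLMMessage>>();
        foreach (var msg in history.Where(m => m.Role != "system"))
        {
            if (msg.Role == "user" || turns.Count == 0)
                turns.Add(new List<LLMMessage>());
            turns[turns.Count - 1].Add(msg);
        }

        int total = history.Sum(m => EstimateTokens(m));
        var dropped = new HashSet<string>();

        // Drop whole turns, oldest first, until the estimate fits the target.
        foreach (var turn in turns)
        {
            if (total <= targetTokens) break;
            foreach (var msg in turn)
            {
                dropped.Add(msg.Id);
                total -= EstimateTokens(msg);
            }
        }

        // System messages are never added to 'dropped'; order is preserved.
        return history.Where(m => !dropped.Contains(m.Id)).ToList();
    }
}
```

This sketch will drop even the most recent turn if the target is small enough, so in practice its output would still pass through the minimum history guarantee described earlier.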