Tool Call History in Context — Working Memory

Tool Call Message Sequence

When the LLM produces a tool call, the message history must contain the tool call + result as a consecutive pair before the LLM can continue:

// Required message sequence for tool calls (simplified, OpenAI-style roles; Anthropic's API instead uses tool_use / tool_result content blocks):
[
  { "role": "user", "content": "Onboard TechCorp as a vendor" },
  {
    "role": "assistant",
    "content": null,
    "tool_calls": [
      {
        "id": "tc_01",
        "name": "vendor_lookup",
        "input": { "name": "TechCorp" }
      }
    ]
  },
  {
    "role": "tool",
    "tool_call_id": "tc_01",
    "content": "{\"result\": null, \"message\": \"Vendor not found\"}"
  },
  {
    // LLM continues here with the tool result in context:
    "role": "assistant",
    "content": "TechCorp is not yet in the system. I'll create it now..."
  }
]
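
The pairing rule can be checked mechanically before a request is sent. The following is a minimal sketch, not the framework's own validator: `ChatMessage` and `ToolSequenceValidator` are hypothetical names introduced for illustration, and the message shape is reduced to the fields the check needs.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical message shape for illustration; not the document's LLMMessage type.
public record ChatMessage(
    string Role,
    string? Content = null,
    IReadOnlyList<string>? ToolCallIds = null, // set on assistant messages that call tools
    string? ToolCallId = null);                // set on tool result messages

public static class ToolSequenceValidator
{
    // Returns null when every tool call is directly followed by its result,
    // otherwise a short description of the first problem found.
    public static string? FindFirstProblem(IReadOnlyList<ChatMessage> messages)
    {
        int i = 0;
        while (i < messages.Count)
        {
            var msg = messages[i];
            if (msg.Role == "tool")
                return $"Message {i}: tool result without a preceding tool call";

            var pending = new HashSet<string>(msg.ToolCallIds ?? Array.Empty<string>());
            i++;

            // Consume the tool results that must directly follow this message.
            while (pending.Count > 0)
            {
                if (i >= messages.Count || messages[i].Role != "tool")
                    return "Missing result for tool call(s): " +
                           string.Join(", ", pending.OrderBy(id => id));

                string? id = messages[i].ToolCallId;
                if (id is null || !pending.Remove(id))
                    return $"Message {i}: unexpected tool_call_id '{id}'";
                i++;
            }
        }
        return null;
    }
}
```

Running a check like this before each LLM request turns a malformed history into a local, descriptive error instead of an API rejection.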

Pruning Tool Call Pairs

A tool call and its result messages are always pruned as a unit, never individually. Removing only one half leaves an orphaned tool call or an orphaned result, which breaks the message sequence and causes LLM API errors:

// FIFOPruner handles tool call pairs
private IEnumerable<MessageGroup> GroupIntoRemovableUnits(IReadOnlyList<LLMMessage> messages)
{
    // Each group is a removable unit:
    // - Single user message (no tool calls)
    // - Assistant message with tool calls + all their tool result messages
    // - Single assistant message (no tool calls)
    // Pruner removes entire groups, never individual messages within a group
    var groups = new List<MessageGroup>();
    // ... grouping logic ...
    return groups;
}
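
One way the grouping could work is sketched below. This is an illustration under stated assumptions, not the actual FIFOPruner code: `SketchMessage` and `SketchGroup` are hypothetical stand-ins for `LLMMessage` and `MessageGroup`, whose definitions are not shown, and the budget is counted in messages rather than tokens to keep the example short.

```csharp
#nullable enable
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-ins for LLMMessage and MessageGroup.
public record SketchMessage(string Role, string Content, bool HasToolCalls = false);
public record SketchGroup(IReadOnlyList<SketchMessage> Messages);

public static class GroupingSketch
{
    public static List<SketchGroup> GroupIntoRemovableUnits(IReadOnlyList<SketchMessage> messages)
    {
        var groups = new List<SketchGroup>();
        int i = 0;
        while (i < messages.Count)
        {
            var unit = new List<SketchMessage> { messages[i] };
            bool opensToolGroup = messages[i].Role == "assistant" && messages[i].HasToolCalls;
            i++;

            // Attach every tool result that directly follows the assistant message.
            while (opensToolGroup && i < messages.Count && messages[i].Role == "tool")
            {
                unit.Add(messages[i]);
                i++;
            }
            groups.Add(new SketchGroup(unit));
        }
        return groups;
    }

    // FIFO pruning: drop whole groups from the front until at most maxMessages remain.
    // Assumption: a message-count budget stands in for a token budget here.
    public static List<SketchMessage> PruneOldest(IReadOnlyList<SketchMessage> messages, int maxMessages)
    {
        var groups = GroupIntoRemovableUnits(messages);
        int total = messages.Count;
        int skip = 0;
        while (skip < groups.Count && total > maxMessages)
        {
            total -= groups[skip].Messages.Count;
            skip++;
        }
        return groups.Skip(skip).SelectMany(g => g.Messages).ToList();
    }
}
```

Because removal happens at group granularity, the pruned history may end up below the budget rather than exactly at it. That is the price of never splitting a tool call from its results.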

Tool Call Results Token Impact

Tool results can be very large — a database query might return thousands of tokens of JSON. The ToolResultTruncator limits tool result sizes before they enter the context:

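using System;

// Truncate tool results to fit budget
public class ToolResultTruncator
{
    // Assumption: ITokenCounter stands in for whatever type exposes Count(string);
    // the original shows only the _counter field in use.
    private readonly ITokenCounter _counter;

    public ToolResultTruncator(ITokenCounter counter) => _counter = counter;

    public string Truncate(string toolResult, int maxTokens = 2000)
    {
        int tokens = _counter.Count(toolResult);
        if (tokens <= maxTokens) return toolResult;

        // Keep roughly maxTokens worth of text (~4 chars per token), clamped so the
        // slice never runs past the end of token-dense input such as compact JSON.
        int keepChars = Math.Min(maxTokens * 4, toolResult.Length);
        return toolResult[..keepChars] +
            $"\n[... truncated approximately {tokens - maxTokens} tokens ...]";
    }
}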

Tool Results and Token Cost

Large tool results are a common source of unexpected token cost inflation. Always set appropriate size limits on MCP tool outputs. A tool that returns a 10,000-token JSON blob on every call will rapidly exhaust the context budget. Design tools to return only the data the LLM needs — not full database records.
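
As an illustration of returning only what the LLM needs, a tool could project a full record down to a few fields before handing it back. The sketch below uses `System.Text.Json`; `ToolOutputProjection` and the field names in the usage note are hypothetical, and it assumes the tool output is a single JSON object.

```csharp
using System.Collections.Generic;
using System.Text.Json;

public static class ToolOutputProjection
{
    // Keeps only the named top-level properties of a JSON object.
    // Assumption: the input is a JSON object; other root types throw.
    public static string Project(string json, IEnumerable<string> keepFields)
    {
        using var doc = JsonDocument.Parse(json);
        var slim = new Dictionary<string, JsonElement>();
        foreach (var field in keepFields)
        {
            if (doc.RootElement.TryGetProperty(field, out JsonElement value))
                slim[field] = value;
        }
        // Serialized before the document is disposed at the end of the method.
        return JsonSerializer.Serialize(slim);
    }
}
```

For example, `ToolOutputProjection.Project(record, new[] { "id", "status" })` would drop every other property of a vendor record, so the context carries two short fields instead of the whole row.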
