Context Injection — RAG

Injection Position

Retrieved knowledge is injected after the system prompt (and any matched procedure) and before the past conversation snippets and message history. This position ensures the LLM sees the knowledge before it processes the current conversation, without overriding the system prompt's instructions:

┌─────────────────────────────────────┐
│ [System Prompt]                     │  ← Instructions, persona, constraints
├─────────────────────────────────────┤
│ [Procedure] (if matched)            │
├─────────────────────────────────────┤
│ [Retrieved Knowledge]               │  ← Injected here — semantic memory
│   Source: HR Policy 2025.pdf        │
│   Annual leave entitlement is 20... │
│   ---                               │
│   Source: Leave Calculator Guide.pdf│
│   To calculate your remaining...    │
├─────────────────────────────────────┤
│ [Past Conversation Snippets]        │  ← Episodic memory
├─────────────────────────────────────┤
│ [Message History]                   │
├─────────────────────────────────────┤
│ [Current User Message]              │  ← Highest model attention position
└─────────────────────────────────────┘
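
The ordering in the diagram can be sketched as a small assembly routine. This is a minimal illustration, not the library's implementation: `Role` and `LLMMessage` here are simplified stand-ins for the types used elsewhere on this page, `ContextAssembler.Build` is a hypothetical helper, and sending each optional block as its own system message is an assumption.

```csharp
using System.Collections.Generic;

// Hypothetical stand-ins for the library's own message types.
public enum Role { System, User, Assistant }
public record LLMMessage(Role Role, string Content);

public static class ContextAssembler
{
    // Builds the message list in the order shown in the diagram:
    // system prompt, procedure, retrieved knowledge, past snippets,
    // message history, current user message.
    // Optional blocks that are null or blank are skipped.
    public static List<LLMMessage> Build(
        string systemPrompt,
        string? procedure,
        string? retrievedKnowledge,
        string? pastSnippets,
        IEnumerable<LLMMessage> history,
        string userMessage)
    {
        var messages = new List<LLMMessage> { new(Role.System, systemPrompt) };

        foreach (var block in new[] { procedure, retrievedKnowledge, pastSnippets })
        {
            if (!string.IsNullOrWhiteSpace(block))
                messages.Add(new LLMMessage(Role.System, block));
        }

        messages.AddRange(history);
        messages.Add(new LLMMessage(Role.User, userMessage));
        return messages;
    }
}
```

Because empty blocks are skipped rather than sent as blank messages, a turn with no matched procedure and no retrieved knowledge reduces to the system prompt, the history, and the current user message.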

Knowledge Block Format

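using System.Linq;   // Any()
using System.Text;   // StringBuilder

// Renders the retrieved records as one text block for injection into the prompt.
private string FormatForContext(SemanticRetrievalResult result)
{
    // Nothing retrieved: return an empty string so the caller can skip injection.
    if (!result.Records.Any()) return string.Empty;

    var sb = new StringBuilder();
    sb.AppendLine("[Retrieved Knowledge]");
    sb.AppendLine("The following information is relevant to answering the user's question.");
    sb.AppendLine("Use this information in your response. Cite sources when appropriate.");
    sb.AppendLine();

    // One entry per record: source, optional category, score, then the chunk text.
    foreach (var record in result.Records)
    {
        sb.AppendLine($"Source: {record.Metadata.Source}");
        if (!string.IsNullOrEmpty(record.Metadata.Category))
            sb.AppendLine($"Category: {record.Metadata.Category}");
        sb.AppendLine($"Relevance Score: {record.Score:F2}");
        sb.AppendLine(record.Content);
        sb.AppendLine("---");
    }

    // Closing instruction that discourages answers not grounded in the block.
    sb.AppendLine("If the answer is not in the above information, say so honestly.");
    return sb.ToString();
}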

Citation Instructions

Adding source citation instructions to the system prompt improves transparency and helps users verify answers:

// System prompt addition for automatic citations.
// Uses a raw string literal, which requires C# 11 or later.
string citationInstruction = """
When you use information from the [Retrieved Knowledge] section to answer a question,
cite the source document at the end of your answer using this format:
(Source: <filename>)
If your answer combines information from multiple sources, cite all of them.
If you cannot find the answer in the retrieved knowledge, say:
"I don't have specific information on that in my knowledge base."
Do not invent information not present in the retrieved knowledge.
""";

Handling Empty Retrieval

When no chunks meet the MinScore threshold, the knowledge block is omitted from context. The system prompt should instruct the agent how to respond in this case:

// Check before injecting
if (result.Records.Any())
{
    messages.Add(new LLMMessage(Role.System, FormatForContext(result)));
}
// If no knowledge retrieved, the LLM uses only its system prompt and conversation history.
// System prompt should say:
// "If you don't have retrieved knowledge to answer a factual question, say:
//  'I don't have that information in my knowledge base.'"

Token Cost of Injected Knowledge

Scenario                      Approx. Tokens Injected         Notes
5 chunks × 512 tokens each    ~2,600 tokens (with headers)    Default configuration
3 chunks × 256 tokens each    ~820 tokens                     Lower cost, smaller context window use
0 chunks (no match)           0 tokens                        No injection: MinScore filter removed all candidates
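
The figures in the table can be approximated with simple arithmetic: chunk tokens plus a small overhead for the block preamble and per-chunk header lines. The sketch below is a rough estimator only. `TokenEstimator` is a hypothetical helper, and the default overheads (5 tokens of headers per chunk, 30 tokens of preamble) are illustrative assumptions, not measured values; real counts depend on the tokenizer and on the source names.

```csharp
public static class TokenEstimator
{
    // Rough estimate of tokens added to the context by the knowledge block.
    // headerTokensPerChunk and preambleTokens are assumed values for illustration.
    public static int EstimateInjectedTokens(
        int chunkCount,
        int tokensPerChunk,
        int headerTokensPerChunk = 5,
        int preambleTokens = 30)
    {
        // No chunks means the block is omitted entirely, so nothing is injected.
        if (chunkCount <= 0) return 0;

        return preambleTokens + chunkCount * (tokensPerChunk + headerTokensPerChunk);
    }
}
```

With these assumed overheads, `EstimateInjectedTokens(5, 512)` returns 2615 and `EstimateInjectedTokens(3, 256)` returns 813, which is in the same range as the table's approximate figures.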

Knowledge Is Refreshed Every Turn

Semantic retrieval runs on every turn. Documents indexed mid-conversation are therefore eligible for retrieval on the next turn, with no session restart required; they are injected whenever they match the query and meet the MinScore threshold. This makes the RAG pipeline immediately responsive to knowledge base updates.
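
The per-turn behaviour can be illustrated with a toy index. Everything in this sketch is hypothetical: `InMemoryKnowledgeIndex` and `TurnLoop` are made-up names, and a naive substring match stands in for the real semantic search and MinScore filter. The point is only that retrieval is called inside the turn handler, so each turn sees the index as it is at that moment.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical in-memory index, used only to illustrate per-turn retrieval.
public sealed class InMemoryKnowledgeIndex
{
    private readonly List<(string Source, string Content)> _chunks = new();

    // Adds a chunk; it is visible to the very next Retrieve call.
    public void Index(string source, string content) => _chunks.Add((source, content));

    // Naive case-insensitive substring match standing in for semantic search.
    public List<(string Source, string Content)> Retrieve(string query) =>
        _chunks
            .Where(c => c.Content.Contains(query, StringComparison.OrdinalIgnoreCase))
            .ToList();
}

public static class TurnLoop
{
    // Runs retrieval for a single turn and returns how many chunks matched.
    // Nothing is cached per session, so every call queries the live index.
    public static int HandleTurn(InMemoryKnowledgeIndex index, string userMessage) =>
        index.Retrieve(userMessage).Count;
}
```

In this sketch, a turn that finds nothing before a document is indexed will find it on the following turn, using the same `InMemoryKnowledgeIndex` instance and without recreating any session state.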
