Context Injection
Context injection is the final step of the RAG pipeline: the retrieved knowledge chunks are formatted as a message and inserted into the LLM's working-memory context before the LLM call. The format and position of the injected knowledge significantly affect how reliably the LLM uses it.
Injection Position
Retrieved knowledge is injected after the system prompt (and any matched procedure) and before the past conversation snippets and message history. This position ensures the LLM sees the knowledge before it processes the current conversation, without overriding the system prompt's instructions:
┌─────────────────────────────────────┐
│ [System Prompt]                     │ ← Instructions, persona, constraints
├─────────────────────────────────────┤
│ [Procedure] (if matched)            │
├─────────────────────────────────────┤
│ [Retrieved Knowledge]               │ ← Injected here (semantic memory)
│   Source: HR Policy 2025.pdf        │
│   Annual leave entitlement is 20... │
│   ---                               │
│   Source: Leave Calculator Guide.pdf│
│   To calculate your remaining...    │
├─────────────────────────────────────┤
│ [Past Conversation Snippets]        │ ← Episodic memory
├─────────────────────────────────────┤
│ [Message History]                   │
├─────────────────────────────────────┤
│ [Current User Message]              │ ← Highest model attention position
└─────────────────────────────────────┘
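The ordering in the diagram can be sketched as a single assembly step. This is a minimal illustration rather than the pipeline's actual code: `ContextAssembler`, `BuildContext`, `AddIfPresent`, and the simplified `LLMMessage` and `Role` types are hypothetical stand-ins, and sending every optional block with the system role is an assumption.

```csharp
#nullable enable
using System.Collections.Generic;

// Hypothetical, simplified stand-ins for the pipeline's message types.
public enum Role { System, User, Assistant }
public record LLMMessage(Role Role, string Content);

public static class ContextAssembler
{
    // Assembles the context in the order shown in the diagram.
    // Optional blocks (procedure, knowledge, snippets) are skipped when null or blank.
    public static List<LLMMessage> BuildContext(
        string systemPrompt,
        string? procedure,
        string? retrievedKnowledge,
        string? pastSnippets,
        IEnumerable<LLMMessage> history,
        string currentUserMessage)
    {
        var messages = new List<LLMMessage>
        {
            new LLMMessage(Role.System, systemPrompt)
        };

        AddIfPresent(messages, procedure);
        AddIfPresent(messages, retrievedKnowledge); // semantic memory
        AddIfPresent(messages, pastSnippets);       // episodic memory

        messages.AddRange(history);
        // The current user message always goes last.
        messages.Add(new LLMMessage(Role.User, currentUserMessage));
        return messages;
    }

    // Assumption: optional blocks are injected as system-role messages.
    private static void AddIfPresent(List<LLMMessage> messages, string? block)
    {
        if (!string.IsNullOrWhiteSpace(block))
            messages.Add(new LLMMessage(Role.System, block));
    }
}
```

Keeping the order in one function makes it harder for a later change to place the knowledge block after the message history by accident.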
Knowledge Block Format
// Requires: using System.Linq; using System.Text;
// Formats the retrieved records as one knowledge block for injection into the context.
private string FormatForContext(SemanticRetrievalResult result)
{
    // Nothing retrieved: return an empty string so the caller can skip injection.
    if (!result.Records.Any()) return string.Empty;

    var sb = new StringBuilder();
    sb.AppendLine("[Retrieved Knowledge]"); // AppendLine keeps line endings consistent
    sb.AppendLine("The following information is relevant to answering the user's question.");
    sb.AppendLine("Use this information in your response. Cite sources when appropriate.");
    sb.AppendLine();

    foreach (var record in result.Records)
    {
        sb.AppendLine($"Source: {record.Metadata.Source}");
        // Category is optional metadata; omit the line when it is missing.
        if (!string.IsNullOrEmpty(record.Metadata.Category))
            sb.AppendLine($"Category: {record.Metadata.Category}");
        sb.AppendLine($"Relevance Score: {record.Score:F2}");
        sb.AppendLine(record.Content);
        sb.AppendLine("---"); // separator between chunks
    }

    sb.AppendLine("If the answer is not in the above information, say so honestly.");
    return sb.ToString();
}
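For the two chunks shown in the injection diagram, the formatter above would render a block along these lines. The category value and relevance scores are made up for illustration, the second record is assumed to have no category, and the chunk text is left truncated as in the diagram.

```text
[Retrieved Knowledge]
The following information is relevant to answering the user's question.
Use this information in your response. Cite sources when appropriate.

Source: HR Policy 2025.pdf
Category: Leave
Relevance Score: 0.91
Annual leave entitlement is 20...
---
Source: Leave Calculator Guide.pdf
Relevance Score: 0.84
To calculate your remaining...
---
If the answer is not in the above information, say so honestly.
```

Each record adds two or three short header lines plus a separator, which is the header overhead on top of the chunk text itself.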
Citation Instructions
Adding source citation instructions to the system prompt improves transparency and helps users verify answers:
// System prompt addition for automatic citations.
// Uses a raw string literal, which requires C# 11 or later.
string citationInstruction = """
    When you use information from the [Retrieved Knowledge] section to answer a question,
    cite the source document at the end of your answer using this format:
    (Source: <filename>)
    If your answer combines information from multiple sources, cite all of them.
    If you cannot find the answer in the retrieved knowledge, say:
    "I don't have specific information on that in my knowledge base."
    Do not invent information not present in the retrieved knowledge.
    """;
Handling Empty Retrieval
When no chunks meet the MinScore threshold, the knowledge block is omitted from context. The system prompt should instruct the agent how to respond in this case:
// Check before injecting
if (result.Records.Any())
{
    messages.Add(new LLMMessage(Role.System, FormatForContext(result)));
}
// If no knowledge is retrieved, the LLM uses only its system prompt and conversation history.
// The system prompt should say:
//   "If you don't have retrieved knowledge to answer a factual question, say:
//   'I don't have that information in my knowledge base.'"
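The threshold step that produces an empty result can be sketched as follows. This is a hedged illustration, not the pipeline's retrieval code: `ScoredChunk`, `RetrievalFilter`, `ApplyMinScore`, and the `maxChunks` cap are hypothetical names, and treating the threshold as inclusive (`>=`) is an assumption.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical minimal shape of a scored chunk returned by vector search.
public record ScoredChunk(string Source, string Content, double Score);

public static class RetrievalFilter
{
    // Keeps chunks scoring at or above minScore, best first, capped at maxChunks.
    // Returns an empty list when nothing qualifies, in which case no
    // knowledge block should be injected.
    public static List<ScoredChunk> ApplyMinScore(
        IEnumerable<ScoredChunk> candidates, double minScore, int maxChunks)
    {
        return candidates
            .Where(c => c.Score >= minScore)   // assumption: inclusive threshold
            .OrderByDescending(c => c.Score)
            .Take(maxChunks)
            .ToList();
    }
}
```

An empty list from this step is a normal outcome, not an error. The caller skips the injection and the system prompt's fallback instruction governs the reply.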
Token Cost of Injected Knowledge
| Scenario | Approx. Tokens Injected | Notes |
|---|---|---|
| 5 chunks × 512 tokens each | ~2,600 tokens (with headers) | Default configuration |
| 3 chunks × 256 tokens each | ~820 tokens (with headers) | Lower cost; uses less of the context window |
| 0 chunks (no match) | 0 tokens | No injection; the MinScore filter removed all candidates |
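The figures in the table can be approximated with simple arithmetic: chunk payload plus a small header overhead. The sketch below is illustrative only. `TokenEstimator` and `EstimateInjectedTokens` are hypothetical names, and the overhead defaults (8 tokens per chunk, 30 tokens for the fixed instructions) are assumptions chosen to land near the table's values, not measurements. Use the target model's tokenizer for real budgets.

```csharp
public static class TokenEstimator
{
    // Rough estimate of the tokens added by the knowledge block:
    // chunk payload + per-chunk header lines + fixed intro/outro instructions.
    // Both overhead defaults are illustrative assumptions, not measured values.
    public static int EstimateInjectedTokens(
        int chunkCount,
        int tokensPerChunk,
        int perChunkOverhead = 8,
        int fixedOverhead = 30)
    {
        // With no chunks, the block is omitted entirely, so nothing is injected.
        if (chunkCount <= 0) return 0;
        return chunkCount * (tokensPerChunk + perChunkOverhead) + fixedOverhead;
    }
}
```

Under these assumed overheads, `EstimateInjectedTokens(5, 512)` returns 2630 and `EstimateInjectedTokens(3, 256)` returns 822, in line with the approximate values in the table.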
Semantic retrieval runs on every turn. If new documents are indexed mid-conversation, they appear in retrieved knowledge on the next turn without any session restart, so knowledge base updates take effect within one turn.