Retrieval at Query Time — Semantic Memory

The Retrieval Pipeline

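using System.Collections.Generic;
using System.Text;
using System.Threading;
using System.Threading.Tasks;

public class SemanticMemoryService : ISemanticMemoryService
{
    // _embeddingProvider and _store are injected dependencies;
    // their declarations and the constructor are omitted here.
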
    public async Task<SemanticRetrievalResult> RetrieveAsync(
        AgentComposite agent,
        string query,
        CancellationToken ct = default)
    {
        if (!agent.MemoryConfig.SemanticEnabled)
            return SemanticRetrievalResult.Empty;

        // 1. Embed the query
        var queryEmbedding = await _embeddingProvider.EmbedAsync(query, ct);

        // 2. Search the agent's vector collection
        var collection = GetCollectionName(agent);
        var results = await _store.SearchAsync(
            collection,
            queryEmbedding,
            topK: agent.MemoryConfig.SemanticTopK,
            minScore: agent.MemoryConfig.SemanticMinScore,
            filter: new MemoryFilter { TenantId = agent.TenantId },
            ct: ct);

        // 3. Format for context injection
        return new SemanticRetrievalResult
        {
            Chunks = results,
            FormattedContext = FormatForContext(results)
        };
    }

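    // Renders the retrieved chunks as one text block for the prompt.
    private string FormatForContext(IReadOnlyList<MemoryRecord> records)
    {
        // No chunk cleared the score threshold: inject nothing rather than a bare header.
        if (records.Count == 0)
            return string.Empty;

        var sb = new StringBuilder();
        sb.AppendLine("[Retrieved Knowledge]");
        foreach (var r in records)
        {
            sb.AppendLine($"Source: {r.Metadata.Source}");
            sb.AppendLine(r.Content);
            sb.AppendLine("---"); // separator between chunks
        }
        return sb.ToString();
    }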
}

Retrieval Configuration

Config Property	Default	Effect
`SemanticTopK`	5	Number of chunks to retrieve per query
`SemanticMinScore`	0.7	Minimum cosine similarity threshold (0–1)
`SemanticContextMaxTokens`	2000	Max tokens allocated to retrieved knowledge in context
`SemanticEnabled`	true	Disable to skip retrieval entirely for this agent
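
The pipeline listing above does not show where `SemanticContextMaxTokens` is applied. One way such a budget could be enforced while formatting is sketched below. `ContextBudget`, `EstimateTokens`, `TrimToBudget`, and the four-characters-per-token estimate are assumptions made for illustration, not part of the service; a real implementation would more likely count tokens with the target model's tokenizer.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: not part of SemanticMemoryService.
public static class ContextBudget
{
    // Rough heuristic (assumption): about 4 characters per token.
    public static int EstimateTokens(string text) =>
        (int)Math.Ceiling(text.Length / 4.0);

    // Keeps chunks in ranked order until the next one would exceed the budget.
    public static List<string> TrimToBudget(IEnumerable<string> rankedChunks, int maxTokens)
    {
        var kept = new List<string>();
        var used = 0;
        foreach (var chunk in rankedChunks)
        {
            var cost = EstimateTokens(chunk);
            if (used + cost > maxTokens)
                break; // stop at the first chunk that does not fit
            kept.Add(chunk);
            used += cost;
        }
        return kept;
    }
}
```

Stopping at the first chunk that does not fit preserves the ranking order, at the cost of sometimes leaving part of the budget unused.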

Context Injection Position

Retrieved knowledge is injected immediately after the system prompt, ahead of episodic snippets and the conversation history, in the LLM context:

// LLM context assembly order:
[1] System Prompt        ← agent.SystemPrompt
[2] Retrieved Knowledge  ← semantic retrieval results (this page)
[3] Episode Snippets     ← episodic memory recall
[4] Message History      ← current session messages (pruned to budget)
[5] Current User Message ← the user's latest input

// The LLM is meant to treat [2] Retrieved Knowledge as authoritative information
// and ground its response in it, which helps reduce hallucination
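
The assembly order above can be sketched as a small function. This is a simplified illustration, assuming each section is already rendered as text; `ContextAssembler` and `Assemble` are hypothetical names, and a production system would typically build a list of role-tagged messages rather than one string.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper illustrating the documented section order.
public static class ContextAssembler
{
    // Joins the five sections in order, skipping any that are empty.
    public static string Assemble(
        string systemPrompt,
        string retrievedKnowledge,
        string episodeSnippets,
        IEnumerable<string> messageHistory,
        string currentUserMessage)
    {
        var sections = new List<string>
        {
            systemPrompt,                       // [1] System Prompt
            retrievedKnowledge,                 // [2] Retrieved Knowledge
            episodeSnippets,                    // [3] Episode Snippets
            string.Join("\n", messageHistory),  // [4] Message History
            currentUserMessage                  // [5] Current User Message
        };
        sections.RemoveAll(s => string.IsNullOrWhiteSpace(s));
        return string.Join("\n\n", sections);
    }
}
```

Skipping empty sections means an agent with semantic retrieval disabled, or a query with no matches, contributes nothing at position [2].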

Tuning MinScore

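A `SemanticMinScore` of 0.7 is conservative: you get fewer chunks, but the ones returned are more likely to be relevant. If your agent frequently responds "I don't have information about that" for queries that should match, lower `SemanticMinScore` to 0.6. If it returns irrelevant information, raise it to 0.75 or above. Similarity scores are distributed differently across embedding models, so treat these values as starting points and re-check them whenever you change the model.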
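
Before changing the threshold, it can help to see how many chunks would survive at each candidate value for a set of representative queries. The sketch below assumes you have logged raw similarity scores with no threshold applied; `ThresholdSweep` and `CountAbove` are hypothetical helpers, and the inclusive comparison is an assumption about how the store applies `minScore`.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper for exploring candidate thresholds offline.
public static class ThresholdSweep
{
    // Counts how many scores meet or exceed each candidate threshold.
    // Assumption: the store treats minScore as an inclusive lower bound.
    public static Dictionary<double, int> CountAbove(
        IReadOnlyCollection<double> scores,
        IEnumerable<double> thresholds)
    {
        return thresholds.ToDictionary(
            t => t,
            t => scores.Count(s => s >= t));
    }
}
```

For example, calling `ThresholdSweep.CountAbove(scores, new[] { 0.6, 0.7, 0.75 })` on the scores for a query that should have matched shows whether lowering the threshold would actually admit more chunks.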
