Retrieval — RAG | BizFirstAI

Retrieval Flow

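Embed Query: The user message is embedded via IEmbeddingProvider.EmbedAsync. It must use the same embedding model that was used at indexing time, because vectors produced by different models are not comparable.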

Vector Search: ISemanticMemoryStore.SearchAsync queries the agent's collection for the top-K nearest neighbours by cosine similarity.

Score Filter: Results scoring below SemanticMinScore (default 0.75) are discarded, which keeps low-relevance chunks out of the context.

Return Top-K: Up to SemanticTopK chunks (default 5) are returned as a SemanticRetrievalResult.
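The four steps can be illustrated with a brute-force, in-memory search. This is a minimal sketch, not the implementation behind ISemanticMemoryStore: a production vector store such as Qdrant uses an approximate nearest-neighbour index instead of scoring every chunk, and the ScoredChunk record and BruteForceRetriever class are hypothetical names introduced only for this example. The defaults mirror SemanticTopK = 5 and SemanticMinScore = 0.75.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical result type for this sketch; the real store returns MemoryRecord.
public sealed record ScoredChunk(string Id, string Content, float Score);

// Hypothetical brute-force retriever: scores every chunk, so each query is O(n).
public static class BruteForceRetriever
{
    // Cosine similarity: dot(a, b) / (|a| * |b|); returns 0 if either vector is all zeros.
    public static float Cosine(float[] a, float[] b)
    {
        // A dimension mismatch usually means query and index used different embedding
        // models. Matching dimensions do NOT prove the models are the same.
        if (a.Length != b.Length)
            throw new ArgumentException("Query and chunk embeddings have different dimensions.");

        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.Length; i++)
        {
            dot   += (double)a[i] * b[i];
            normA += (double)a[i] * a[i];
            normB += (double)b[i] * b[i];
        }
        if (normA == 0 || normB == 0) return 0f;
        return (float)(dot / (Math.Sqrt(normA) * Math.Sqrt(normB)));
    }

    // Vector search, score filter and top-K, in that order.
    public static IReadOnlyList<ScoredChunk> Search(
        float[] queryEmbedding,
        IEnumerable<(string Id, string Content, float[] Embedding)> chunks,
        int topK = 5,            // SemanticTopK default
        float minScore = 0.75f)  // SemanticMinScore default
    {
        return chunks
            .Select(c => new ScoredChunk(c.Id, c.Content, Cosine(queryEmbedding, c.Embedding)))
            .Where(c => c.Score >= minScore)   // discard low-relevance chunks
            .OrderByDescending(c => c.Score)   // nearest neighbours first
            .Take(topK)                        // at most topK results
            .ToList();
    }
}
```

Filtering before sorting only shrinks the ranking work; the result is identical to taking the top K first and then filtering, because every chunk that passes the threshold outranks every chunk that fails it.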

SemanticMemoryService.RetrieveAsync

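using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public class SemanticMemoryService
{
    private readonly IEmbeddingProvider   _embedder;
    private readonly ISemanticMemoryStore _store;
    private readonly ITokenCounter        _counter;  // ITokenCounter: hypothetical name for the token-counting service

    public SemanticMemoryService(
        IEmbeddingProvider embedder,
        ISemanticMemoryStore store,
        ITokenCounter counter)
    {
        _embedder = embedder;
        _store    = store;
        _counter  = counter;
    }
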
    public async Task<SemanticRetrievalResult> RetrieveAsync(
        AgentComposite agent,
        string query,
        CancellationToken ct)
    {
        if (!agent.MemoryConfig.SemanticEnabled)
            return SemanticRetrievalResult.Empty;

        // 1. Embed the query
        float[] queryEmbedding = await _embedder.EmbedAsync(query, ct);

        // 2. Search the vector store; SearchAsync applies both the top-K limit and the MinScore cut-off
        string collection = $"agent_{agent.Id:N}";
        var results = await _store.SearchAsync(
            collection:  collection,
            embedding:   queryEmbedding,
            topK:        agent.MemoryConfig.SemanticTopK,
            minScore:    agent.MemoryConfig.SemanticMinScore,
            ct:          ct);

        // 3. Package the matches with token counts for the query and the retrieved content
        return new SemanticRetrievalResult
        {
            Records      = results,
            QueryTokens  = _counter.Count(query),
            TotalTokens  = results.Sum(r => _counter.Count(r.Content))
        };
    }
}

public class SemanticRetrievalResult
{
    public static readonly SemanticRetrievalResult Empty = new();

    public IReadOnlyList<MemoryRecord> Records      { get; init; } = Array.Empty<MemoryRecord>();
    public int                         QueryTokens  { get; init; }
    public int                         TotalTokens  { get; init; }
}

Tuning Retrieval Parameters

Parameter	Default	Effect of Increasing	Effect of Decreasing
`SemanticTopK`	5	More context, more tokens, possible noise	Less context, fewer tokens, may miss relevant chunks
`SemanticMinScore`	0.75	Stricter filter — only high-confidence matches	More results — includes lower-relevance chunks
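The two parameters interact: SemanticTopK is only a cap, while SemanticMinScore decides how many candidates are eligible at all. The sketch below uses made-up similarity scores (illustrative values, not measurements) to show that raising TopK changes nothing while MinScore is the binding limit. RetrievalTuning and its members are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper for reasoning about SemanticTopK and SemanticMinScore together.
public static class RetrievalTuning
{
    // Illustrative cosine scores for one query's candidates (made-up values, best first).
    public static readonly float[] SampleScores =
        { 0.91f, 0.86f, 0.82f, 0.78f, 0.74f, 0.71f, 0.69f };

    // Chunks returned = candidates at or above minScore, capped at topK.
    public static int ReturnedCount(IEnumerable<float> scores, int topK, float minScore) =>
        Math.Min(topK, scores.Count(s => s >= minScore));
}

// With SampleScores:
//   TopK=5, MinScore=0.75 -> 4   (MinScore is the binding limit)
//   TopK=8, MinScore=0.75 -> 4   (raising TopK alone changes nothing)
//   TopK=5, MinScore=0.70 -> 5   (6 candidates qualify; TopK caps the result)
//   TopK=8, MinScore=0.70 -> 6
//   TopK=5, MinScore=0.85 -> 2
```

In practice this means tuning usually starts with MinScore: if fewer than SemanticTopK chunks come back, increasing TopK will not surface more context.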

Category Filtering

When the topic of a query is known in advance (for example, a question about leave), retrieval can be scoped to a single document category by passing a MemoryFilter:

// Restrict the search to chunks indexed under the "Leave" category
var results = await _store.SearchAsync(
    collection: collection,
    embedding:  queryEmbedding,
    topK:       5,
    minScore:   0.75f,
    filter:     new MemoryFilter { Category = "Leave" },
    ct:         ct);

// Qdrant payload filter equivalent:
// { "must": [{ "key": "category", "match": { "value": "Leave" } }] }
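One way to picture how a MemoryFilter could be translated into that Qdrant payload filter is sketched below. It is a hypothetical translation for illustration, not the store's actual code: the MemoryFilter shape (a single optional Category) and the lowercase payload key "category" are taken from the example above, and QdrantFilterSketch is an invented name. It uses System.Text.Json to serialize an anonymous object into the same must/match structure.

```csharp
using System.Text.Json;

// Hypothetical shape, mirroring the MemoryFilter used in the example above.
public sealed class MemoryFilter
{
    public string? Category { get; init; }
}

public static class QdrantFilterSketch
{
    // Returns the Qdrant filter JSON, or null when no category is set (search the whole collection).
    public static string? ToQdrantFilterJson(MemoryFilter? filter)
    {
        var category = filter?.Category;
        if (string.IsNullOrEmpty(category))
            return null;

        var qdrantFilter = new
        {
            must = new[]
            {
                // Exact match on the "category" payload field assumed to be written at indexing time.
                new { key = "category", match = new { value = category } }
            }
        };
        return JsonSerializer.Serialize(qdrantFilter);
    }
}
```

Because the condition sits under "must", every returned point has to carry the matching category; chunks indexed without a category payload are excluded from the search.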

Debugging Retrieval Quality

Symptom	Likely Cause	Fix
Agent can't answer a known question	Chunk not retrieved — score below MinScore	Lower MinScore; check chunk quality; re-index with better chunking
Agent gives irrelevant answers	Wrong chunks retrieved — MinScore too low	Raise MinScore to 0.80–0.85
Agent only uses part of a long document	TopK too low — relevant chunk not in top-5	Increase SemanticTopK to 8–10
High token cost per turn	TopK too high or chunks too large	Reduce TopK; use a smaller chunk size (e.g. 256 tokens)
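For the first three symptoms it helps to see where the expected chunk actually ranked before the filters were applied. A minimal diagnostic sketch, assuming you can obtain the raw scored candidates for a query (for example by running the search once with a very low minScore and a large topK); RetrievalDiagnostics, Verdict and Diagnose are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public enum Verdict { Returned, BelowMinScore, OutsideTopK, NotFound }

public static class RetrievalDiagnostics
{
    // Explains why a chunk the agent should have used was or was not returned.
    // candidates: unfiltered (chunkId, score) pairs for the query.
    public static Verdict Diagnose(
        IEnumerable<(string ChunkId, float Score)> candidates,
        string expectedChunkId,
        int topK,
        float minScore)
    {
        var all = candidates.ToList();

        // Same rules as retrieval: drop scores below minScore, rank best first.
        var eligible = all
            .Where(c => c.Score >= minScore)
            .OrderByDescending(c => c.Score)
            .ToList();

        int rank = eligible.FindIndex(c => c.ChunkId == expectedChunkId);
        if (rank >= 0)
            return rank < topK ? Verdict.Returned
                               : Verdict.OutsideTopK;    // consider raising SemanticTopK

        return all.Any(c => c.ChunkId == expectedChunkId)
            ? Verdict.BelowMinScore                       // lower MinScore or improve chunking
            : Verdict.NotFound;                           // not among candidates: check indexing
    }
}
```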
