The Retrieval Pipeline

public class SemanticMemoryService : ISemanticMemoryService
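{
    // _embeddingProvider, _store, and GetCollectionName are members of this
    // class that are not shown in this excerpt.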
    public async Task<SemanticRetrievalResult> RetrieveAsync(
        AgentComposite agent,
        string query,
        CancellationToken ct = default)
    {
        if (!agent.MemoryConfig.SemanticEnabled)
            return SemanticRetrievalResult.Empty;

        // 1. Embed the query
        var queryEmbedding = await _embeddingProvider.EmbedAsync(query, ct);

        // 2. Search the agent's vector collection
        var collection = GetCollectionName(agent);
        var results = await _store.SearchAsync(
            collection,
            queryEmbedding,
            topK: agent.MemoryConfig.SemanticTopK,
            minScore: agent.MemoryConfig.SemanticMinScore,
            filter: new MemoryFilter { TenantId = agent.TenantId },
            ct: ct);

        // 3. Format for context injection
        return new SemanticRetrievalResult
        {
            Chunks = results,
            FormattedContext = FormatForContext(results)
        };
    }

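    private static string FormatForContext(IReadOnlyList<MemoryRecord> records)
    {
        // Nothing retrieved: return an empty string so no bare header is injected.
        if (records.Count == 0)
            return string.Empty;

        var sb = new StringBuilder();
        sb.AppendLine("[Retrieved Knowledge]");
        foreach (var r in records)
        {
            // Each chunk is labeled with its source and closed by a separator line.
            sb.AppendLine($"Source: {r.Metadata.Source}");
            sb.AppendLine(r.Content);
            sb.AppendLine("---");
        }
        return sb.ToString();
    }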
}

Retrieval Configuration

Config Property           Default  Effect
SemanticTopK              5        Number of chunks to retrieve per query
SemanticMinScore          0.7      Minimum cosine similarity threshold (0–1)
SemanticContextMaxTokens  2000     Max tokens allocated to retrieved knowledge in context
SemanticEnabled           true     Disable to skip retrieval entirely for this agent
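
The defaults above, and one way a token budget such as SemanticContextMaxTokens could be applied, can be sketched as follows. This is a minimal illustration, not the service's actual implementation: MemoryConfigSketch, ContextBudget, and the 4-characters-per-token estimate are all assumptions introduced here, and a real tokenizer will give different counts.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for the agent's memory settings.
// The default values mirror the configuration table.
public sealed record MemoryConfigSketch(
    int SemanticTopK = 5,
    double SemanticMinScore = 0.7,
    int SemanticContextMaxTokens = 2000,
    bool SemanticEnabled = true);

public static class ContextBudget
{
    // Assumption: roughly 4 characters per token (rounded up).
    public static int EstimateTokens(string text) => (text.Length + 3) / 4;

    // Keeps chunks in ranked order and stops at the first chunk
    // that would push the running total past the budget.
    public static List<string> FitToBudget(IEnumerable<string> rankedChunks, int maxTokens)
    {
        var kept = new List<string>();
        var used = 0;
        foreach (var chunk in rankedChunks)
        {
            var cost = EstimateTokens(chunk);
            if (used + cost > maxTokens)
                break;
            kept.Add(chunk);
            used += cost;
        }
        return kept;
    }
}
```

Stopping at the first chunk that does not fit preserves the ranking order; skipping it and trying smaller, lower-ranked chunks would fill the budget more tightly at the cost of relevance.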

Context Injection Position

Retrieved knowledge is injected directly after the system prompt, ahead of the episode snippets and the conversation history, in the LLM context:

// LLM context assembly order:
[1] System Prompt        ← agent.SystemPrompt
[2] Retrieved Knowledge  ← semantic retrieval results (this page)
[3] Episode Snippets     ← episodic memory recall
[4] Message History      ← current session messages (pruned to budget)
[5] Current User Message ← the user's latest input

// The LLM is expected to treat [2] Retrieved Knowledge as authoritative
// grounding for its response, which helps reduce hallucination
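
The assembly order above can be sketched in C# as follows. ContextAssembly and its parameters are hypothetical names used only for illustration; the sketch assumes each section has already been rendered to text and that empty optional sections are simply left out.

```csharp
using System;
using System.Collections.Generic;

public static class ContextAssembly
{
    // Hypothetical helper: returns the context sections in the documented order,
    // skipping the optional sections [2] and [3] when they are empty.
    public static List<string> Assemble(
        string systemPrompt,
        string retrievedKnowledge,
        string episodeSnippets,
        IEnumerable<string> messageHistory,
        string currentUserMessage)
    {
        var parts = new List<string> { systemPrompt };        // [1] System Prompt
        if (!string.IsNullOrWhiteSpace(retrievedKnowledge))
            parts.Add(retrievedKnowledge);                    // [2] Retrieved Knowledge
        if (!string.IsNullOrWhiteSpace(episodeSnippets))
            parts.Add(episodeSnippets);                       // [3] Episode Snippets
        parts.AddRange(messageHistory);                       // [4] Message History
        parts.Add(currentUserMessage);                        // [5] Current User Message
        return parts;
    }
}
```

Calling Assemble with an empty episodeSnippets string yields a list that goes straight from the retrieved knowledge to the message history, so the relative order of the remaining sections is unchanged.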

Tuning MinScore

A SemanticMinScore of 0.7 is conservative: you get fewer but more relevant chunks. If your agent frequently responds "I don't have information about that" to queries that should match, lower SemanticMinScore to 0.6. If it returns irrelevant information, raise it to 0.75 or above.
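
To see how the threshold changes what is returned, the following sketch scores a few toy vectors with cosine similarity and applies a minimum score and a top-K cut. It is an illustration under stated assumptions, not the vector store's actual search: MinScoreDemo, Cosine, and Filter are hypothetical names, and a real store typically computes scores internally.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class MinScoreDemo
{
    // Cosine similarity of two equal-length vectors; returns 0 for a zero vector.
    public static double Cosine(float[] a, float[] b)
    {
        if (a.Length != b.Length)
            throw new ArgumentException("Vectors must have the same length.");
        double dot = 0, normA = 0, normB = 0;
        for (var i = 0; i < a.Length; i++)
        {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        if (normA == 0 || normB == 0)
            return 0;
        return dot / (Math.Sqrt(normA) * Math.Sqrt(normB));
    }

    // Scores every chunk, drops those below minScore, and keeps the topK best.
    public static List<(string Id, double Score)> Filter(
        float[] query,
        IReadOnlyDictionary<string, float[]> chunks,
        double minScore,
        int topK)
        => chunks
            .Select(kv => (Id: kv.Key, Score: Cosine(query, kv.Value)))
            .Where(x => x.Score >= minScore)
            .OrderByDescending(x => x.Score)
            .Take(topK)
            .ToList();
}
```

With a query of (1, 0) and chunks at (1, 0), (0.8, 0.6), and (0.65, 0.76), the similarities are roughly 1.0, 0.8, and 0.65, so a threshold of 0.7 keeps two chunks while 0.6 keeps all three. Logging scores like these for real queries that should have matched is one way to choose a threshold empirically.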