Retrieval at Query Time
At query time, the user's message is embedded and matched against the agent's knowledge collection. The top-K most similar chunks that meet the minimum similarity score are returned, formatted, and injected into the LLM context window before generation.
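To make the matching step concrete, here is a minimal in-memory sketch of cosine-similarity search with a top-K cutoff and a minimum score. It is an illustration only: `SimilaritySearchSketch`, `Cosine`, and `TopK` are hypothetical names, and a real vector store would normally run this search with an index rather than a linear scan.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SimilaritySearchSketch
{
    // Cosine similarity between two equal-length vectors.
    // Returns 0 if either vector has zero magnitude.
    public static double Cosine(IReadOnlyList<float> a, IReadOnlyList<float> b)
    {
        if (a.Count != b.Count)
            throw new ArgumentException("Vectors must have the same dimension.");

        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.Count; i++)
        {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        if (normA == 0 || normB == 0) return 0;
        return dot / (Math.Sqrt(normA) * Math.Sqrt(normB));
    }

    // Scores every chunk against the query, drops those below minScore,
    // and returns the topK highest-scoring chunks, best first.
    public static List<(string Id, double Score)> TopK(
        float[] query,
        IReadOnlyDictionary<string, float[]> chunks,
        int topK,
        double minScore)
    {
        return chunks
            .Select(kv => (Id: kv.Key, Score: Cosine(query, kv.Value)))
            .Where(x => x.Score >= minScore)
            .OrderByDescending(x => x.Score)
            .Take(topK)
            .ToList();
    }
}
```

In this sketch the score threshold is applied before the top-K cut, so a low-scoring chunk never takes a slot from a relevant one.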
The Retrieval Pipeline
using System.Collections.Generic;
using System.Text;
using System.Threading;
using System.Threading.Tasks;

public class SemanticMemoryService : ISemanticMemoryService
{
    // _embeddingProvider and _store are injected dependencies;
    // their declarations and GetCollectionName are omitted here.

    public async Task<SemanticRetrievalResult> RetrieveAsync(
        AgentComposite agent,
        string query,
        CancellationToken ct = default)
    {
        // Skip retrieval entirely when semantic memory is disabled for this agent
        if (!agent.MemoryConfig.SemanticEnabled)
            return SemanticRetrievalResult.Empty;

        // 1. Embed the query
        var queryEmbedding = await _embeddingProvider.EmbedAsync(query, ct);

        // 2. Search the agent's vector collection, scoped to the agent's tenant
        var collection = GetCollectionName(agent);
        var results = await _store.SearchAsync(
            collection,
            queryEmbedding,
            topK: agent.MemoryConfig.SemanticTopK,
            minScore: agent.MemoryConfig.SemanticMinScore,
            filter: new MemoryFilter { TenantId = agent.TenantId },
            ct: ct);

        // 3. Format for context injection
        return new SemanticRetrievalResult
        {
            Chunks = results,
            FormattedContext = FormatForContext(results)
        };
    }

    // Renders each chunk as "Source: ..." followed by its content and a separator line
    private string FormatForContext(IReadOnlyList<MemoryRecord> records)
    {
        var sb = new StringBuilder();
        sb.AppendLine("[Retrieved Knowledge]");
        foreach (var r in records)
        {
            sb.AppendLine($"Source: {r.Metadata.Source}");
            sb.AppendLine(r.Content);
            sb.AppendLine("---");
        }
        return sb.ToString();
    }
}
Retrieval Configuration
| Config Property | Default | Effect |
|---|---|---|
| SemanticTopK | 5 | Number of chunks to retrieve per query |
| SemanticMinScore | 0.7 | Minimum cosine similarity threshold (0–1) |
| SemanticContextMaxTokens | 2000 | Max tokens allocated to retrieved knowledge in context |
| SemanticEnabled | true | Disable to skip retrieval entirely for this agent |
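The retrieval code on this page does not show where `SemanticContextMaxTokens` is applied, so the following is only one plausible way such a budget could be enforced. It assumes chunks arrive ranked best first and uses a rough estimate of four characters per token; `ContextBudgetSketch`, `EstimateTokens`, and `TrimToBudget` are hypothetical names, not part of the service.

```csharp
using System;
using System.Collections.Generic;

public static class ContextBudgetSketch
{
    // Rough token estimate: about 4 characters per token, rounded up.
    // This is an assumption; a tokenizer for the target model would be more accurate.
    public static int EstimateTokens(string text) => (text.Length + 3) / 4;

    // Keeps chunks in ranked order and stops at the first chunk
    // that would push the running total past maxTokens.
    public static List<string> TrimToBudget(IEnumerable<string> rankedChunks, int maxTokens)
    {
        var kept = new List<string>();
        int used = 0;
        foreach (var chunk in rankedChunks)
        {
            int cost = EstimateTokens(chunk);
            if (used + cost > maxTokens) break;
            kept.Add(chunk);
            used += cost;
        }
        return kept;
    }
}
```

Stopping at the first chunk that does not fit keeps the highest-ranked material and avoids skipping ahead to smaller, lower-ranked chunks.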
Context Injection Position
Retrieved knowledge is injected immediately after the system prompt, ahead of the episode snippets and the message history in the LLM context:
// LLM context assembly order:
[1] System Prompt ← agent.SystemPrompt
[2] Retrieved Knowledge ← semantic retrieval results (this page)
[3] Episode Snippets ← episodic memory recall
[4] Message History ← current session messages (pruned to budget)
[5] Current User Message ← the user's latest input
// The LLM reads [2] Retrieved Knowledge as authoritative information
// to ground its response — reducing hallucination
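The ordering above can be sketched as a small assembly helper. This is a simplified illustration that joins the sections into one string; `ContextAssemblySketch` and `Assemble` are hypothetical names, and a real integration would more likely build a list of role-tagged messages for the LLM provider.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class ContextAssemblySketch
{
    // Joins the non-empty sections in the documented order:
    // system prompt, retrieved knowledge, episode snippets,
    // message history, then the current user message.
    public static string Assemble(
        string systemPrompt,
        string retrievedKnowledge,
        string episodeSnippets,
        IEnumerable<string> messageHistory,
        string userMessage)
    {
        var sections = new List<string> { systemPrompt, retrievedKnowledge, episodeSnippets };
        sections.AddRange(messageHistory);
        sections.Add(userMessage);

        // Empty sections (for example, no episodic recall) are skipped
        // so they do not leave blank gaps in the prompt.
        return string.Join("\n\n", sections.Where(s => !string.IsNullOrWhiteSpace(s)));
    }
}
```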
Tuning MinScore
A SemanticMinScore of 0.7 is conservative: you may get fewer chunks, but the ones returned tend to be more relevant. If your agent frequently responds "I don't have information about that" for queries that should match, lower SemanticMinScore to 0.6. If it returns irrelevant information, raise it to 0.75 or above.
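Before changing the threshold, it can help to see how many chunks would survive at each candidate value for a representative query. The sketch below counts passing scores per threshold; `MinScoreSweepSketch` and `CountPassing` are hypothetical names, and the scores in the usage example are made-up illustration values, not measurements.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class MinScoreSweepSketch
{
    // For each candidate threshold, counts how many similarity scores
    // are greater than or equal to it.
    public static Dictionary<double, int> CountPassing(
        IEnumerable<double> scores,
        IEnumerable<double> thresholds)
    {
        var scoreList = scores.ToList();
        return thresholds.ToDictionary(t => t, t => scoreList.Count(s => s >= t));
    }
}
```

For example, with illustrative scores of 0.82, 0.74, 0.68, 0.61, and 0.55, thresholds of 0.6, 0.7, and 0.75 would let 4, 2, and 1 chunks through respectively.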