Semantic Memory
Semantic memory is the agent's knowledge base: documents, policies, FAQs, manuals, and any other text the agent should be able to answer questions about. It lives in a vector database and is retrieved by comparing the embedding of the user's current message against the stored chunk embeddings using cosine similarity.
What Is Semantic Memory?
Semantic memory is not conversation history — it is static (or slowly changing) knowledge. You load it by indexing documents into a vector store. At query time, the user's message is embedded and the most semantically similar document chunks are retrieved:
```csharp
// Simplified RAG pipeline
string userMessage = "Can I carry over unused leave days?";

// 1. Embed the question
float[] queryEmbedding = await _embedder.EmbedAsync(userMessage);

// 2. Search the vector store for similar chunks
IReadOnlyList<MemoryRecord> results = await _vectorStore.SearchAsync(
    collection: $"agent_{agentId}",
    embedding: queryEmbedding,
    topK: 5,
    minScore: 0.75f);

// 3. Inject the top chunks into working memory before the LLM call
string injected = FormatForContext(results);
// → "[Retrieved Knowledge]\nSource: HR Policy 2025.pdf\nCarry-over cap is 5 days..."
```
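The score that the search compares against `minScore` is a cosine similarity. In practice the vector store computes it, but a minimal sketch of the calculation may help when choosing a threshold. The `VectorMath` class and its method name are hypothetical and not part of the framework's API.

```csharp
using System;

// Hypothetical helper for illustration; the vector store normally does this internally.
public static class VectorMath
{
    // Cosine similarity: dot(a, b) / (|a| * |b|). The result lies in [-1, 1].
    public static float CosineSimilarity(ReadOnlySpan<float> a, ReadOnlySpan<float> b)
    {
        if (a.Length != b.Length)
            throw new ArgumentException("Vectors must have the same dimension.");

        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.Length; i++)
        {
            dot += (double)a[i] * b[i];
            normA += (double)a[i] * a[i];
            normB += (double)b[i] * b[i];
        }

        // A zero vector has no direction; treat its similarity as 0.
        if (normA == 0 || normB == 0) return 0f;
        return (float)(dot / (Math.Sqrt(normA) * Math.Sqrt(normB)));
    }
}
```

Because the measure depends only on the angle between the vectors, two chunks pointing in the same direction score close to 1 regardless of their magnitude. A `minScore` of `0.75f`, as in the example above, would presumably drop any chunk scoring below that value.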
Key Characteristics
| Property | Value | Notes |
|---|---|---|
| Storage | Vector database (Qdrant or PGVector) | Each agent has its own collection |
| Lifetime | Persistent until deleted | Survives server restarts and deployments |
| Retrieval | Cosine similarity on query embedding | Supports hybrid search (vector + keyword) and reranking |
| Indexing | Document ingestion pipeline | Chunk → Embed → Store (via admin UI or API) |
| Per-agent isolation | Collection per agent | Agent A cannot read Agent B's knowledge base |
Indexing Pipeline
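- Extract: IDocumentIngester extracts raw text from the source document.
- Chunk: the extracted text is split into chunks.
- Embed: an IEmbeddingProvider (e.g. OpenAI text-embedding-3-small) converts each chunk into an embedding vector.
- Store: the vectors are written to the agent's collection in the vector store.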
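The chunking stage of the pipeline can be sketched as a sliding window over the extracted text. This page does not say which chunking strategy the framework uses, so the fixed-size character windows with overlap below are an assumption, and `TextChunker`, `chunkSize`, and `overlap` are hypothetical names with illustrative defaults.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper for illustration; the real pipeline may chunk differently
// (for example by tokens, sentences, or headings).
public static class TextChunker
{
    // Splits text into windows of at most chunkSize characters. Each window
    // starts (chunkSize - overlap) characters after the previous one, so
    // neighbouring chunks share `overlap` characters of context.
    public static List<string> Chunk(string text, int chunkSize = 800, int overlap = 100)
    {
        if (text is null) throw new ArgumentNullException(nameof(text));
        if (chunkSize <= 0) throw new ArgumentOutOfRangeException(nameof(chunkSize));
        if (overlap < 0 || overlap >= chunkSize) throw new ArgumentOutOfRangeException(nameof(overlap));

        var chunks = new List<string>();
        int step = chunkSize - overlap;
        for (int start = 0; start < text.Length; start += step)
        {
            int length = Math.Min(chunkSize, text.Length - start);
            chunks.Add(text.Substring(start, length));
            if (start + length >= text.Length) break; // this window reached the end
        }
        return chunks;
    }
}
```

The overlap means a sentence cut at a chunk boundary still appears whole in one of the neighbouring chunks, at the cost of storing some text twice.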
Supported Vector Backends
| Backend | Best For | Filtering |
|---|---|---|
| Qdrant | Production deployments, large knowledge bases | Payload filters (metadata) |
| PGVector | Teams already on PostgreSQL | SQL WHERE clauses |
Advanced Retrieval Options
Beyond simple vector search, two advanced retrieval modes improve answer quality:
- Hybrid search: Combines vector similarity with BM25 keyword search using Reciprocal Rank Fusion (RRF). Better for precise term matching (part numbers, names, codes).
- Reranking: A cross-encoder model rescores the retrieved chunks and reorders them by relevance. Supported providers: Cohere Rerank, ONNX local models.
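To make the hybrid search bullet concrete, here is a minimal sketch of Reciprocal Rank Fusion over two ranked lists of chunk IDs. It uses the standard RRF formula, score = Σ 1 / (k + rank), with the commonly used constant k = 60. The `RankFusion` class is hypothetical, and the framework's own fusion code and constant may differ.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper for illustration; not the framework's API.
public static class RankFusion
{
    // Reciprocal Rank Fusion: each list contributes 1 / (k + rank) to an item's
    // score, with rank starting at 1. Items ranked well in several lists rise.
    public static List<string> Fuse(IEnumerable<IReadOnlyList<string>> rankedLists, int k = 60)
    {
        var scores = new Dictionary<string, double>();
        foreach (var list in rankedLists)
        {
            for (int i = 0; i < list.Count; i++)
            {
                scores.TryGetValue(list[i], out double current); // 0 if not seen yet
                scores[list[i]] = current + 1.0 / (k + i + 1);
            }
        }

        return scores
            .OrderByDescending(p => p.Value)
            .ThenBy(p => p.Key, StringComparer.Ordinal) // deterministic tie-break
            .Select(p => p.Key)
            .ToList();
    }
}
```

For example, fusing a vector ranking `["a", "b", "c"]` with a keyword ranking `["c", "a", "d"]` yields the order `a, c, b, d`: `a` and `c` appear in both lists and outrank the chunks found by only one method. RRF works on ranks rather than raw scores, which is why it can combine cosine similarities and BM25 scores without normalizing them to a common scale.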
This is a summary page. The Semantic Memory full guide covers embedding models, vector store setup (Qdrant/PGVector), the indexing pipeline, retrieval configuration, hybrid search with RRF, reranking, and per-agent collection isolation.