Semantic Memory — Memory Overview

What Is Semantic Memory?

Semantic memory is not conversation history; it is static (or slowly changing) knowledge. You populate it by indexing documents into a vector store. At query time, the user's message is embedded and the most semantically similar document chunks are retrieved:

// Simplified RAG pipeline
string userMessage = "Can I carry over unused leave days?";

// 1. Embed the question
float[] queryEmbedding = await _embedder.EmbedAsync(userMessage);

// 2. Search the vector store for similar chunks
IReadOnlyList<MemoryRecord> results = await _vectorStore.SearchAsync(
    collection: $"agent_{agentId}",
    embedding:  queryEmbedding,
    topK:       5,
    minScore:   0.75f);

// 3. Inject the top chunks into working memory before the LLM call
string injected = FormatForContext(results);
// → "[Retrieved Knowledge]\nSource: HR Policy 2025.pdf\nCarry-over cap is 5 days..."
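
The snippet above calls FormatForContext without showing it. A minimal sketch of what such a helper might look like follows, assuming each result exposes a source name and chunk text. The RetrievedChunk record is a hypothetical stand-in for MemoryRecord, whose actual fields are not shown on this page, and the output layout simply mirrors the comment in step 3.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical stand-in for MemoryRecord; the real type's fields may differ.
public sealed record RetrievedChunk(string Source, string Text, float Score);

public static class ContextFormatter
{
    // Builds a plain-text block that can be prepended to the prompt.
    // Returns an empty string when nothing passed the score threshold.
    public static string FormatForContext(IReadOnlyList<RetrievedChunk> results)
    {
        if (results.Count == 0)
        {
            return string.Empty;
        }

        var sb = new StringBuilder("[Retrieved Knowledge]");
        foreach (RetrievedChunk chunk in results)
        {
            // One "Source:" line per chunk, followed by the chunk text.
            sb.Append('\n').Append("Source: ").Append(chunk.Source);
            sb.Append('\n').Append(chunk.Text);
        }
        return sb.ToString();
    }
}
```

Returning an empty string for an empty result set lets the caller skip the injection step entirely instead of sending a header with no content to the LLM.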

Key Characteristics

Property	Value	Notes
Storage	Vector database (Qdrant or PGVector)	Each agent has its own collection
Lifetime	Persistent until deleted	Survives server restarts and deployments
Retrieval	Cosine similarity on query embedding	Supports hybrid search (vector + keyword) and reranking
Indexing	Document ingestion pipeline	Chunk → Embed → Store (via admin UI or API)
Per-agent isolation	Collection per agent	Agent A cannot read Agent B's knowledge base
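
To make the Retrieval row concrete, here is a small sketch of cosine similarity between two embeddings. It is illustrative only: the vector backend computes this internally (typically through an approximate index), and the VectorMath class and its method name are assumptions, not part of the product API.

```csharp
using System;

public static class VectorMath
{
    // Cosine similarity: dot(a, b) / (|a| * |b|), in the range [-1, 1].
    // A zero-length vector has no direction, so 0 is returned in that case.
    public static float CosineSimilarity(float[] a, float[] b)
    {
        if (a.Length != b.Length)
        {
            throw new ArgumentException("Vectors must have the same dimension.");
        }

        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.Length; i++)
        {
            dot += (double)a[i] * b[i];
            normA += (double)a[i] * a[i];
            normB += (double)b[i] * b[i];
        }

        if (normA == 0 || normB == 0)
        {
            return 0f;
        }
        return (float)(dot / (Math.Sqrt(normA) * Math.Sqrt(normB)));
    }
}
```

Under this measure, a minScore of 0.75 keeps only chunks whose embeddings point in nearly the same direction as the query embedding, regardless of vector magnitude.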

Indexing Pipeline

Ingest Document: Upload PDF, DOCX, TXT, or Markdown. The IDocumentIngester extracts raw text.

Chunk: Split text into overlapping chunks (default 512 tokens, 64-token overlap) for granular retrieval.

Embed: Each chunk is embedded via IEmbeddingProvider (e.g. OpenAI text-embedding-3-small).

Store: Chunk text + embedding + metadata written to the agent's vector collection.

Available Immediately: Next-turn retrieval will find the new chunks; no restart needed.
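
The Chunk step can be sketched as a sliding window. The example below is a rough illustration, not the pipeline's actual chunker: it treats whitespace-separated words as tokens (a real implementation would use the embedding model's tokenizer), and the Chunker class and ChunkWithOverlap method are hypothetical names. The defaults match the 512/64 values stated above.

```csharp
using System;
using System.Collections.Generic;

public static class Chunker
{
    // Splits text into windows of chunkSize tokens; each window starts
    // (chunkSize - overlap) tokens after the previous one, so consecutive
    // chunks share `overlap` tokens.
    public static List<string> ChunkWithOverlap(string text, int chunkSize = 512, int overlap = 64)
    {
        if (chunkSize <= 0)
        {
            throw new ArgumentOutOfRangeException(nameof(chunkSize));
        }
        if (overlap < 0 || overlap >= chunkSize)
        {
            throw new ArgumentOutOfRangeException(nameof(overlap));
        }

        // Assumption: whitespace-separated words approximate tokens.
        string[] tokens = text.Split(
            new[] { ' ', '\t', '\r', '\n' },
            StringSplitOptions.RemoveEmptyEntries);

        var chunks = new List<string>();
        int stride = chunkSize - overlap;
        for (int start = 0; start < tokens.Length; start += stride)
        {
            int length = Math.Min(chunkSize, tokens.Length - start);
            chunks.Add(string.Join(" ", tokens, start, length));

            // Stop once a window has reached the end of the text, so no
            // trailing chunk made up purely of overlap is emitted.
            if (start + length >= tokens.Length)
            {
                break;
            }
        }
        return chunks;
    }
}
```

The overlap means a sentence that straddles a chunk boundary still appears whole in at least one chunk, at the cost of storing some text twice.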

Supported Vector Backends

Backend	Best For	Filtering
Qdrant	Production deployments, large knowledge bases	Payload filters (metadata)
PGVector	Teams already on PostgreSQL	SQL WHERE clauses

Advanced Retrieval Options

Beyond simple vector search, two advanced retrieval modes improve answer quality:

Hybrid search: Combines vector similarity with BM25 keyword search using Reciprocal Rank Fusion (RRF). Better suited to queries that depend on exact terms (part numbers, names, codes).
Reranking: A cross-encoder model rescores the retrieved chunks and reorders them by relevance. Supported providers: Cohere Rerank and local ONNX models.
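
As a rough illustration of how Reciprocal Rank Fusion merges the vector and keyword result lists, the sketch below scores each chunk ID as the sum of 1 / (k + rank) over every list it appears in. The RankFusion class is a hypothetical name, and k = 60 is the constant commonly used for RRF, assumed here rather than taken from this product's configuration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class RankFusion
{
    // Fuses several ranked lists of chunk IDs (best first) into one ranking.
    // Only ranks are used, so the lists' raw scores need not be comparable.
    public static List<(string Id, double Score)> ReciprocalRankFusion(
        IEnumerable<IReadOnlyList<string>> rankings, int k = 60)
    {
        var scores = new Dictionary<string, double>();
        foreach (IReadOnlyList<string> ranking in rankings)
        {
            for (int i = 0; i < ranking.Count; i++)
            {
                // Ranks are 1-based: the top result contributes 1 / (k + 1).
                scores.TryGetValue(ranking[i], out double current);
                scores[ranking[i]] = current + 1.0 / (k + i + 1);
            }
        }

        return scores
            .OrderByDescending(p => p.Value)
            .ThenBy(p => p.Key, StringComparer.Ordinal) // deterministic tie-break
            .Select(p => (Id: p.Key, Score: p.Value))
            .ToList();
    }
}
```

For example, fusing a vector ranking of a, b, c with a keyword ranking of c, a, d yields the order a, c, b, d: chunks found by both searches rise above chunks found by only one.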

Full Guide

This is a summary page. The Semantic Memory full guide covers embedding models, vector store setup (Qdrant/PGVector), the indexing pipeline, retrieval configuration, hybrid search with RRF, reranking, and per-agent collection isolation.
