Embedding Knowledge — Semantic Memory

IEmbeddingProvider Interface

public interface IEmbeddingProvider
{
    string ModelName { get; }
    int VectorDimensions { get; }   // must match collection dimension

    Task<float[]> EmbedAsync(
        string text,
        CancellationToken ct = default);

    Task<IReadOnlyList<float[]>> EmbedBatchAsync(
        IReadOnlyList<string> texts,
        int batchSize = 100,
        CancellationToken ct = default);
}
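
To make the contract above concrete, here is a minimal sketch of an in-memory provider that could serve as a test double. Everything except the interface itself is an assumption for illustration: `FakeEmbeddingProvider`, its `BatchCalls` counter, the model name, and the character-hashing "embedding" are hypothetical and are not part of the library. The point is only to show one way `EmbedBatchAsync` might split its input into groups of `batchSize` and return one vector per input, in order.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

// Interface repeated from the section above so the sketch is self-contained.
public interface IEmbeddingProvider
{
    string ModelName { get; }
    int VectorDimensions { get; }
    Task<float[]> EmbedAsync(string text, CancellationToken ct = default);
    Task<IReadOnlyList<float[]>> EmbedBatchAsync(
        IReadOnlyList<string> texts, int batchSize = 100, CancellationToken ct = default);
}

// Hypothetical test double; it does not call any real embedding model.
public sealed class FakeEmbeddingProvider : IEmbeddingProvider
{
    public string ModelName => "fake-hash-embedding";   // assumed name
    public int VectorDimensions => 8;                    // assumed size

    // Counts how many simulated backend calls EmbedBatchAsync has made.
    public int BatchCalls { get; private set; }

    public Task<float[]> EmbedAsync(string text, CancellationToken ct = default)
    {
        ct.ThrowIfCancellationRequested();
        return Task.FromResult(Embed(text));
    }

    public Task<IReadOnlyList<float[]>> EmbedBatchAsync(
        IReadOnlyList<string> texts, int batchSize = 100, CancellationToken ct = default)
    {
        if (batchSize <= 0) throw new ArgumentOutOfRangeException(nameof(batchSize));

        var results = new List<float[]>(texts.Count);
        for (int start = 0; start < texts.Count; start += batchSize)
        {
            ct.ThrowIfCancellationRequested();
            BatchCalls++;                                 // one simulated call per batch
            int end = Math.Min(start + batchSize, texts.Count);
            for (int i = start; i < end; i++)
                results.Add(Embed(texts[i]));             // preserves input order
        }
        return Task.FromResult<IReadOnlyList<float[]>>(results);
    }

    // Deterministic toy vector: character counts bucketed by code point, L2-normalized.
    private float[] Embed(string text)
    {
        var v = new float[VectorDimensions];
        foreach (char c in text) v[c % VectorDimensions] += 1f;

        double sumSquares = 0;
        foreach (float x in v) sumSquares += (double)x * x;
        double norm = Math.Sqrt(sumSquares);
        if (norm > 0)
            for (int i = 0; i < v.Length; i++) v[i] = (float)(v[i] / norm);
        return v;
    }
}
```

With five input texts and `batchSize: 2`, this sketch makes three simulated calls and returns five vectors of length `VectorDimensions`.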

Supported Embedding Models

Model	Provider	Dimensions	Notes
text-embedding-3-small	OpenAI	1536	Default — good balance of quality and cost
text-embedding-3-large	OpenAI	3072	Higher quality, roughly 6.5x the per-token price of 3-small at launch list pricing
text-embedding-ada-002	OpenAI	1536	Legacy — use 3-small instead
all-MiniLM-L6-v2	sentence-transformers (local)	384	Local, fast, lower quality
nomic-embed-text	Nomic / Ollama	768	Local, high quality

Embedding Configuration

// Embedding config on agent's MemoryConfig
public class EmbeddingConfig
{
    public string Provider { get; set; }         // "OpenAI", "Local", "AzureOpenAI"
    public string Model { get; set; }            // model name
    public int CredentialId { get; set; }        // API key via ICredentialResolver
    public int BatchSize { get; set; } = 100;    // texts per API call
    public string? Endpoint { get; set; }        // for local or Azure endpoint
}

// Configuration example
var embeddingConfig = new EmbeddingConfig
{
    Provider = "OpenAI",
    Model = "text-embedding-3-small",
    CredentialId = 43,    // OpenAI key in credential vault
    BatchSize = 50
};

Model Consistency is Critical

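The embedding model used at indexing time must be exactly the same model used at retrieval time. Different models produce incompatible vector spaces, so mixing them gives meaningless similarity scores. This holds even when the dimensions happen to match: text-embedding-3-small and text-embedding-ada-002 both produce 1536-dimensional vectors, but they are not interchangeable. If you change the embedding model, you must re-index all documents from scratch.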
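
One way to enforce the rule above is to record the model name next to the collection at indexing time and compare it before every query. The following is a minimal sketch under that assumption; `IndexedCollectionInfo` and `EmbeddingModelGuard` are hypothetical names introduced here, not types from the library. The check compares the model name as well as the dimension because two different models can share a dimension.

```csharp
using System;

// Hypothetical: metadata saved alongside a collection when it is first indexed.
public sealed record IndexedCollectionInfo(string EmbeddingModel, int VectorDimensions);

public static class EmbeddingModelGuard
{
    // Throws if the provider about to be used for retrieval differs from the
    // model that produced the stored vectors.
    public static void EnsureCompatible(
        IndexedCollectionInfo collection, string modelName, int vectorDimensions)
    {
        if (!string.Equals(collection.EmbeddingModel, modelName, StringComparison.Ordinal))
        {
            throw new InvalidOperationException(
                $"Collection was indexed with '{collection.EmbeddingModel}' but the " +
                $"current provider is '{modelName}'. Re-index before querying.");
        }

        if (collection.VectorDimensions != vectorDimensions)
        {
            throw new InvalidOperationException(
                $"Collection expects {collection.VectorDimensions}-dimensional vectors " +
                $"but the provider produces {vectorDimensions}.");
        }
    }
}
```

In practice the last two arguments would come from `provider.ModelName` and `provider.VectorDimensions`. A collection indexed with text-embedding-ada-002 is rejected when queried with text-embedding-3-small, although both report 1536 dimensions.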

Batch Embedding for Large Documents

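// Efficient batch embedding during ingestion
public async Task EmbedDocumentChunksAsync(
    IReadOnlyList<string> chunks,
    IEmbeddingProvider provider,
    CancellationToken ct = default)
{
    // Process in batches to respect rate limits; forward the token so
    // a long ingestion run can be cancelled
    var embeddings = await provider.EmbedBatchAsync(chunks, batchSize: 50, ct: ct);

    // Each chunk is paired with its embedding by index, so the counts must match
    if (embeddings.Count != chunks.Count)
        throw new InvalidOperationException(
            $"Expected {chunks.Count} embeddings but received {embeddings.Count}.");

    for (int i = 0; i < chunks.Count; i++)
    {
        await _store.StoreAsync(new MemoryRecord
        {
            Content = chunks[i],
            Embedding = embeddings[i],
            Metadata = new MemoryMetadata { ChunkIndex = i, ... }
        });
    }
}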
