Embedding — RAG | BizFirstAI

IEmbeddingProvider Interface

public interface IEmbeddingProvider
{
    // Embed a single text string
    Task<float[]> EmbedAsync(string text, CancellationToken ct = default);

    // Embed a batch of strings (more efficient — fewer API calls)
    Task<IReadOnlyList<float[]>> EmbedBatchAsync(
        IReadOnlyList<string> texts,
        CancellationToken ct = default);

    // Dimensions of the output vector (e.g. 1536, 3072, 768)
    int Dimensions { get; }
}

// Usage in the indexing pipeline:
float[] chunkEmbedding = await _embedder.EmbedAsync(chunk.Content, ct);

// Usage in the retrieval pipeline:
float[] queryEmbedding = await _embedder.EmbedAsync(userMessage, ct);
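
Both pipelines go through the same provider because a query vector is only useful when it is compared against chunk vectors from the same model. In practice the vector store performs that comparison. The source does not say which distance metric it uses, so the sketch below assumes cosine similarity, one common choice. `VectorMath` is a hypothetical helper name, not part of the documented API.

```csharp
using System;

// Hypothetical helper (not part of the documented API): cosine similarity
// between two embedding vectors, assuming the store uses a cosine metric.
public static class VectorMath
{
    // Returns dot(a, b) / (|a| * |b|), a value in [-1, 1].
    public static double CosineSimilarity(ReadOnlySpan<float> a, ReadOnlySpan<float> b)
    {
        if (a.Length != b.Length)
            throw new ArgumentException(
                $"Dimension mismatch: {a.Length} vs {b.Length}. " +
                "Were both vectors produced by the same model?");

        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.Length; i++)
        {
            dot   += (double)a[i] * b[i];
            normA += (double)a[i] * a[i];
            normB += (double)b[i] * b[i];
        }

        // An all-zero vector has no direction; treat it as similar to nothing.
        if (normA == 0 || normB == 0) return 0;
        return dot / (Math.Sqrt(normA) * Math.Sqrt(normB));
    }
}
```

With the variables from the usage snippet, `VectorMath.CosineSimilarity(queryEmbedding, chunkEmbedding)` would score how closely a chunk matches the user message.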

Supported Embedding Models

Provider	Model	Dimensions	Max Input Tokens	Notes
OpenAI	text-embedding-3-small	1536	8191	Recommended default — fast, cost-effective
OpenAI	text-embedding-3-large	3072	8191	Higher quality; about 6.5x the per-token list price of text-embedding-3-small
Azure OpenAI	text-embedding-ada-002	1536	8191	Azure-hosted deployment of the same model as OpenAI's text-embedding-ada-002
Local ONNX	all-MiniLM-L6-v2	384	512	On-premise; no API cost; lower quality
Local ONNX	bge-large-en-v1.5	1024	512	High quality local model; CPU-intensive

Configuration

// appsettings.json — Embedding configuration
{
  "Octopus": {
    "Embedding": {
      "Provider":     "OpenAI",              // OpenAI | AzureOpenAI | ONNX
      "Model":        "text-embedding-3-small",
      "Dimensions":   1536,
      "CredentialId": 42,                    // ICredentialResolver lookup for API key
      "BatchSize":    100                    // Max chunks per batch API call
    }
  }
}

// Azure OpenAI variant
{
  "Octopus": {
    "Embedding": {
      "Provider":     "AzureOpenAI",
      "Endpoint":     "https://my-instance.openai.azure.com",
      "Deployment":   "text-embedding-ada-002",
      "Dimensions":   1536,
      "CredentialId": 43
    }
  }
}

// Local ONNX variant (no API key required)
{
  "Octopus": {
    "Embedding": {
      "Provider":   "ONNX",
      "ModelPath":  "/models/all-MiniLM-L6-v2.onnx",
      "Dimensions": 384
    }
  }
}
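
Each provider needs a different set of keys, so it can help to validate the section at startup instead of failing on the first embedding call. The sketch below is one way to do that with `System.Text.Json` (.NET 6 or later). `EmbeddingOptions` and `EmbeddingOptionsLoader` are hypothetical names, the required-key rules are inferred from the three samples above, and the `BatchSize` default of 100 is an assumption for variants that omit it.

```csharp
#nullable enable
using System;
using System.Text.Json;

// Hypothetical options type mirroring the keys used in the samples above.
public sealed record EmbeddingOptions
{
    public string Provider { get; init; } = "";
    public string? Model { get; init; }
    public string? Endpoint { get; init; }
    public string? Deployment { get; init; }
    public string? ModelPath { get; init; }
    public int Dimensions { get; init; }
    public int? CredentialId { get; init; }
    public int BatchSize { get; init; } = 100; // assumed default when omitted
}

public static class EmbeddingOptionsLoader
{
    public static EmbeddingOptions Parse(string json)
    {
        // The samples contain // comments, which must be skipped explicitly.
        using var doc = JsonDocument.Parse(json,
            new JsonDocumentOptions { CommentHandling = JsonCommentHandling.Skip });

        var section = doc.RootElement.GetProperty("Octopus").GetProperty("Embedding");
        var options = section.Deserialize<EmbeddingOptions>()
            ?? throw new InvalidOperationException("Octopus:Embedding is empty.");

        Validate(options);
        return options;
    }

    // Checks the keys each provider appears to need, based on the samples.
    public static void Validate(EmbeddingOptions o)
    {
        if (o.Dimensions <= 0)
            throw new InvalidOperationException("Octopus:Embedding:Dimensions must be positive.");
        if (o.BatchSize <= 0)
            throw new InvalidOperationException("Octopus:Embedding:BatchSize must be positive.");

        switch (o.Provider)
        {
            case "OpenAI":
                Require(!string.IsNullOrWhiteSpace(o.Model), "Model", o.Provider);
                Require(o.CredentialId.HasValue, "CredentialId", o.Provider);
                break;
            case "AzureOpenAI":
                Require(!string.IsNullOrWhiteSpace(o.Endpoint), "Endpoint", o.Provider);
                Require(!string.IsNullOrWhiteSpace(o.Deployment), "Deployment", o.Provider);
                Require(o.CredentialId.HasValue, "CredentialId", o.Provider);
                break;
            case "ONNX":
                Require(!string.IsNullOrWhiteSpace(o.ModelPath), "ModelPath", o.Provider);
                break;
            default:
                throw new InvalidOperationException($"Unknown embedding provider '{o.Provider}'.");
        }
    }

    private static void Require(bool present, string key, string provider)
    {
        if (!present)
            throw new InvalidOperationException(
                $"Octopus:Embedding:{key} is required when Provider is '{provider}'.");
    }
}
```

An application that already binds configuration through `IConfiguration` could apply the same `Validate` method to its bound options object and skip the JSON parsing step.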

Batch Embedding for Efficiency

// Batch embed all chunks in a document before writing to vector store
public async Task IndexDocumentAsync(DocumentContent doc, Guid agentId, CancellationToken ct)
{
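    // Materialize the chunks once so chunk i and embedding i stay aligned.
    var chunks = _chunker.Chunk(doc.RawText, _chunkingConfig).ToList();

    // One call for the whole document; the provider is expected to split the
    // input into API requests of at most BatchSize (100 in the sample config).
    var allEmbeddings = await _embedder.EmbedBatchAsync(
        chunks.Select(c => c.Content).ToList(), ct);

    if (allEmbeddings.Count != chunks.Count)
        throw new InvalidOperationException(
            $"Expected {chunks.Count} embeddings but received {allEmbeddings.Count}.");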

    // Write chunks + embeddings to vector store
    var records = chunks.Select((chunk, i) => new MemoryRecord
    {
        Id        = Guid.NewGuid().ToString(),
        Content   = chunk.Content,
        Embedding = allEmbeddings[i],
        Metadata  = new MemoryMetadata
        {
            Source   = doc.Metadata.Source,
            AgentId  = agentId.ToString(),
            TenantId = doc.Metadata.TenantId.ToString()
        }
    }).ToList();

    await _vectorStore.UpsertBatchAsync($"agent_{agentId:N}", records, ct);
}
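
The method above hands every chunk to `EmbedBatchAsync` in one call, so the splitting into requests of at most `BatchSize` items presumably happens inside the provider. The source does not show that code. The following is a minimal sketch of how such splitting could look, assuming .NET 6 or later for `Enumerable.Chunk`. `EmbeddingBatcher` and the `embedOneBatch` delegate are hypothetical stand-ins for a provider and its single API call.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper: splits a list of texts into fixed-size batches and
// concatenates the results in the original order.
public static class EmbeddingBatcher
{
    public static async Task<IReadOnlyList<float[]>> EmbedInBatchesAsync(
        IReadOnlyList<string> texts,
        int batchSize,
        // Stand-in for one provider API call that embeds a single batch.
        Func<IReadOnlyList<string>, CancellationToken, Task<IReadOnlyList<float[]>>> embedOneBatch,
        CancellationToken ct = default)
    {
        if (batchSize <= 0) throw new ArgumentOutOfRangeException(nameof(batchSize));

        var results = new List<float[]>(texts.Count);

        // Enumerable.Chunk yields arrays of at most batchSize items, in order.
        foreach (string[] batch in texts.Chunk(batchSize))
        {
            ct.ThrowIfCancellationRequested();

            var vectors = await embedOneBatch(batch, ct);
            if (vectors.Count != batch.Length)
                throw new InvalidOperationException(
                    $"Provider returned {vectors.Count} vectors for {batch.Length} inputs.");

            // Appending batch by batch keeps results[i] aligned with texts[i].
            results.AddRange(vectors);
        }

        return results;
    }
}
```

Batches run sequentially here, which keeps ordering trivial and avoids bursting past provider rate limits. Running them in parallel is possible but would need explicit handling of result order and throttling.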

Model Lock-In Warning

Once documents are indexed with a specific embedding model, you cannot switch models without re-indexing all documents. Vectors from different models lie in different embedding spaces and often have different dimensions, so a query vector from one model cannot be meaningfully compared with stored vectors from another. Choose and document your embedding model before indexing any production data.
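
One way to catch an accidental model change early is to record the model name and dimension count when a collection is first indexed, then compare them with the configured embedder at startup. The sketch below assumes such metadata is stored somewhere. `IndexEmbeddingInfo` and `EmbeddingCompatibility` are hypothetical names, and the source does not describe any built-in check of this kind.

```csharp
using System;

// Hypothetical metadata saved alongside a collection when it is first indexed.
public sealed record IndexEmbeddingInfo(string Model, int Dimensions);

public static class EmbeddingCompatibility
{
    // Throws if the configured embedder cannot query the existing index.
    // The provider is deliberately not compared: the same model may be served
    // by more than one host (for example OpenAI and Azure OpenAI).
    public static void EnsureCompatible(IndexEmbeddingInfo indexed, IndexEmbeddingInfo configured)
    {
        bool sameModel = string.Equals(indexed.Model, configured.Model, StringComparison.Ordinal);
        bool sameDims  = indexed.Dimensions == configured.Dimensions;

        if (!sameModel || !sameDims)
            throw new InvalidOperationException(
                $"Index was built with '{indexed.Model}' ({indexed.Dimensions} dims) but the " +
                $"configured embedder is '{configured.Model}' ({configured.Dimensions} dims). " +
                "Re-index all documents before switching models.");
    }
}
```

Matching dimensions alone are not enough: text-embedding-3-small and text-embedding-ada-002 both produce 1536-dimensional vectors, yet their vectors are not interchangeable, which is why the model name is compared as well.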
