Why Hybrid Search?

Pure vector search excels at semantic understanding but can miss exact term matches. Pure keyword search (BM25) excels at exact matches but misses synonyms and semantic intent. Hybrid search covers both failure modes:

| Query Type | Vector Alone | BM25 Alone | Hybrid |
|---|---|---|---|
| "What is the parental leave policy?" | Excellent | Good | Excellent |
| "SKU AX-2240 specifications" | Poor (exact code) | Excellent | Excellent |
| "John Smith expense report March" | Poor (name + date) | Excellent | Excellent |
| "How do I request time off?" | Excellent (semantic) | Mediocre (phrasing variance) | Excellent |

Reciprocal Rank Fusion (RRF)

RRF merges two ranked lists by assigning each document a score based on its rank position in each list. Documents ranked highly in both lists receive the highest combined score:

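public class HybridSearchService
{
    private const int RRF_K = 60;  // Standard RRF constant; larger values flatten the gap between ranks

    // _vectorStore and _keywordStore are the injected search backends;
    // their declarations and the constructor are omitted from this excerpt.
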
    public async Task<IReadOnlyList<MemoryRecord>> SearchAsync(
        string collection,
        string queryText,
        float[] queryEmbedding,
        int topK,
        float minScore,
        CancellationToken ct)
    {
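        // Run both searches in parallel, fetching topK * 2 candidates from each
        // so the fusion step has enough overlap to work with.
        // Note: minScore is not applied in this method. The vector search runs with
        // a threshold of 0, and RRF scores are rank-based, so they are not
        // comparable to a similarity threshold.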
        var vectorTask  = _vectorStore.SearchAsync(collection, queryEmbedding, topK * 2, 0f, ct);
        var keywordTask = _keywordStore.SearchAsync(collection, queryText, topK * 2, ct);

        await Task.WhenAll(vectorTask, keywordTask);

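        // Both tasks have already completed; await unwraps the results
        // without wrapping any failure in an AggregateException.
        var vectorResults  = await vectorTask;
        var keywordResults = await keywordTask;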

        // Build RRF score maps
        var scores = new Dictionary<string, double>();

        for (int rank = 0; rank < vectorResults.Count; rank++)
        {
            string id = vectorResults[rank].Id;
            scores[id] = scores.GetValueOrDefault(id) + 1.0 / (RRF_K + rank + 1);
        }

        for (int rank = 0; rank < keywordResults.Count; rank++)
        {
            string id = keywordResults[rank].Id;
            scores[id] = scores.GetValueOrDefault(id) + 1.0 / (RRF_K + rank + 1);
        }

        // Merge all records and sort by RRF score
        var allRecords = vectorResults.Concat(keywordResults)
            .GroupBy(r => r.Id)
            .Select(g => g.First())  // Deduplicate
            .ToList();

        return allRecords
            .OrderByDescending(r => scores[r.Id])
            .Take(topK)
            .ToList();
    }
}
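
The two scoring loops above can be summarized as one formula. This is a sketch of what the code computes, where $k$ stands for `RRF_K` and $r_i(d)$ is the 1-based rank of document $d$ in list $i$ (the code's 0-based `rank` plus one). A list that does not contain $d$ contributes nothing. The ranks in the worked cases are made up for illustration.

```latex
\mathrm{RRF}(d) \;=\; \sum_{i \,\in\, \{\text{vector},\ \text{keyword}\}} \frac{1}{k + r_i(d)}

% Worked cases with k = 60:
% ranked 1st in both lists:          1/61 + 1/61 \approx 0.0328
% ranked 3rd (vector), 5th (keyword): 1/63 + 1/65 \approx 0.0313
% ranked 1st in one list only:       1/61        \approx 0.0164
```

In these cases a document that places moderately well in both lists outscores one that tops a single list and is missing from the other, which is the behavior the prose above describes.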

Keyword Search Backend Options

| Backend | Notes | When to Use |
|---|---|---|
| SQL Server Full-Text Search | Uses existing SQL Server; FTS index on chunk content | SQL Server already in use; small-medium collections |
| Qdrant Sparse Vectors (SPLADE) | Sparse vectors stored alongside dense; native Qdrant hybrid | Qdrant is the vector backend; larger collections |
| Elasticsearch / OpenSearch | Dedicated BM25 index; best precision | Teams already running Elasticsearch |

Enabling Hybrid Search

// appsettings.json
{
  "Octopus": {
    "HybridSearch": {
      "Enabled":         true,
      "KeywordBackend":  "SqlServerFTS",  // SqlServerFTS | QdrantSparse | Elasticsearch
      "RrfK":            60
    }
  }
}

// Agent memory config
{
  "hybridSearchEnabled": true
}
// When enabled, SemanticMemoryService automatically routes
// through HybridSearchService instead of direct vector search.

Start with Vector-Only

Hybrid search adds infrastructure complexity (a keyword index to maintain) and some latency (each query waits for the slower of two parallel searches, plus the fusion step). Enable it only when you observe pure vector search missing exact-match queries that matter to your use case, such as product codes, names, or dates. Many knowledge base agents perform well with vector-only retrieval.