Octopus
Hybrid Search
Hybrid search combines vector similarity search (semantic) with keyword/BM25 search (lexical) to improve retrieval precision. It is especially effective for queries that include specific terms — product names, IDs, codes — where pure vector search may miss exact matches.
Why Hybrid Search?
Pure vector search is excellent at semantic similarity but can miss exact term matches. Consider:
- Query: "What is policy HR-2024-LEV-003?" The policy ID is a unique string, and vector search may not surface the right document if the ID carries little weight in that document's embedding.
- Query: "How does VLOOKUP work?" The keyword "VLOOKUP" is specific, but vector search may match the query to general "how to look up data" content and retrieve unrelated spreadsheet docs.
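Hybrid search solves this by running both vector search and keyword search in parallel, then merging results using Reciprocal Rank Fusion (RRF). RRF scores each document by its rank in each result list rather than by its raw score, so the two searches' score scales never need to be normalized against each other.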
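As a rough sketch of the fusion step, the weighted form of RRF used by the implementation in the next section can be written as below. The symbols (w_vec, w_kw, r_vec, r_kw) and the documents A and B are notation introduced here for illustration; the constants k = 60 and the 0.7 / 0.3 weights are the values that appear in the code.

```latex
\[
\mathrm{score}(d) \;=\; \frac{w_{\mathrm{vec}}}{k + r_{\mathrm{vec}}(d)} \;+\; \frac{w_{\mathrm{kw}}}{k + r_{\mathrm{kw}}(d)}
\]
% r_vec(d), r_kw(d): 1-based rank of document d in each result list;
% a term is dropped when d does not appear in that list.
% Example with k = 60, w_vec = 0.7, w_kw = 0.3:
%   A: vector rank 1, not in the keyword results
%   B: vector rank 3, keyword rank 1
\[
\mathrm{score}(A) = \frac{0.7}{61} \approx 0.0115, \qquad
\mathrm{score}(B) = \frac{0.7}{63} + \frac{0.3}{61} \approx 0.0111 + 0.0049 = 0.0160
\]
```

In this example B outranks A even though A is the top vector hit, because B appears in both lists. That is the behavior hybrid search relies on for exact-term queries.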
Hybrid Search Implementation
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public class HybridSearchService
{
    // _vectorStore and _keywordSearch are injected dependencies;
    // their declarations are omitted here.

    public async Task<IReadOnlyList<MemoryRecord>> SearchAsync(
        string collection,
        string queryText,
        float[] queryEmbedding,
        int topK,
        CancellationToken ct = default)
    {
        // Start both searches before awaiting so they run in parallel.
        // Each fetches topK * 2 candidates so the fusion step has more to rank.
        var vectorTask = _vectorStore.SearchAsync(
            collection, queryEmbedding, topK * 2, minScore: 0.5f, ct: ct);
        var keywordTask = _keywordSearch.SearchAsync(
            collection, queryText, topK * 2, ct: ct);
        await Task.WhenAll(vectorTask, keywordTask);

        // Merge using Reciprocal Rank Fusion
        return ReciprocalRankFusion(
            await vectorTask,
            await keywordTask,
            topK: topK,
            vectorWeight: 0.7f, // configurable
            keywordWeight: 0.3f);
    }

    private IReadOnlyList<MemoryRecord> ReciprocalRankFusion(
        IReadOnlyList<MemoryRecord> vectorResults,
        IReadOnlyList<MemoryRecord> keywordResults,
        int topK, float vectorWeight, float keywordWeight)
    {
        const int k = 60; // RRF constant
        var scores = new Dictionary<string, float>();
        var recordsById = new Dictionary<string, MemoryRecord>();

        // Each list contributes weight / (k + rank), with rank starting at 1.
        for (int i = 0; i < vectorResults.Count; i++)
        {
            var record = vectorResults[i];
            scores[record.Id] =
                scores.GetValueOrDefault(record.Id) + vectorWeight / (k + i + 1);
            recordsById.TryAdd(record.Id, record);
        }

        for (int i = 0; i < keywordResults.Count; i++)
        {
            var record = keywordResults[i];
            scores[record.Id] =
                scores.GetValueOrDefault(record.Id) + keywordWeight / (k + i + 1);
            // Keeps the vector-search copy when both lists contain the record.
            recordsById.TryAdd(record.Id, record);
        }

        // Highest fused score first; look records up by ID instead of rescanning both lists.
        return scores
            .OrderByDescending(s => s.Value)
            .Take(topK)
            .Select(s => recordsById[s.Key])
            .ToList();
    }
}
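To make the ranking behavior easier to check by hand, here is a minimal, self-contained sketch of the same weighted fusion applied to plain document IDs instead of MemoryRecord objects. The class name RrfDemo, the method name Fuse, and the sample IDs are hypothetical and exist only for this illustration; the weights and k mirror the values used above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class RrfDemo
{
    // Hypothetical helper: weighted Reciprocal Rank Fusion over two ranked ID lists.
    public static List<string> Fuse(
        IReadOnlyList<string> vectorIds,
        IReadOnlyList<string> keywordIds,
        int topK,
        float vectorWeight = 0.7f,
        float keywordWeight = 0.3f,
        int k = 60)
    {
        var scores = new Dictionary<string, float>();

        void Accumulate(IReadOnlyList<string> ids, float weight)
        {
            for (int i = 0; i < ids.Count; i++)
            {
                // Rank is 1-based, so position i contributes weight / (k + i + 1).
                scores[ids[i]] = scores.GetValueOrDefault(ids[i]) + weight / (k + i + 1);
            }
        }

        Accumulate(vectorIds, vectorWeight);
        Accumulate(keywordIds, keywordWeight);

        // Highest fused score first, truncated to topK IDs.
        return scores
            .OrderByDescending(s => s.Value)
            .Take(topK)
            .Select(s => s.Key)
            .ToList();
    }
}
```

With vector results doc-a, doc-b, doc-c and keyword results doc-c, doc-d, calling Fuse with topK = 3 returns doc-c, doc-a, doc-b. doc-c is last in the vector list but first in the keyword list, so its combined score (0.7/63 + 0.3/61) beats doc-a's single contribution (0.7/61).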
Keyword Search Backend
For PGVector, the keyword search uses PostgreSQL full-text search (tsvector). For Qdrant, it uses Qdrant's sparse vector support (SPLADE model) for dense-sparse hybrid search. The configuration selects which approach to use:
// Hybrid search config
public class HybridSearchConfig
{
    public bool Enabled { get; set; } = false; // off by default
    public float VectorWeight { get; set; } = 0.7f;
    public float KeywordWeight { get; set; } = 0.3f;
    public HybridSearchBackend Backend { get; set; } // PgFullText or QdrantSparse
}