Octopus
Reranking Results
Reranking is an optional post-retrieval step that uses a more accurate cross-encoder model to score retrieved chunks against the query, producing a better-ordered top-K before context injection. It improves precision at a latency cost.
Bi-Encoder vs. Cross-Encoder
| | Bi-Encoder (Vector Search) | Cross-Encoder (Reranking) |
|---|---|---|
| Input | Query and document embedded separately | Query + document concatenated together |
| Speed | Fast — pre-computed embeddings | Slow — O(N) model inference at query time |
| Accuracy | Good — semantic similarity | Better — models query-document interaction |
| Use in pipeline | First pass: retrieve a wide candidate set (e.g. top-20) | Second pass: rerank those candidates, keep the best few (e.g. top-5) |
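To make the speed difference in the table concrete, here is a minimal sketch of what the bi-encoder pass computes per document. Once embeddings are pre-computed, scoring is a cosine similarity between two vectors and needs no model call at query time. A cross-encoder instead runs a model over every (query, document) pair. `BiEncoderScoring` and `CosineSimilarity` are illustrative names, not part of the pipeline described on this page.

```csharp
using System;

public static class BiEncoderScoring
{
    // Cosine similarity between a query embedding and a pre-computed document embedding.
    public static float CosineSimilarity(float[] a, float[] b)
    {
        if (a.Length != b.Length)
            throw new ArgumentException("Embeddings must have the same dimension.");

        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.Length; i++)
        {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }

        // Treat a zero vector as having no similarity to anything.
        if (normA == 0 || normB == 0) return 0f;
        return (float)(dot / (Math.Sqrt(normA) * Math.Sqrt(normB)));
    }
}
```

A vector store typically performs this comparison (or an approximate version of it) internally, which is why the first pass can afford a wide candidate set.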
Reranking Pipeline
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public class RetrievalPipeline
{
    // _store, _reranker, and _config are injected dependencies; their declarations are omitted here.

    public async Task<IReadOnlyList<MemoryRecord>> RetrieveAndRerankAsync(
        string query, float[] queryEmbedding, string collection,
        int initialTopK, int finalTopK, CancellationToken ct = default)
    {
        // Step 1: Wide retrieval. Fetch 3-5x more candidates than needed.
        var candidates = await _store.SearchAsync(
            collection, queryEmbedding, topK: initialTopK, minScore: 0.6f, ct: ct);

        // Skip reranking when it is disabled or there are no more candidates than finalTopK.
        if (!_config.RerankerEnabled || candidates.Count <= finalTopK)
            return candidates.Take(finalTopK).ToList();

        // Step 2: Rerank. Score each (query, content) pair with the cross-encoder.
        var pairs = candidates.Select(c => (query, c.Content));
        var scores = await _reranker.ScoreAsync(pairs, ct);

        // Step 3: Drop low-scoring candidates, then return the top-K by reranker score.
        // Zip assumes ScoreAsync returns one score per pair, in input order.
        return candidates
            .Zip(scores)
            .Where(x => x.Second >= _config.RerankerMinScore)
            .OrderByDescending(x => x.Second)
            .Take(finalTopK)
            .Select(x => x.First)
            .ToList();
    }
}
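The selection in Step 3 can be exercised on its own with in-memory data. The sketch below is one way to do that, not the pipeline's actual code: `RerankSelection` and `SelectTopK` are hypothetical names, and the explicit length check is an added safeguard, since `Zip` silently truncates to the shorter sequence.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class RerankSelection
{
    // Pairs each candidate with its reranker score, drops candidates below minScore,
    // and returns up to finalTopK candidates ordered by descending score.
    public static List<T> SelectTopK<T>(
        IReadOnlyList<T> candidates, IReadOnlyList<float> scores, float minScore, int finalTopK)
    {
        // Zip would hide a mismatch by truncating, so check the lengths up front.
        if (candidates.Count != scores.Count)
            throw new ArgumentException("Expected exactly one score per candidate.");

        return candidates
            .Zip(scores, (candidate, score) => (candidate, score))
            .Where(x => x.score >= minScore)
            .OrderByDescending(x => x.score)
            .Take(finalTopK)
            .Select(x => x.candidate)
            .ToList();
    }
}
```

For example, with candidates `"a", "b", "c", "d"`, scores `0.05, 0.9, 0.4, 0.7`, a minimum score of `0.1`, and `finalTopK` of 2, the result is `"b"` followed by `"d"`: `"a"` falls below the threshold and `"c"` is outranked.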
Supported Rerankers
| Reranker | Type | Latency | Quality |
|---|---|---|---|
| Cohere Rerank v3 | API (cloud) | ~200ms | Excellent |
| cross-encoder/ms-marco-MiniLM-L-6-v2 | Local (ONNX) | ~50ms | Good |
| bge-reranker-large | Local (ONNX) | ~150ms | Very good |
Reranking Configuration
public class RerankerConfig
{
    public bool RerankerEnabled { get; set; } = false;    // opt-in
    public string RerankerType { get; set; }              // "Cohere", "LocalOnnx"
    public int CredentialId { get; set; }                 // API key via ICredentialResolver
    public int InitialTopK { get; set; } = 20;            // candidates for reranking
    public int FinalTopK { get; set; } = 5;               // final results after rerank
    public float RerankerMinScore { get; set; } = 0.1f;   // reranker score threshold
}
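Because the pipeline skips reranking whenever the candidate count is at or below `finalTopK`, a configuration with `InitialTopK <= FinalTopK` never invokes the reranker. The following is a small sanity check one might run at startup. `RerankerConfigChecks` and `Validate` are hypothetical names, and the method takes plain values rather than a `RerankerConfig` so the sketch stays self-contained.

```csharp
using System.Collections.Generic;

public static class RerankerConfigChecks
{
    // Collects human-readable problems; an empty list means the values are consistent.
    public static List<string> Validate(int initialTopK, int finalTopK, float rerankerMinScore)
    {
        var problems = new List<string>();

        if (finalTopK < 1)
            problems.Add("FinalTopK must be at least 1.");

        // With InitialTopK <= FinalTopK the early return always fires and nothing is reranked.
        if (initialTopK <= finalTopK)
            problems.Add("InitialTopK should exceed FinalTopK, otherwise the reranker never runs.");

        if (float.IsNaN(rerankerMinScore))
            problems.Add("RerankerMinScore must be a number.");

        return problems;
    }
}
```

The default values shown above (20, 5, 0.1) pass this check with no problems reported.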
Latency Impact
Reranking adds latency roughly proportional to the number of candidates scored. With 20 candidates, expect about 50 ms of additional latency from the local cross-encoder/ms-marco-MiniLM-L-6-v2 ONNX model and about 200 ms from the Cohere Rerank API; the larger bge-reranker-large model is listed at ~150 ms in the Supported Rerankers table. Enable reranking only for agents where retrieval quality is critical and users can tolerate the added response delay.
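Taking the proportionality claim at face value, a rough budget for a different candidate count can be extrapolated from one reference measurement. This is a back-of-the-envelope sketch only: `RerankLatencyEstimate` and `EstimateMs` are hypothetical names, and a purely linear model ignores fixed costs such as the network round trip of a cloud API, so real numbers should be measured.

```csharp
using System;

public static class RerankLatencyEstimate
{
    // Linear extrapolation from a single reference point (referenceMs at referenceCandidates).
    // Assumes latency scales with candidate count and has no fixed overhead.
    public static double EstimateMs(double referenceMs, int referenceCandidates, int candidates)
    {
        if (referenceCandidates <= 0)
            throw new ArgumentOutOfRangeException(nameof(referenceCandidates));
        if (candidates < 0)
            throw new ArgumentOutOfRangeException(nameof(candidates));

        return referenceMs * candidates / referenceCandidates;
    }
}
```

Using the local ONNX figure of ~50 ms for 20 candidates, `EstimateMs(50, 20, 50)` gives 125 ms for 50 candidates under this assumption.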