Bi-Encoder vs. Cross-Encoder

| | Bi-Encoder (Vector Search) | Cross-Encoder (Reranking) |
|---|---|---|
| Input | Query and document embedded separately | Query + document concatenated together |
| Speed | Fast: pre-computed embeddings | Slow: O(N) model inference at query time |
| Accuracy | Good: semantic similarity | Better: models query-document interaction |
| Use in pipeline | First pass: retrieve top-50 candidates | Second pass: rerank top-50, take top-5 |
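
The speed difference in the table comes from where the model runs. The following is a minimal sketch, not the portal's implementation: `ScoringSketch`, `CosineSimilarity`, `CrossEncoderScores`, and the `scorePair` delegate are hypothetical names used only for illustration. The bi-encoder side needs nothing but vector arithmetic at query time, while the cross-encoder side calls the model once per candidate.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ScoringSketch
{
    // Bi-encoder: document embeddings are computed ahead of time, so the
    // query-time cost per document is a single cosine similarity.
    public static float CosineSimilarity(float[] a, float[] b)
    {
        if (a.Length != b.Length)
            throw new ArgumentException("Vectors must have equal length.");

        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.Length; i++)
        {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }

        // A zero vector has no direction; treat its similarity as 0.
        if (normA == 0 || normB == 0) return 0f;
        return (float)(dot / (Math.Sqrt(normA) * Math.Sqrt(normB)));
    }

    // Cross-encoder: the model reads query and document together, so each of
    // the N candidates costs one model call (scorePair) at query time.
    public static List<float> CrossEncoderScores(
        string query,
        IReadOnlyList<string> documents,
        Func<string, string, float> scorePair)
    {
        return documents.Select(d => scorePair(query, d)).ToList();
    }
}
```

Because `scorePair` is invoked once per document, the second method's cost grows linearly with the candidate count, which is why the cross-encoder is reserved for the short list produced by the first pass.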

Reranking Pipeline

using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public class RetrievalPipeline
{
    // _store, _reranker, and _config are fields of this class;
    // their declarations are not shown here.

    public async Task<IReadOnlyList<MemoryRecord>> RetrieveAndRerankAsync(
        string query, float[] queryEmbedding, string collection,
        int initialTopK, int finalTopK, CancellationToken ct = default)
    {
        // Step 1: Wide retrieval. Fetch 3-5x more candidates than needed.
        var candidates = await _store.SearchAsync(
            collection, queryEmbedding, topK: initialTopK, minScore: 0.6f, ct: ct);

        // Skip reranking when it is disabled or there is nothing to cut down.
        if (!_config.RerankerEnabled || candidates.Count <= finalTopK)
            return candidates.Take(finalTopK).ToList();

        // Step 2: Rerank. Score each candidate against the query.
        var pairs = candidates.Select(c => (query, c.Content)).ToList();
        var scores = await _reranker.ScoreAsync(pairs, ct);

        // Step 3: Drop low scorers, then return the top-K by reranker score.
        return candidates
            .Zip(scores)
            .Where(x => x.Second >= _config.RerankerMinScore)
            .OrderByDescending(x => x.Second)
            .Take(finalTopK)
            .Select(x => x.First)
            .ToList();
    }
}
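
To make Step 3 concrete, here is a small self-contained sketch of the selection logic on in-memory data. `RerankSelection` and `SelectTop` are hypothetical names, and the candidate IDs and scores are made-up illustration values. Unlike the pipeline above, the sketch also rejects mismatched candidate and score counts, since `Zip` would otherwise truncate silently.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class RerankSelection
{
    // Pairs each candidate with its reranker score, drops scores below
    // minScore, and returns the finalTopK best candidates, highest first.
    public static List<T> SelectTop<T>(
        IReadOnlyList<T> candidates,
        IReadOnlyList<float> scores,
        float minScore,
        int finalTopK)
    {
        if (candidates.Count != scores.Count)
            throw new ArgumentException("Expected exactly one score per candidate.");

        return candidates
            .Zip(scores, (c, s) => (Candidate: c, Score: s))
            .Where(x => x.Score >= minScore)
            .OrderByDescending(x => x.Score)
            .Take(finalTopK)
            .Select(x => x.Candidate)
            .ToList();
    }
}
```

With candidates `doc-a`, `doc-b`, `doc-c`, `doc-d`, scores `0.05, 0.92, 0.40, 0.61`, a minimum score of `0.1`, and `finalTopK = 2`, the call returns `doc-b` and `doc-d`: `doc-a` falls below the threshold and `doc-c` is outranked.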

Supported Rerankers

| Reranker | Type | Latency | Quality |
|---|---|---|---|
| Cohere Rerank v3 | API (cloud) | ~200ms | Excellent |
| cross-encoder/ms-marco-MiniLM-L-6-v2 | Local (ONNX) | ~50ms | Good |
| bge-reranker-large | Local (ONNX) | ~150ms | Very good |

Reranking Configuration

public class RerankerConfig
{
    public bool RerankerEnabled { get; set; } = false;  // opt-in
    public string RerankerType { get; set; }             // "Cohere", "LocalOnnx"
    public int CredentialId { get; set; }                // API key via ICredentialResolver
    public int InitialTopK { get; set; } = 20;           // candidates for reranking
    public int FinalTopK { get; set; } = 5;              // final results after rerank
    public float RerankerMinScore { get; set; } = 0.1f;  // reranker score threshold
}
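
One interaction between these settings is easy to miss: the pipeline skips reranking whenever the candidate count is at or below `FinalTopK`, so an `InitialTopK` that does not exceed `FinalTopK` means the reranker never runs (assuming the store returns at most `InitialTopK` results). The sketch below shows one way to catch that at startup. `RerankerConfigChecks` and `Validate` are hypothetical helpers, not part of the documented API, and they take plain values so the example stands alone.

```csharp
using System.Collections.Generic;

public static class RerankerConfigChecks
{
    // Returns a list of human-readable problems; an empty list means the
    // values passed these basic sanity checks.
    public static List<string> Validate(bool rerankerEnabled, int initialTopK, int finalTopK)
    {
        var problems = new List<string>();

        if (initialTopK <= 0) problems.Add("InitialTopK must be positive.");
        if (finalTopK <= 0) problems.Add("FinalTopK must be positive.");

        // With InitialTopK <= FinalTopK the early-return branch always fires.
        if (rerankerEnabled && initialTopK <= finalTopK)
            problems.Add("InitialTopK must exceed FinalTopK, otherwise reranking is always skipped.");

        return problems;
    }
}
```

The defaults shown above (`InitialTopK = 20`, `FinalTopK = 5`) pass these checks and fall inside the 3-5x range suggested in the pipeline comments.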

Latency Impact

Reranking adds latency proportional to the number of candidates scored. With 20 candidates and a local ONNX reranker such as cross-encoder/ms-marco-MiniLM-L-6-v2, expect about 50ms of additional latency; the larger bge-reranker-large is listed at about 150ms. With the Cohere Rerank API and 20 candidates, expect about 200ms. Enable reranking only for agents where retrieval quality is critical and users can tolerate the added response delay.
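
As a back-of-the-envelope aid, the proportionality claim can be turned into a rough estimate for other candidate counts. This sketch assumes latency scales strictly linearly from the 20-candidate figures quoted above, with no fixed overhead; real deployments (especially API calls with network round trips) will deviate, so measure before relying on it. `RerankLatencyEstimate` and `EstimateMs` are hypothetical names.

```csharp
using System;

public static class RerankLatencyEstimate
{
    // Scales a measured reference latency linearly to a different candidate
    // count. Assumption: cost is purely per-candidate, with no fixed overhead.
    public static double EstimateMs(int candidates, double referenceMs, int referenceCandidates = 20)
    {
        if (candidates < 0)
            throw new ArgumentOutOfRangeException(nameof(candidates));
        if (referenceCandidates <= 0)
            throw new ArgumentOutOfRangeException(nameof(referenceCandidates));

        return referenceMs * candidates / referenceCandidates;
    }
}
```

Under this assumption, raising `InitialTopK` from 20 to 50 would take the ~50ms local figure to roughly 125ms and the ~200ms Cohere figure to roughly 500ms, which is worth weighing before widening the first pass.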