Hybrid search is an often misused or misunderstood term. In this section, we use "hybrid search" to mean combining keyword-based search with vector search. Because vector search operates in a dense embedding space while keyword-based search operates in a sparse one, their relevance scores cannot be directly compared, so combining results from the two searches requires a reranking step.
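To make the comparability problem concrete, here is a small illustration (all numbers are invented): cosine similarities are bounded, while BM25 scores are unbounded and corpus-dependent, so adding the raw scores lets one side dominate.

```python
# Hypothetical raw scores for the same document from the two searches.
vector_score = 0.83  # cosine similarity, bounded to [-1, 1]
bm25_score = 14.2    # BM25, unbounded and corpus-dependent

# A naive sum is dominated by whichever score has the larger scale,
# which is why hybrid search normalizes scores or reranks instead.
naive_score = vector_score + bm25_score  # 15.03: almost entirely BM25
print(naive_score)
```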
## Reranking strategies
There are two common approaches for reranking search results from multiple sources.

- Score-based: calculate final relevance scores from the individual search algorithms' scores. Examples: Reciprocal Rank Fusion (the default in LanceDB; see the sketch after this list) and a weighted linear combination of the semantic and keyword-based search scores.
- Relevance-based: discard the existing scores and calculate the relevance of each result-query pair directly. Example: cross-encoder models.
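As a hedged illustration of the score-based family, here is a minimal, self-contained sketch of Reciprocal Rank Fusion; the smoothing constant `k = 60` is the value commonly used in the RRF literature, not something this section prescribes.

```python
def reciprocal_rank_fusion(result_lists, k=60):
    """Fuse ranked lists: each document scores the sum of 1 / (k + rank)."""
    fused = {}
    for results in result_lists:
        for rank, doc_id in enumerate(results, start=1):
            fused[doc_id] = fused.get(doc_id, 0.0) + 1.0 / (k + rank)
    # A higher fused score means the document sat near the top of more lists.
    return sorted(fused.items(), key=lambda item: item[1], reverse=True)

# Hypothetical result orderings from the two searches:
vector_hits = ["doc3", "doc1", "doc7"]
keyword_hits = ["doc1", "doc9", "doc3"]
print(reciprocal_rank_fusion([vector_hits, keyword_hits]))
# doc1 and doc3 rise to the top because both lists rank them highly.
```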
If you call `.rerank()` on a hybrid query without passing a reranker, LanceDB defaults to `RRFReranker()`, a score-based reranker that uses Reciprocal Rank Fusion. This is the score-based path most readers encounter first; `LinearCombinationReranker` is an alternative score-based strategy you opt into explicitly.

The `rerank(...)` method also accepts a `normalize` argument that controls how the raw vector and FTS scores are made comparable before reranking:

- `normalize="score"` (the default): normalizes the raw vector and FTS scores directly.
- `normalize="rank"`: converts each result list to ranks first, then normalizes.
These normalized inputs feed the score-based rerankers (such as `LinearCombinationReranker`), so when you evaluate score-based strategies, treat `normalize` as a tunable hyperparameter alongside the reranker itself.
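Here is a minimal sketch of both knobs in the Python SDK; the database path, table name, and `text` column are placeholders, and the 0.7 weight is an arbitrary choice for illustration.

```python
import lancedb
from lancedb.rerankers import LinearCombinationReranker

db = lancedb.connect("data/sample-lancedb")    # hypothetical local path
table = db.open_table("docs")                  # hypothetical table with a "text" column
table.create_fts_index("text", replace=True)   # hybrid search needs an FTS index

query = "how do I tune reranking?"

# Default path: no reranker passed, so RRFReranker() is used.
default_hits = table.search(query, query_type="hybrid").limit(10).to_pandas()

# Explicit score-based alternative, with `normalize` as a tunable knob.
reranker = LinearCombinationReranker(weight=0.7)  # 0.7 vector / 0.3 FTS
tuned_hits = (
    table.search(query, query_type="hybrid")
    .rerank(normalize="rank", reranker=reranker)
    .limit(10)
    .to_pandas()
)
```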
Although there are many more reranking strategies, none is "universally best" across all cases, because their effectiveness tends to be dataset- and application-specific.
Evaluating whether a reranking strategy is a good one is itself a challenge. In the next section, we discuss an example evaluation of different reranking strategies on a sample dataset.
## Example evaluation
The tables below show our evaluation results from an experiment comparing multiple rerankers on ~800 hybrid search queries. This is a modified version of an evaluation script by LlamaIndex that measures hit-rate @ top-k; a sketch of that metric follows the tables.

### Using OpenAI text-embedding-ada-002
Vector Search baseline: 0.64
| Reranker | Top-3 | Top-5 | Top-10 |
|---|---|---|---|
| Linear Combination | 0.73 | 0.74 | 0.85 |
| Cross Encoder | 0.71 | 0.70 | 0.77 |
| Cohere | 0.81 | 0.81 | 0.85 |
| ColBERT | 0.68 | 0.68 | 0.73 |
### Using OpenAI text-embedding-3-small
Vector Search baseline: 0.59
| Reranker | Top-3 | Top-5 | Top-10 |
|---|---|---|---|
| Linear Combination | 0.68 | 0.70 | 0.84 |
| Cross Encoder | 0.72 | 0.72 | 0.79 |
| Cohere | 0.79 | 0.79 | 0.84 |
| ColBERT | 0.70 | 0.70 | 0.76 |
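For reference, hit-rate @ top-k scores a query as 1 if any relevant document appears in the top k results and 0 otherwise, averaged over all queries. A minimal sketch of that computation follows; the function and variable names are ours, not from the evaluation script.

```python
def hit_rate_at_k(retrieved_per_query, relevant_per_query, k):
    """Fraction of queries where a relevant doc appears in the top-k results."""
    hits = 0
    for retrieved, relevant in zip(retrieved_per_query, relevant_per_query):
        if any(doc_id in relevant for doc_id in retrieved[:k]):
            hits += 1
    return hits / len(retrieved_per_query)

# Hypothetical results for two queries:
retrieved = [["doc3", "doc1", "doc7"], ["doc2", "doc5", "doc8"]]
relevant = [{"doc1"}, {"doc9"}]
print(hit_rate_at_k(retrieved, relevant, k=3))  # 0.5: only the first query hits
```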