Hybrid search is an often misused or misunderstood term. In this section, we use "hybrid search" to mean combining keyword-based search with vector search. Because vector search operates in a dense embedding space while keyword-based search operates in a sparse one, their relevance scores cannot be directly compared, so combining results from the two searches requires a reranking step.
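To make the comparability problem concrete, here is a small illustration (all numbers are invented): cosine similarities are bounded, while BM25 scores are unbounded and corpus-dependent, so adding the raw scores lets one side dominate.

```python
# Hypothetical raw scores for the same document from the two searches.
vector_score = 0.83  # cosine similarity, bounded to [-1, 1]
bm25_score = 14.2    # BM25, unbounded and corpus-dependent

# A naive sum is dominated by whichever score has the larger scale,
# which is why hybrid search normalizes scores or reranks instead.
naive_score = vector_score + bm25_score  # 15.03: almost entirely BM25
print(naive_score)
```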
## Reranking strategies
There are two common approaches for reranking search results from multiple sources.

- Score-based: calculate final relevance scores from the individual search algorithms' scores. Examples: Reciprocal Rank Fusion (the default in LanceDB; see the sketch after this list) and a weighted linear combination of the semantic and keyword-based search scores.
- Relevance-based: discard the existing scores and calculate the relevance of each result-query pair directly. Example: cross-encoder models.
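As a hedged illustration of the score-based family, here is a minimal, self-contained sketch of Reciprocal Rank Fusion; the smoothing constant `k = 60` is the value commonly used in the RRF literature, not something this section prescribes.

```python
def reciprocal_rank_fusion(result_lists, k=60):
    """Fuse ranked lists: each document scores the sum of 1 / (k + rank)."""
    fused = {}
    for results in result_lists:
        for rank, doc_id in enumerate(results, start=1):
            fused[doc_id] = fused.get(doc_id, 0.0) + 1.0 / (k + rank)
    # A higher fused score means the document sat near the top of more lists.
    return sorted(fused.items(), key=lambda item: item[1], reverse=True)

# Hypothetical result orderings from the two searches:
vector_hits = ["doc3", "doc1", "doc7"]
keyword_hits = ["doc1", "doc9", "doc3"]
print(reciprocal_rank_fusion([vector_hits, keyword_hits]))
# doc1 and doc3 rise to the top because both lists rank them highly.
```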
If you call `.rerank()` on a hybrid query without passing a reranker, LanceDB defaults to `RRFReranker()`, a score-based reranker that uses Reciprocal Rank Fusion. This is the score-based path most readers encounter first; `LinearCombinationReranker` is an alternative score-based strategy you opt into explicitly.

The `rerank(...)` method also accepts a `normalize` argument that controls how the raw vector and FTS scores are made comparable before reranking:

- `normalize="score"` (the default): normalizes the raw vector and FTS scores directly.
- `normalize="rank"`: converts each result list to ranks first, then normalizes.
These normalized inputs feed the score-based rerankers (such as `LinearCombinationReranker`), so when you evaluate score-based strategies, treat `normalize` as a tunable hyperparameter alongside the reranker itself.
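Here is a minimal sketch of both knobs in the Python SDK; the database path, table name, and `text` column are placeholders, and the 0.7 weight is an arbitrary choice for illustration.

```python
import lancedb
from lancedb.rerankers import LinearCombinationReranker

db = lancedb.connect("data/sample-lancedb")    # hypothetical local path
table = db.open_table("docs")                  # hypothetical table with a "text" column
table.create_fts_index("text", replace=True)   # hybrid search needs an FTS index

query = "how do I tune reranking?"

# Default path: no reranker passed, so RRFReranker() is used.
default_hits = table.search(query, query_type="hybrid").limit(10).to_pandas()

# Explicit score-based alternative, with `normalize` as a tunable knob.
reranker = LinearCombinationReranker(weight=0.7)  # 0.7 vector / 0.3 FTS
tuned_hits = (
    table.search(query, query_type="hybrid")
    .rerank(normalize="rank", reranker=reranker)
    .limit(10)
    .to_pandas()
)
```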
Although there are many more reranking strategies, none is "universally best" across all cases, because their effectiveness tends to be dataset- and application-specific.
Evaluating whether a reranking strategy is a good one is itself a challenge. In the next section, we discuss an example evaluation of different reranking strategies on a sample dataset.
## Example evaluation
The tables below show our evaluation results from an experiment comparing multiple rerankers on ~800 hybrid search queries. This is a modified version of an evaluation script by LlamaIndex that measures hit-rate @ top-k; a sketch of that metric follows the tables.

### Using OpenAI text-embedding-ada-002
Vector Search baseline: 0.64
| Reranker | Top-3 | Top-5 | Top-10 |
|---|---|---|---|
| Linear Combination | 0.73 | 0.74 | 0.85 |
| Cross Encoder | 0.71 | 0.70 | 0.77 |
| Cohere | 0.81 | 0.81 | 0.85 |
| ColBERT | 0.68 | 0.68 | 0.73 |
### Using OpenAI text-embedding-3-small
Vector Search baseline: 0.59
| Reranker | Top-3 | Top-5 | Top-10 |
|---|---|---|---|
| Linear Combination | 0.68 | 0.70 | 0.84 |
| Cross Encoder | 0.72 | 0.72 | 0.79 |
| Cohere | 0.79 | 0.79 | 0.84 |
| ColBERT | 0.70 | 0.70 | 0.76 |
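For reference, hit-rate @ top-k scores a query as 1 if any relevant document appears in the top k results and 0 otherwise, averaged over all queries. A minimal sketch of that computation follows; the function and variable names are ours, not from the evaluation script.

```python
def hit_rate_at_k(retrieved_per_query, relevant_per_query, k):
    """Fraction of queries where a relevant doc appears in the top-k results."""
    hits = 0
    for retrieved, relevant in zip(retrieved_per_query, relevant_per_query):
        if any(doc_id in relevant for doc_id in retrieved[:k]):
            hits += 1
    return hits / len(retrieved_per_query)

# Hypothetical results for two queries:
retrieved = [["doc3", "doc1", "doc7"], ["doc2", "doc5", "doc8"]]
relevant = [{"doc1"}, {"doc9"}]
print(hit_rate_at_k(retrieved, relevant, k=3))  # 0.5: only the first query hits
```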