Hybrid BM25 and KNN Vector Search Achieves 22% Retrieval Uplift

The Challenge

The initial search implementation relied mostly on traditional keyword matching, so it failed to retrieve documents whose content was similar in meaning but worded differently. Both recall and ranking quality fell below expectations, especially in enterprise search scenarios.

Hypothesis

A hybrid approach that adds semantic similarity to keyword search would improve retrieval quality by capturing the underlying meaning of words, not just their literal form.

Execution

KNN Search:
We implemented k-nearest-neighbor (KNN) vector search, which represents documents and queries as embedding vectors, to surface relevant documents even when the query terms did not appear verbatim in them.
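
The idea behind the KNN step can be illustrated with a minimal sketch. It assumes cosine similarity as the distance measure and a brute-force scan over toy vectors; the function names, the embedding values, and the choice of cosine similarity are illustrative assumptions, not details of the project's actual implementation.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

def knn_search(query_vec, doc_vecs, k=3):
    """Return the k (doc_id, similarity) pairs most similar to the query, best first."""
    scored = [(doc_id, cosine_similarity(query_vec, vec))
              for doc_id, vec in doc_vecs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]

# Toy 3-dimensional embeddings (hypothetical values, for illustration only).
doc_vecs = {
    "doc_a": [0.9, 0.1, 0.0],
    "doc_b": [0.0, 1.0, 0.0],
    "doc_c": [0.7, 0.3, 0.1],
}
top = knn_search([1.0, 0.0, 0.0], doc_vecs, k=2)
print([doc_id for doc_id, _ in top])
```

In practice, a vector index usually replaces the brute-force scan with an approximate nearest-neighbor structure, but the ranking principle is the same.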

Hybrid Search:
Through experimentation, we developed a hybrid configuration that combines BM25 with KNN to leverage the strengths of both exact matching and semantic similarity, backed by a vector database that supports hybrid search for unified scoring and retrieval.
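
One common way to combine the two signals is a weighted sum of normalized scores. The sketch below assumes min-max normalization and a hypothetical weight `alpha`; the source does not state which fusion method or weights were actually used, so treat this purely as an illustration of the idea.

```python
def min_max_normalize(scores):
    """Rescale a {doc_id: score} mapping to the range [0, 1]."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        # All scores are equal, so give every document the same normalized score.
        return {doc_id: 1.0 for doc_id in scores}
    return {doc_id: (s - lo) / (hi - lo) for doc_id, s in scores.items()}

def hybrid_rank(bm25_scores, knn_scores, alpha=0.5):
    """Fuse BM25 and KNN scores; alpha is the (assumed) weight on the BM25 side."""
    bm25_n = min_max_normalize(bm25_scores)
    knn_n = min_max_normalize(knn_scores)
    doc_ids = set(bm25_n) | set(knn_n)
    fused = {
        d: alpha * bm25_n.get(d, 0.0) + (1.0 - alpha) * knn_n.get(d, 0.0)
        for d in doc_ids
    }
    # Sort by fused score, best first; break ties by document id for determinism.
    return sorted(fused.items(), key=lambda pair: (-pair[1], pair[0]))

# Hypothetical raw scores: doc_c has no keyword match, so only KNN can surface it.
bm25_scores = {"doc_a": 12.0, "doc_b": 4.0}
knn_scores = {"doc_a": 0.65, "doc_b": 0.80, "doc_c": 0.95}
ranked = hybrid_rank(bm25_scores, knn_scores, alpha=0.5)
print([doc_id for doc_id, _ in ranked])
```

Normalizing first matters because BM25 scores are unbounded while cosine similarities fall in a fixed range; without it, one signal would dominate the sum.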

Outcomes

The hybrid search strategy improved retrieval performance by 22% compared to the previous BM25-only approach. Mean reciprocal rank (MRR) and recall improved in most cases.
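
For readers unfamiliar with the two metrics, the sketch below shows how MRR and recall at a cutoff k are typically computed. The query results and relevance judgments are made-up toy data, not the project's evaluation set, and the function names are illustrative.

```python
def reciprocal_rank(ranked_ids, relevant_ids):
    """1 / rank of the first relevant document, or 0.0 if none is retrieved."""
    for position, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant_ids:
            return 1.0 / position
    return 0.0

def mean_reciprocal_rank(runs):
    """Average reciprocal rank over (ranked_ids, relevant_ids) pairs, one per query."""
    if not runs:
        return 0.0
    return sum(reciprocal_rank(ranked, rel) for ranked, rel in runs) / len(runs)

def recall_at_k(ranked_ids, relevant_ids, k):
    """Fraction of the relevant documents that appear in the top k results."""
    if not relevant_ids:
        return 0.0
    hits = len(set(ranked_ids[:k]) & set(relevant_ids))
    return hits / len(relevant_ids)

# Toy evaluation data (hypothetical): two queries with known relevant documents.
runs = [
    (["d3", "d1", "d7"], {"d1"}),        # first relevant result at rank 2
    (["d2", "d5", "d9"], {"d2", "d9"}),  # first relevant result at rank 1
]
mrr = mean_reciprocal_rank(runs)
recall = recall_at_k(["d2", "d5", "d9"], {"d2", "d9"}, k=2)
print(mrr, recall)
```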

Project Highlights

22%

improvement in retrieval performance