BM25

BM25 (Wikipedia), also known as Okapi BM25, is a ranking function used in information retrieval systems to estimate the relevance of documents to a given search query.
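For reference, the textbook form of the Okapi BM25 score is sketched below. Here f(q_i, D) is the number of times query term q_i occurs in document D, |D| is the document length in tokens, avgdl is the average document length in the collection, N is the number of documents, and n(q_i) is the number of documents containing q_i. The free parameters k_1 and b are commonly set around k_1 in [1.2, 2.0] and b = 0.75. This is the standard formulation, not necessarily the exact variant a given library implements; the IDF term in particular differs between implementations.

```latex
\mathrm{score}(D, Q) = \sum_{i=1}^{n} \mathrm{IDF}(q_i)\,
  \frac{f(q_i, D)\,(k_1 + 1)}
       {f(q_i, D) + k_1 \left(1 - b + b\,\frac{|D|}{\mathrm{avgdl}}\right)},
\qquad
\mathrm{IDF}(q_i) = \ln\!\left(\frac{N - n(q_i) + 0.5}{n(q_i) + 0.5} + 1\right)
```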

BM25Retriever uses the rank_bm25 package.

%pip install --upgrade --quiet rank_bm25

from langchain_community.retrievers import BM25Retriever

API Reference: BM25Retriever

Create a New Retriever with Texts

retriever = BM25Retriever.from_texts(["foo", "bar", "world", "hello", "foo bar"])

Create a New Retriever with Documents

You can also create a retriever from a list of Document objects.

from langchain_core.documents import Document

retriever = BM25Retriever.from_documents(
    [
        Document(page_content="foo"),
        Document(page_content="bar"),
        Document(page_content="world"),
        Document(page_content="hello"),
        Document(page_content="foo bar"),
    ]
)

API Reference: Document

Use Retriever

We can now use the retriever!

result = retriever.invoke("foo")

result

[Document(metadata={}, page_content='foo'),
 Document(metadata={}, page_content='foo bar'),
 Document(metadata={}, page_content='hello'),
 Document(metadata={}, page_content='world')]
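The output above includes 'hello' and 'world' even though neither contains "foo". A likely explanation is that the retriever always returns its top k documents (four here, which appears to be the default), so documents with a score of zero fill the remaining slots. The sketch below illustrates this with a minimal, standalone implementation of textbook Okapi BM25 in plain Python. The function name bm25_scores and the parameter defaults are assumptions made for illustration; rank_bm25 may use a different IDF variant and tie ordering, so its exact scores and the order of zero-scoring documents need not match.

```python
import math
from collections import Counter


def bm25_scores(query, corpus, k1=1.5, b=0.75):
    """Score each document in `corpus` against `query` with textbook Okapi BM25.

    Hypothetical helper for illustration; not part of rank_bm25 or LangChain.
    """
    docs = [doc.split() for doc in corpus]  # whitespace tokenization
    n_docs = len(docs)
    avgdl = sum(len(d) for d in docs) / n_docs
    # Document frequency: how many documents contain each term.
    df = Counter(term for d in docs for term in set(d))

    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query.split():
            if term not in tf:
                continue  # a term absent from the document contributes nothing
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1)
            norm = tf[term] + k1 * (1 - b + b * len(d) / avgdl)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


corpus = ["foo", "bar", "world", "hello", "foo bar"]
scores = bm25_scores("foo", corpus)

# Sort by score, highest first; ties keep their original corpus order.
ranked = sorted(zip(corpus, scores), key=lambda pair: pair[1], reverse=True)
for text, score in ranked:
    print(f"{text!r}: {score:.3f}")
```

In this sketch 'foo' scores highest, 'foo bar' comes second because length normalization penalizes the longer document, and the remaining documents all score zero.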

Preprocessing Function

Pass a custom preprocessing function via preprocess_func to control how documents and queries are split into tokens before scoring. Word-level tokenization can improve retrieval quality, especially when BM25 is used alongside vector stores such as Chroma, Pinecone, or Faiss over chunked documents.

import nltk

nltk.download("punkt_tab")

from nltk.tokenize import word_tokenize

retriever = BM25Retriever.from_documents(
    [
        Document(page_content="foo"),
        Document(page_content="bar"),
        Document(page_content="world"),
        Document(page_content="hello"),
        Document(page_content="foo bar"),
    ],
    k=2,
    preprocess_func=word_tokenize,
)

result = retriever.invoke("bar")
result

[Document(metadata={}, page_content='bar'),
 Document(metadata={}, page_content='foo bar')]
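Any callable that maps a string to a list of tokens should work as preprocess_func, as word_tokenize does above. As a minimal sketch that avoids the NLTK dependency, the hypothetical helper simple_preprocess below lowercases the text and keeps only alphanumeric runs. The name and the regular expression are illustrative assumptions, not part of LangChain or rank_bm25.

```python
import re


def simple_preprocess(text: str) -> list[str]:
    """Lowercase `text` and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


# Punctuation and case no longer prevent a match between query and document.
print(simple_preprocess("Foo, bar!"))  # ['foo', 'bar']

# Plain whitespace splitting keeps punctuation attached to the tokens.
print("Foo, bar!".split())  # ['Foo,', 'bar!']
```

It could then be passed as preprocess_func=simple_preprocess in place of word_tokenize. Because the same function is applied to the query, a query such as "FOO" would match documents containing "foo".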

Related

Retriever conceptual guide
Retriever how-to guides
