Retrievers

A retriever is an interface that returns documents given an unstructured query. It is more general than a vector store. A retriever does not need to be able to store documents, only to return (or retrieve) them. Retrievers can be created from vector stores, but are also broad enough to include Wikipedia search and Amazon Kendra.

Retrievers accept a string query as input and return a list of Documents as output.
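The input/output contract above can be illustrated with a minimal, dependency-free sketch. `ToyDocument` and `KeywordRetriever` are hypothetical stand-ins written for this example, not LangChain classes, and the word-overlap scoring is an assumption chosen only to keep the code short. The method is named `invoke` to mirror how LangChain retrievers are called.

```python
from dataclasses import dataclass, field


@dataclass
class ToyDocument:
    """Hypothetical stand-in for a document: text plus free-form metadata."""
    page_content: str
    metadata: dict = field(default_factory=dict)


class KeywordRetriever:
    """Hypothetical retriever: it stores nothing itself, it only scans a list it was given."""

    def __init__(self, docs: list[ToyDocument], k: int = 2):
        self.docs = docs
        self.k = k

    def invoke(self, query: str) -> list[ToyDocument]:
        # Score each document by how many query words it contains.
        words = set(query.lower().split())
        scored = [
            (len(words & set(doc.page_content.lower().split())), doc)
            for doc in self.docs
        ]
        # Keep only documents with at least one matching word, best first.
        hits = [pair for pair in scored if pair[0] > 0]
        hits.sort(key=lambda pair: pair[0], reverse=True)
        return [doc for _, doc in hits[: self.k]]


corpus = [
    ToyDocument("Retrievers return documents for a query", {"id": 1}),
    ToyDocument("Vector stores hold embeddings", {"id": 2}),
    ToyDocument("A query is an unstructured string", {"id": 3}),
]
retriever = KeywordRetriever(corpus)
results = retriever.invoke("unstructured query")  # string in, list of documents out
```

The caller only sees a string going in and a list of documents coming out. How the documents are found (keyword overlap here, vector similarity or a web search elsewhere) stays behind the interface.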

For specifics on how to use retrievers, see the relevant how-to guides.

Note that all vector stores can be cast to retrievers. Refer to the vector store integration docs for available vector stores. This page lists custom retrievers, implemented via subclassing BaseRetriever.
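As a rough sketch of what "casting" a vector store to a retriever means, the toy code below wraps a store in a read-only object that exposes retrieval only. `ToyVectorStore`, `ToyStoreRetriever`, and the letter-count embedding are all hypothetical and exist only for illustration. The method names `similarity_search` and `as_retriever` mirror those on LangChain vector stores, but the signatures here are simplified assumptions.

```python
import math


class ToyVectorStore:
    """Hypothetical in-memory store that keeps (text, vector) pairs."""

    def __init__(self):
        self._rows = []

    @staticmethod
    def _embed(text: str) -> list[float]:
        # Toy embedding (assumption for illustration): counts of the letters a-z.
        text = text.lower()
        return [float(text.count(chr(ord("a") + i))) for i in range(26)]

    def add_texts(self, texts: list[str]) -> None:
        for text in texts:
            self._rows.append((text, self._embed(text)))

    def similarity_search(self, query: str, k: int = 1) -> list[str]:
        q = self._embed(query)

        def cosine(v: list[float]) -> float:
            dot = sum(a * b for a, b in zip(q, v))
            norm = math.sqrt(sum(a * a for a in q)) * math.sqrt(sum(b * b for b in v))
            return dot / norm if norm else 0.0

        # Rank stored texts by cosine similarity to the query, best first.
        ranked = sorted(self._rows, key=lambda row: cosine(row[1]), reverse=True)
        return [text for text, _ in ranked[:k]]

    def as_retriever(self, k: int = 1) -> "ToyStoreRetriever":
        return ToyStoreRetriever(self, k)


class ToyStoreRetriever:
    """Read-only view of a store: it can retrieve but has no way to add documents."""

    def __init__(self, store: ToyVectorStore, k: int):
        self._store = store
        self._k = k

    def invoke(self, query: str) -> list[str]:
        return self._store.similarity_search(query, k=self._k)


store = ToyVectorStore()
store.add_texts(["milvus", "kendra"])
store_retriever = store.as_retriever(k=1)
top = store_retriever.invoke("milvus")
```

The retriever delegates every query to the store's search method, which is why any store with a similarity search can be exposed this way. A custom retriever that subclasses `BaseRetriever` fills the same role without needing a store behind it.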

Bring-your-own documents

The retrievers below allow you to index and search a custom corpus of documents.

| Retriever | Self-host | Cloud offering | Package |
| --- | --- | --- | --- |
| AmazonKnowledgeBasesRetriever | | | langchain_aws |
| AzureAISearchRetriever | | | langchain_community |
| ElasticsearchRetriever | | | langchain_elasticsearch |
| MilvusCollectionHybridSearchRetriever | | | langchain_milvus |
| VertexAISearchRetriever | | | langchain_google_community |

External index

The retrievers below search over an external index (e.g., one constructed from Internet data or similar).

| Retriever | Source | Package |
| --- | --- | --- |
| ArxivRetriever | Scholarly articles on arxiv.org | langchain_community |
| TavilySearchAPIRetriever | Internet search | langchain_community |
| WikipediaRetriever | Wikipedia articles | langchain_community |

All retrievers

| Name | Description |
| --- | --- |
| Activeloop Deep Memory | Activeloop Deep Memory is a suite of tools that enables you to optimi... |
| Amazon Kendra | Amazon Kendra is an intelligent search service provided by Amazon Web... |
| Arcee | Arcee helps with the development of the SLMs: small, specialized, secu... |
| Arxiv | arXiv is an open-access archive for 2 million scholarly articles in t... |
| AskNews | AskNews infuses any LLM with the latest global news (or historical ne... |
| Azure AI Search | Azure AI Search (formerly known as Azure Cognitive Search) is a Micro... |
| Bedrock (Knowledge Bases) | This guide will help you get started with the AWS Knowledge Bases... |
| BM25 | BM25 (Wikipedia), also known as Okapi BM25, is a ranking function ... |
| Box | This will help you get started with the Box retriever. For detail... |
| BREEBS (Open Knowledge) | BREEBS is an open collaborative knowledge platform. |
| Chaindesk | Chaindesk platform brings data from anywhere (Datasources: Text, PDF, ... |
| ChatGPT plugin | OpenAI plugins connect ChatGPT to third-party applications. These plu... |
| Cohere reranker | Cohere is a Canadian startup that provides natural language processin... |
| Cohere RAG | Cohere is a Canadian startup that provides natural language processin... |
| DocArray | DocArray is a versatile, open-source tool for managing your multi-mod... |
| Dria | Dria is a hub of public RAG models for developers to both contribute ... |
| ElasticSearch BM25 | Elasticsearch is a distributed, RESTful search and analytics engine. ... |
| Elasticsearch | Elasticsearch is a distributed, RESTful search and analytics engine. ... |
| Embedchain | Embedchain is a RAG framework to create data pipelines. It loads, ind... |
| FlashRank reranker | FlashRank is the Ultra-lite & Super-fast Python library to add re-ran... |
| Fleet AI Context | Fleet AI Context is a dataset of high-quality embeddings of the top 1... |
| Google Drive | This notebook covers how to retrieve documents from Google Drive. |
| Google Vertex AI Search | Google Vertex AI Search (formerly known as Enterprise Search on Gener... |
| IBM watsonx.ai | WatsonxRerank is a wrapper for IBM watsonx.ai foundation models. |
| JaguarDB Vector Database | JaguarDB Vector Database (http://www.jaguardb.com/windex.html) |
| Kay.ai | Kai Data API built for RAG 🕵️ We are curating the world's largest da... |
| Kinetica Vectorstore based Retriever | Kinetica is a database with integrated support for vector similarity ... |
| kNN | In statistics, the k-nearest neighbours algorithm (k-NN) is a non-par... |
| LinkupSearchRetriever | Linkup provides an API to connect LLMs to the web and the Linkup Prem... |
| LLMLingua Document Compressor | LLMLingua utilizes a compact, well-trained language model (e.g., GPT2... |
| LOTR (Merger Retriever) | Lord of the Retrievers (LOTR), also known as MergerRetriever, takes a... |
| Metal | Metal is a managed service for ML Embeddings. |
| Milvus Hybrid Search | Milvus is an open-source vector database built to power embedding sim... |
| NanoPQ (Product Quantization) | Product Quantization algorithm (k-NN) in brief is a quantization algo... |
| needle | Needle Retriever |
| Outline | Outline is an open-source collaborative knowledge base platform desig... |
| Pinecone Hybrid Search | Pinecone is a vector database with broad functionality. |
| PubMed | PubMed® by The National Center for Biotechnology Information, Nationa... |
| Qdrant Sparse Vector | Qdrant is an open-source, high-performance vector search engine/datab... |
| RAGatouille | RAGatouille makes it as simple as can be to use ColBERT! |
| RePhraseQuery | RePhraseQuery is a simple retriever that applies an LLM between the u... |
| Rememberizer | Rememberizer is a knowledge enhancement service for AI applications c... |
| SEC filing | SEC filing is a financial statement or other formal document submitte... |
| Self-querying retrievers | |
| SingleStoreDB | SingleStoreDB is a high-performance distributed SQL database that sup... |
| SVM | Support vector machines (SVMs) are a set of supervised learning metho... |
| TavilySearchAPI | Tavily's Search API is a search engine built specifically for AI agen... |
| TF-IDF | TF-IDF means term-frequency times inverse document-frequency. |
| NeuralDB | NeuralDB is a CPU-friendly and fine-tunable retrieval engine develope... |
| Vespa | Vespa is a fully featured search engine and vector database. It suppo... |
| Wikipedia | Overview |
| You.com | you.com API is a suite of tools designed to help developers ground th... |
| Zep Cloud | Retriever Example for Zep Cloud |
| Zep Open Source | Retriever Example for Zep |
| Zilliz Cloud Pipeline | Zilliz Cloud Pipelines transform your unstructured data to a searchab... |