Text similarity re-ranker retriever
The text_similarity_reranker retriever uses an NLP model to improve search results by reordering the top-k documents based on their semantic similarity to the query.
Refer to Semantic re-ranking for a high-level overview of semantic re-ranking.
To use text_similarity_reranker, you can rely on the preconfigured .rerank-v1-elasticsearch inference endpoint, which uses the Elastic Rerank model and serves as the default if no inference_id is provided. This model is optimized for reranking based on text similarity. If you'd like to use a different model, you can set up a custom inference endpoint for the rerank task using the Create inference API. The endpoint should be configured with a machine learning model capable of computing text similarity. Refer to the Elastic NLP model reference for a list of third-party text similarity models supported by Elasticsearch.
You have the following options:

- Use the built-in Elastic Rerank cross-encoder model via the inference API's Elasticsearch service. See this example for creating an endpoint using the Elastic Rerank model.
- Use the Cohere Rerank inference endpoint with the `rerank` task type.
- Use the Google Vertex AI inference endpoint with the `rerank` task type.
- Upload a model to Elasticsearch with Eland using the `text_similarity` NLP task type, then set up an Elasticsearch service inference endpoint with the `rerank` task type. Refer to the example on this page for a step-by-step guide.
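As an illustration of the second option, a Cohere Rerank endpoint for the `rerank` task could be set up along the following lines. The endpoint name `my-cohere-rerank-model` and the `rerank-english-v3.0` model choice are examples only; substitute your own API key and preferred Cohere model.

```console
PUT _inference/rerank/my-cohere-rerank-model
{
  "service": "cohere",
  "service_settings": {
    "api_key": "<COHERE-API-KEY>",
    "model_id": "rerank-english-v3.0"
  }
}
```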
Scores from the re-ranking process are normalized using the following formula before being returned to the user, to avoid negative scores:

```
score = max(score, 0) + min(exp(score), 1)
```

Using the above, any initially negative scores are projected to (0, 1) and positive scores to [1, infinity). To recover the original scores if needed, one can use:

```
score = score - 1, if score >= 1
score = ln(score),  if score < 1
```
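As a sanity check, the normalization and its inverse can be sketched in a few lines of Python (the function names here are ours, not part of any Elasticsearch API):

```python
import math

def normalize(score):
    # Negative raw scores map into (0, 1); non-negative raw scores map into [1, inf).
    return max(score, 0) + min(math.exp(score), 1)

def denormalize(score):
    # Invert the normalization: values >= 1 came from non-negative raw scores,
    # values in (0, 1) came from negative raw scores.
    return score - 1 if score >= 1 else math.log(score)
```

For example, a raw score of 2.0 normalizes to 3.0, and a raw score of -1.0 normalizes to exp(-1) ≈ 0.368; both round-trip exactly through `denormalize`.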
`retriever`
: (Required, retriever object) The child retriever that generates the initial set of top documents to be re-ranked.

`field`
: (Required, string) The document field to be used for text similarity comparisons. This field should contain the text that will be evaluated against the `inference_text`.

`inference_id`
: (Optional, string) Unique identifier of the inference endpoint created using the inference API. If you don't specify an inference endpoint, the `inference_id` field defaults to `.rerank-v1-elasticsearch`, a preconfigured endpoint for the `.rerank-v1` model using the `elasticsearch` service.

`inference_text`
: (Required, string) The text snippet used as the basis for similarity comparison.

`rank_window_size`
: (Optional, int) The number of top documents to consider in the re-ranking process. Defaults to `10`.

`min_score`
: (Optional, float) Sets a minimum threshold score for including documents in the re-ranked results. Documents with similarity scores below this threshold will be excluded. Note that score calculations vary depending on the model used.

`filter`
: (Optional, query object or list of query objects) Applies the specified boolean query filter to the child retriever. If the child retriever already specifies any filters, then this top-level filter is applied in conjunction with the filter defined in the child retriever.

`chunk_rescorer`
: (Optional, object) Chunks and scores documents based on configured chunking settings, and only sends the best-scoring chunks to the reranking model as input. This helps improve relevance when reranking long documents that would otherwise be truncated by the reranking model's token limit.

  Parameters for `chunk_rescorer`:

  `size`
  : (Optional, int) The number of chunks to pass to the reranker. Defaults to `1`.

  `chunking_settings`
  : (Optional, object) Settings for chunking text into smaller passages for scoring and reranking. Defaults to the optimal chunking settings for Elastic Rerank. Refer to the Inference API documentation for valid values for `chunking_settings`.

Warning: If you configure chunks larger than the reranker's token limit, the results may be truncated. This can degrade relevance significantly.
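Putting the optional parameters together, a request might look like the following sketch. The index name, field names, filter clause, and chunking values here are illustrative only, and `chunk_rescorer` requires a deployment that supports it; adjust the chunking settings to your reranker's token limit.

```console
POST my-index/_search
{
  "retriever": {
    "text_similarity_reranker": {
      "retriever": {
        "standard": {
          "query": { "match": { "text": "solar eclipse frequency" } }
        }
      },
      "field": "text",
      "inference_text": "How often does the moon hide the sun?",
      "rank_window_size": 50,
      "min_score": 0.3,
      "filter": { "term": { "lang": "en" } },
      "chunk_rescorer": {
        "size": 2,
        "chunking_settings": {
          "strategy": "sentence",
          "max_chunk_size": 250,
          "sentence_overlap": 1
        }
      }
    }
  }
}
```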
Refer to this Python notebook for an end-to-end example using Elastic Rerank.
This example demonstrates how to deploy the Elastic Rerank model and use it to re-rank search results using the text_similarity_reranker retriever.
Follow these steps:
1. Create an inference endpoint for the `rerank` task using the Create inference API:

   ```console
   PUT _inference/rerank/my-elastic-rerank
   {
     "service": "elasticsearch",
     "service_settings": {
       "model_id": ".rerank-v1",
       "num_threads": 1,
       "adaptive_allocations": {
         "enabled": true,
         "min_number_of_allocations": 1,
         "max_number_of_allocations": 10
       }
     }
   }
   ```

   Adaptive allocations will be enabled with a minimum of 1 and a maximum of 10 allocations.

2. Define a `text_similarity_reranker` retriever:

   ```console
   POST _search
   {
     "retriever": {
       "text_similarity_reranker": {
         "retriever": {
           "standard": {
             "query": {
               "match": {
                 "text": "How often does the moon hide the sun?"
               }
             }
           }
         },
         "field": "text",
         "inference_id": "my-elastic-rerank",
         "inference_text": "How often does the moon hide the sun?",
         "rank_window_size": 100,
         "min_score": 0.5
       }
     }
   }
   ```
This example enables out-of-the-box semantic search by re-ranking top documents using the Cohere Rerank API. This approach eliminates the need to generate and store embeddings for all indexed documents. It requires a Cohere Rerank inference endpoint that is set up for the `rerank` task type.
```console
GET /index/_search
{
  "retriever": {
    "text_similarity_reranker": {
      "retriever": {
        "standard": {
          "query": {
            "match_phrase": {
              "text": "landmark in Paris"
            }
          }
        }
      },
      "field": "text",
      "inference_id": "my-cohere-rerank-model",
      "inference_text": "Most famous landmark in Paris",
      "rank_window_size": 100,
      "min_score": 0.5
    }
  }
}
```
The following example uses the cross-encoder/ms-marco-MiniLM-L-6-v2 model from Hugging Face to rerank search results based on semantic similarity. The model must be uploaded to Elasticsearch using Eland.
Refer to the Elastic NLP model reference for a list of third-party text similarity models supported by Elasticsearch.
Follow these steps to load the model and create a semantic re-ranker.
1. Install Eland using `pip`:

   ```shell
   python -m pip install eland[pytorch]
   ```

2. Upload the model to Elasticsearch using Eland. This example assumes you have an Elastic Cloud deployment and an API key. Refer to the Eland documentation for more authentication options.

   ```shell
   eland_import_hub_model \
     --cloud-id $CLOUD_ID \
     --es-api-key $ES_API_KEY \
     --hub-model-id cross-encoder/ms-marco-MiniLM-L-6-v2 \
     --task-type text_similarity \
     --clear-previous \
     --start
   ```

3. Create an inference endpoint for the `rerank` task:

   ```console
   PUT _inference/rerank/my-msmarco-minilm-model
   {
     "service": "elasticsearch",
     "service_settings": {
       "num_allocations": 1,
       "num_threads": 1,
       "model_id": "cross-encoder__ms-marco-minilm-l-6-v2"
     }
   }
   ```

4. Define a `text_similarity_reranker` retriever:

   ```console
   POST movies/_search
   {
     "retriever": {
       "text_similarity_reranker": {
         "retriever": {
           "standard": {
             "query": {
               "match": {
                 "genre": "drama"
               }
             }
           }
         },
         "field": "plot",
         "inference_id": "my-msmarco-minilm-model",
         "inference_text": "films that explore psychological depths"
       }
     }
   }
   ```

   This retriever uses a standard `match` query to search the `movies` index for films tagged with the genre "drama". It then re-ranks the results based on semantic similarity to the text in the `inference_text` parameter, using the model we uploaded to Elasticsearch.