Text similarity re-ranker retriever
The text_similarity_reranker retriever uses an NLP model to improve search results by reordering the top-k documents based on their semantic similarity to the query.
Refer to Semantic re-ranking for a high-level overview of semantic re-ranking.
To use text_similarity_reranker, you can rely on the preconfigured .rerank-v1-elasticsearch inference endpoint, which uses the Elastic Rerank model and serves as the default if no inference_id is provided. This model is optimized for reranking based on text similarity. If you'd like to use a different model, you can set up a custom inference endpoint for the rerank task using the Create inference API. The endpoint should be configured with a machine learning model capable of computing text similarity. Refer to the Elastic NLP model reference for a list of third-party text similarity models supported by Elasticsearch.
You have the following options:

- Use the built-in Elastic Rerank cross-encoder model via the inference API's Elasticsearch service. See this example for creating an endpoint using the Elastic Rerank model.
- Use the Cohere Rerank inference endpoint with the `rerank` task type.
- Use the Google Vertex AI inference endpoint with the `rerank` task type.
- Upload a model to Elasticsearch with Eland using the `text_similarity` NLP task type, then set up an Elasticsearch service inference endpoint with the `rerank` task type. Refer to the example on this page for a step-by-step guide.
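As an illustration of the second option, a Cohere Rerank endpoint for the `rerank` task could be set up along the following lines. The endpoint name `my-cohere-rerank-model` and the `rerank-english-v3.0` model choice are examples only; substitute your own API key and preferred Cohere model.

```console
PUT _inference/rerank/my-cohere-rerank-model
{
  "service": "cohere",
  "service_settings": {
    "api_key": "<COHERE-API-KEY>",
    "model_id": "rerank-english-v3.0"
  }
}
```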
Scores from the re-ranking process are normalized using the following formula before being returned to the user, to avoid negative scores:

```
score = max(score, 0) + min(exp(score), 1)
```

Using the above, any initially negative scores are projected to (0, 1) and positive scores to [1, infinity). To recover the original scores if needed, one can use:

```
score = score - 1, if score >= 1
score = ln(score),  if score < 1
```
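As a sanity check, the normalization and its inverse can be sketched in a few lines of Python (the function names here are ours, not part of any Elasticsearch API):

```python
import math

def normalize(score):
    # Negative raw scores map into (0, 1); non-negative raw scores map into [1, inf).
    return max(score, 0) + min(math.exp(score), 1)

def denormalize(score):
    # Invert the normalization: values >= 1 came from non-negative raw scores,
    # values in (0, 1) came from negative raw scores.
    return score - 1 if score >= 1 else math.log(score)
```

For example, a raw score of 2.0 normalizes to 3.0, and a raw score of -1.0 normalizes to exp(-1) ≈ 0.368; both round-trip exactly through `denormalize`.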
`retriever`
: (Required, retriever object) The child retriever that generates the initial set of top documents to be re-ranked.

`field`
: (Required, string) The document field to be used for text similarity comparisons. This field should contain the text that will be evaluated against the `inference_text`.

`inference_id`
: (Optional, string) Unique identifier of the inference endpoint created using the inference API. If you don't specify an inference endpoint, the `inference_id` field defaults to `.rerank-v1-elasticsearch`, a preconfigured endpoint for the `.rerank-v1` model using the `elasticsearch` service.

`inference_text`
: (Required, string) The text snippet used as the basis for similarity comparison.

`rank_window_size`
: (Optional, int) The number of top documents to consider in the re-ranking process. Defaults to `10`.

`min_score`
: (Optional, float) Sets a minimum threshold score for including documents in the re-ranked results. Documents with similarity scores below this threshold will be excluded. Note that score calculations vary depending on the model used.

`filter`
: (Optional, query object or list of query objects) Applies the specified boolean query filter to the child retriever. If the child retriever already specifies any filters, then this top-level filter is applied in conjunction with the filter defined in the child retriever.

`chunk_rescorer`
: (Optional, object) Chunks and scores documents based on configured chunking settings, and only sends the best-scoring chunks to the reranking model as input. This helps improve relevance when reranking long documents that would otherwise be truncated by the reranking model's token limit.

  Parameters for `chunk_rescorer`:

  `size`
  : (Optional, int) The number of chunks to pass to the reranker. Defaults to `1`.

  `chunking_settings`
  : (Optional, object) Settings for chunking text into smaller passages for scoring and reranking. Defaults to the optimal chunking settings for Elastic Rerank. Refer to the Inference API documentation for valid values for `chunking_settings`.

Warning: If you configure chunks larger than the reranker's token limit, the results may be truncated. This can degrade relevance significantly.
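Putting the optional parameters together, a request might look like the following sketch. The index name, field names, filter clause, and chunking values here are illustrative only, and `chunk_rescorer` requires a deployment that supports it; adjust the chunking settings to your reranker's token limit.

```console
POST my-index/_search
{
  "retriever": {
    "text_similarity_reranker": {
      "retriever": {
        "standard": {
          "query": { "match": { "text": "solar eclipse frequency" } }
        }
      },
      "field": "text",
      "inference_text": "How often does the moon hide the sun?",
      "rank_window_size": 50,
      "min_score": 0.3,
      "filter": { "term": { "lang": "en" } },
      "chunk_rescorer": {
        "size": 2,
        "chunking_settings": {
          "strategy": "sentence",
          "max_chunk_size": 250,
          "sentence_overlap": 1
        }
      }
    }
  }
}
```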
Refer to this Python notebook for an end-to-end example using Elastic Rerank.
This example demonstrates how to deploy the Elastic Rerank model and use it to re-rank search results using the text_similarity_reranker retriever.
Follow these steps:
1. Create an inference endpoint for the `rerank` task using the Create inference API:

   ```console
   PUT _inference/rerank/my-elastic-rerank
   {
     "service": "elasticsearch",
     "service_settings": {
       "model_id": ".rerank-v1",
       "num_threads": 1,
       "adaptive_allocations": {
         "enabled": true,
         "min_number_of_allocations": 1,
         "max_number_of_allocations": 10
       }
     }
   }
   ```

   Adaptive allocations will be enabled with a minimum of 1 and a maximum of 10 allocations.

2. Define a `text_similarity_reranker` retriever:

   ```console
   POST _search
   {
     "retriever": {
       "text_similarity_reranker": {
         "retriever": {
           "standard": {
             "query": {
               "match": {
                 "text": "How often does the moon hide the sun?"
               }
             }
           }
         },
         "field": "text",
         "inference_id": "my-elastic-rerank",
         "inference_text": "How often does the moon hide the sun?",
         "rank_window_size": 100,
         "min_score": 0.5
       }
     }
   }
   ```
This example enables out-of-the-box semantic search by re-ranking top documents using the Cohere Rerank API. This approach eliminates the need to generate and store embeddings for all indexed documents. It requires a Cohere Rerank inference endpoint that is set up for the `rerank` task type.
```console
GET /index/_search
{
  "retriever": {
    "text_similarity_reranker": {
      "retriever": {
        "standard": {
          "query": {
            "match_phrase": {
              "text": "landmark in Paris"
            }
          }
        }
      },
      "field": "text",
      "inference_id": "my-cohere-rerank-model",
      "inference_text": "Most famous landmark in Paris",
      "rank_window_size": 100,
      "min_score": 0.5
    }
  }
}
```
The following example uses the cross-encoder/ms-marco-MiniLM-L-6-v2 model from Hugging Face to rerank search results based on semantic similarity. The model must be uploaded to Elasticsearch using Eland.
Refer to the Elastic NLP model reference for a list of third-party text similarity models supported by Elasticsearch.
Follow these steps to load the model and create a semantic re-ranker.
1. Install Eland using `pip`:

   ```shell
   python -m pip install eland[pytorch]
   ```

2. Upload the model to Elasticsearch using Eland. This example assumes you have an Elastic Cloud deployment and an API key. Refer to the Eland documentation for more authentication options.

   ```shell
   eland_import_hub_model \
     --cloud-id $CLOUD_ID \
     --es-api-key $ES_API_KEY \
     --hub-model-id cross-encoder/ms-marco-MiniLM-L-6-v2 \
     --task-type text_similarity \
     --clear-previous \
     --start
   ```

3. Create an inference endpoint for the `rerank` task:

   ```console
   PUT _inference/rerank/my-msmarco-minilm-model
   {
     "service": "elasticsearch",
     "service_settings": {
       "num_allocations": 1,
       "num_threads": 1,
       "model_id": "cross-encoder__ms-marco-minilm-l-6-v2"
     }
   }
   ```

4. Define a `text_similarity_reranker` retriever:

   ```console
   POST movies/_search
   {
     "retriever": {
       "text_similarity_reranker": {
         "retriever": {
           "standard": {
             "query": {
               "match": {
                 "genre": "drama"
               }
             }
           }
         },
         "field": "plot",
         "inference_id": "my-msmarco-minilm-model",
         "inference_text": "films that explore psychological depths"
       }
     }
   }
   ```

   This retriever uses a standard `match` query to search the `movies` index for films tagged with the genre "drama". It then re-ranks the results based on semantic similarity to the text in the `inference_text` parameter, using the model we uploaded to Elasticsearch.