kNN retriever

Serverless Stack

A kNN retriever returns the top documents from a k-nearest neighbor (kNN) search.

Parameters

field

(Required, string)

The name of the vector field to search against. Must be a dense_vector field with indexing enabled.

query_vector

(Required if query_vector_builder is not defined, array of float)

Query vector. Must have the same number of dimensions as the vector field you are searching against. Must be either an array of floats or a hex-encoded byte vector.
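As a rough illustration of the hex-encoded form, the sketch below converts a list of byte values into a hex string. It assumes each dimension is a signed 8-bit integer stored as one two's-complement byte, which is not spelled out in the text above; `to_hex_byte_vector` is a hypothetical helper name, not part of any client library.

```python
import struct


def to_hex_byte_vector(values):
    """Hex-encode signed 8-bit integers (-128..127), one byte per dimension."""
    # The "b" format packs each value as a signed byte and raises
    # struct.error if a value is out of range.
    return struct.pack(f"{len(values)}b", *values).hex()


print(to_hex_byte_vector([10, 22, 77]))  # 0a164d
print(to_hex_byte_vector([-1, 0, 127]))  # ff007f
```

The resulting string has two hex characters per dimension, so a three-dimensional vector yields six characters.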

query_vector_builder

(Required if query_vector is not defined, query vector builder object)

Defines a model to build a query vector.
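A minimal sketch of a request body that uses query_vector_builder instead of query_vector, built as a Python dictionary. It assumes the text_embedding builder with its model_id and model_text fields; the model ID and query text below are placeholders, and `knn_retriever_with_builder` is a hypothetical helper.

```python
import json


def knn_retriever_with_builder(field, model_id, model_text, k, num_candidates):
    """Build a kNN retriever body whose query vector is produced by a model."""
    return {
        "retriever": {
            "knn": {
                "field": field,
                # No "query_vector" key: the two options are mutually exclusive.
                "query_vector_builder": {
                    "text_embedding": {
                        "model_id": model_id,
                        "model_text": model_text,
                    }
                },
                "k": k,
                "num_candidates": num_candidates,
            }
        }
    }


# "my-text-embedding-model" is a placeholder, not a real deployed model.
body = knn_retriever_with_builder(
    "vector", "my-text-embedding-model", "cozy ramen place", 10, 50
)
print(json.dumps(body, indent=2))
```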

k

(Required, integer)

Number of nearest neighbors to return as top hits. This value must be less than or equal to num_candidates.

num_candidates

(Required, integer)

The number of nearest neighbor candidates to consider per shard. Must be greater than or equal to k (or size, if k is omitted) and cannot exceed 10,000. Elasticsearch collects num_candidates results from each shard, then merges them to find the top k results. Increasing num_candidates tends to improve the accuracy of the final k results. Defaults to Math.min(1.5 * k, 10_000).
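The relationship between k and num_candidates can be sketched in a few lines of Python. This is only an illustration of the rules stated above, not Elasticsearch's implementation: all function names are hypothetical, rounding the default down to an integer is an assumption, and the merge step works on made-up (score, doc_id) pairs.

```python
import heapq

MAX_NUM_CANDIDATES = 10_000  # upper bound stated in the text


def default_num_candidates(k):
    """Default from the text: min(1.5 * k, 10_000).

    Rounding down to an integer is an assumption; the text does not say
    how a fractional result is handled.
    """
    return min(int(1.5 * k), MAX_NUM_CANDIDATES)


def check_k_and_num_candidates(k, num_candidates):
    """Raise ValueError if the pair violates the documented constraints."""
    if num_candidates > MAX_NUM_CANDIDATES:
        raise ValueError("num_candidates cannot exceed 10,000")
    if k > num_candidates:
        raise ValueError("k must be less than or equal to num_candidates")


def merge_shard_candidates(per_shard_hits, k):
    """Merge per-shard (score, doc_id) lists and keep the k best by score."""
    all_hits = [hit for shard in per_shard_hits for hit in shard]
    return heapq.nlargest(k, all_hits, key=lambda hit: hit[0])


# Made-up candidates from two shards.
shard_a = [(0.91, "a1"), (0.72, "a2")]
shard_b = [(0.88, "b1"), (0.95, "b2")]
print(default_num_candidates(10))                     # 15
print(merge_shard_candidates([shard_a, shard_b], 3))  # [(0.95, 'b2'), (0.91, 'a1'), (0.88, 'b1')]
```

A larger num_candidates gives each shard a bigger pool to contribute to the merge, which is why it tends to improve the accuracy of the final k hits at the cost of more work per shard.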

visit_percentage Stack 9.2.0

(Optional, float)

The percentage of vectors to explore per shard during a kNN search with bbq_disk. Must be between 0 and 100. A value of 0 falls back to using num_candidates to calculate the percentage visited. Increasing visit_percentage tends to improve the accuracy of the final results. If visit_percentage is set for bbq_disk, num_candidates is ignored. Defaults to ~1% per shard for every 1 million vectors.

filter

(Optional, query object or list of query objects)

Query to filter the documents that can match. The kNN search will return the top k documents that also match this filter. The value can be a single query or a list of queries. If filter is not provided, all documents are allowed to match.
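To make the two accepted shapes of filter concrete, here is a minimal sketch that builds a request body as a Python dictionary. The field names "cuisine" and "rating" are hypothetical and do not come from this page, and `knn_retriever_with_filter` is an illustrative helper rather than a client API.

```python
import json


def knn_retriever_with_filter(field, query_vector, k, num_candidates, filters):
    """Build a kNN retriever body; `filters` is one query dict or a list of them."""
    return {
        "retriever": {
            "knn": {
                "field": field,
                "query_vector": query_vector,
                "k": k,
                "num_candidates": num_candidates,
                "filter": filters,
            }
        }
    }


# Hypothetical fields: "cuisine" and "rating" are assumptions for illustration.
body = knn_retriever_with_filter(
    "vector", [10, 22, 77], 10, 10,
    [{"term": {"cuisine": "thai"}}, {"range": {"rating": {"gte": 4}}}],
)
print(json.dumps(body, indent=2))
```

Passing a single dictionary such as `{"term": {"cuisine": "thai"}}` instead of the list produces the single-query form.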

similarity

(Optional, float)

The minimum similarity required for a document to be considered a match. The similarity value is compared against the raw vector similarity, not the document score. Matching documents are then scored according to similarity, and the provided boost is applied.

The similarity parameter is the direct vector similarity calculation.

l2_norm: Also known as Euclidean distance. Includes documents whose vector lies within the dims-dimensional hypersphere of radius similarity centered at query_vector.
cosine, dot_product, and max_inner_product: Only returns vectors whose cosine similarity or dot product is at least the provided similarity.
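The difference between the two rules can be sketched in plain Python: for l2_norm the threshold is a maximum distance, while for the other metrics it is a minimum value. This is an illustration of the description above, not Elasticsearch's code; the function names are hypothetical, and treating the boundary as inclusive for l2_norm is an assumption.

```python
import math


def l2_distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def cosine(a, b):
    """Cosine similarity; assumes neither vector is all zeros."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


def passes_similarity(metric, query_vector, doc_vector, similarity):
    """Apply the raw-similarity threshold described for each metric."""
    if metric == "l2_norm":
        # Inside the hypersphere of radius `similarity` around the query.
        return l2_distance(query_vector, doc_vector) <= similarity
    if metric == "cosine":
        return cosine(query_vector, doc_vector) >= similarity
    if metric in ("dot_product", "max_inner_product"):
        return dot(query_vector, doc_vector) >= similarity
    raise ValueError(f"unknown metric: {metric}")


# The document vector is 5 units away from the query vector.
print(passes_similarity("l2_norm", [0, 0], [3, 4], 6))  # True
print(passes_similarity("l2_norm", [0, 0], [3, 4], 4))  # False
print(passes_similarity("cosine", [1, 0], [0, 1], 0.5))  # False
```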

Restrictions

The parameters query_vector and query_vector_builder cannot be used together.
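A client-side check for this restriction might look like the sketch below. It is a hypothetical helper that inspects the knn object before a request is sent, and it also covers the "required if the other is not defined" rule from the parameter descriptions.

```python
def check_query_vector_source(knn):
    """Return which vector source a knn object uses, or raise ValueError."""
    has_vector = "query_vector" in knn
    has_builder = "query_vector_builder" in knn
    if has_vector and has_builder:
        raise ValueError("query_vector and query_vector_builder cannot be used together")
    if not has_vector and not has_builder:
        raise ValueError("one of query_vector or query_vector_builder is required")
    return "query_vector" if has_vector else "query_vector_builder"


print(check_query_vector_source({"field": "vector", "query_vector": [10, 22, 77]}))  # query_vector
```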

Example

GET /restaurants/_search
{
  "retriever": {
    "knn": {
      "field": "vector",
      "query_vector": [10, 22, 77],
      "k": 10,
      "num_candidates": 10
    }
  }
}

knn: Configuration for k-nearest neighbor (kNN) search, which is based on vector similarity.
field: Specifies the name of the field that contains the vectors.
query_vector: The query vector against which document vectors are compared in the kNN search.
k: The number of nearest neighbors to return as top hits. This value must be less than or equal to num_candidates.
num_candidates: The size of the initial candidate set from which the final k nearest neighbors are selected.