kNN retriever
Serverless Stack
A kNN retriever returns top documents from a k-nearest neighbor search (kNN).
field
(Required, string)
The name of the vector field to search against. Must be a dense_vector field with indexing enabled.
query_vector
(Required if query_vector_builder is not defined, array of float)
Query vector. Must have the same number of dimensions as the vector field you are searching against. Must be either an array of floats or a hex-encoded byte vector.
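To illustrate the hex-encoded form, here is a minimal Python sketch that turns a list of signed byte values (-128 to 127) into a hex string. It assumes that each dimension is stored as one two's-complement byte written as two lowercase hex characters, which is how byte-valued vectors are usually hex-encoded. The function name to_hex_byte_vector is hypothetical and not part of any Elasticsearch client.

```python
def to_hex_byte_vector(values: list[int]) -> str:
    """Encode signed byte dimensions as a hex string, two hex chars per dimension.

    Assumption: each value is a two's-complement byte, so -1 becomes "ff".
    """
    for v in values:
        if not -128 <= v <= 127:
            raise ValueError(f"{v} does not fit in a signed byte")
    # Masking with 0xFF maps negative values to their two's-complement byte.
    return bytes(v & 0xFF for v in values).hex()


# The query_vector from the request example, expressed as a hex string.
print(to_hex_byte_vector([10, 22, 77]))  # 0a164d
print(to_hex_byte_vector([1, 3, -5]))    # 0103fb
```

Check your field's element type before sending a hex string; a float vector field still expects an array of floats.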
query_vector_builder
(Required if query_vector is not defined, query vector builder object)
Defines a model to build a query vector.
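As a sketch of what a builder-based request body can look like, the Python function below puts a query_vector_builder in place of query_vector. It assumes the text_embedding builder with model_id and model_text keys, as used in Elasticsearch kNN search; verify the builder format against your version. The function name and the model ID "my-text-embedding-model" are hypothetical placeholders.

```python
import json


def knn_retriever_with_builder(field: str, model_id: str, model_text: str,
                               k: int, num_candidates: int) -> dict:
    """Build a knn retriever body where a model creates the query vector."""
    return {
        "retriever": {
            "knn": {
                "field": field,
                # No query_vector here: the builder replaces it.
                "query_vector_builder": {
                    "text_embedding": {
                        "model_id": model_id,      # hypothetical deployed model
                        "model_text": model_text,  # text to embed at search time
                    }
                },
                "k": k,
                "num_candidates": num_candidates,
            }
        }
    }


body = knn_retriever_with_builder(
    "vector", "my-text-embedding-model", "cheap ramen near the station", 10, 10
)
print(json.dumps(body, indent=2))
```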
k
(Required, integer)
Number of nearest neighbors to return as top hits. This value must be less than or equal to num_candidates.
num_candidates
(Required, integer)
The number of nearest neighbor candidates to consider per shard. Must be greater than or equal to k (or size, if k is omitted) and cannot exceed 10,000. Elasticsearch collects num_candidates results from each shard, then merges them to find the top k results. Increasing num_candidates tends to improve the accuracy of the final k results. Defaults to Math.min(1.5 * k, 10_000).
visit_percentage (Stack)
(Optional, float)
The percentage of vectors to explore per shard while doing a kNN search with bbq_disk. Must be between 0 and 100. A value of 0 falls back to using num_candidates to calculate the percentage visited. Increasing visit_percentage tends to improve the accuracy of the final results. If visit_percentage is set for bbq_disk, num_candidates is ignored. Defaults to roughly 1% per shard for every 1 million vectors.
filter
(Optional, query object or list of query objects)
Query to filter the documents that can match. The kNN search will return the top k documents that also match this filter. The value can be a single query or a list of queries. If filter is not provided, all documents are allowed to match.
similarity
(Optional, float)
The minimum similarity required for a document to be considered a match. The similarity value is computed with the raw similarity function configured for the field, not from the document score. The matched documents are then scored according to similarity, and the provided boost is applied.
The similarity parameter is a threshold on the direct vector similarity calculation:
- l2_norm: also known as Euclidean distance. Includes documents whose vector lies within the dims-dimensional hypersphere of radius similarity centered on query_vector.
- cosine, dot_product, and max_inner_product: Only returns vectors whose cosine similarity or dot product is at least the provided similarity.
Read more: kNN similarity search
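The following Python sketch shows how the similarity threshold reads for each similarity function: for l2_norm it acts as a maximum distance, and for the other functions as a minimum raw similarity. It works on raw vector math only and does not model how Elasticsearch converts raw similarities into a document _score. The function name passes_similarity is hypothetical.

```python
import math


def passes_similarity(query: list[float], doc: list[float],
                      similarity_fn: str, threshold: float) -> bool:
    """Sketch of the similarity cutoff, applied to raw vector similarity (not _score)."""
    if len(query) != len(doc):
        raise ValueError("vectors must have the same number of dimensions")
    if similarity_fn == "l2_norm":
        # Keep documents inside the hypersphere of radius `threshold` around the query.
        return math.dist(query, doc) <= threshold
    dot = sum(q * d for q, d in zip(query, doc))
    if similarity_fn == "cosine":
        # Cosine similarity must be at least the threshold.
        return dot / (math.hypot(*query) * math.hypot(*doc)) >= threshold
    if similarity_fn in ("dot_product", "max_inner_product"):
        # Raw dot product must be at least the threshold.
        return dot >= threshold
    raise ValueError(f"unknown similarity: {similarity_fn}")


# Distance between the vectors is about 0.894, so a radius of 1.0 includes the document.
print(passes_similarity([1.0, 0.0], [0.6, 0.8], "l2_norm", 1.0))
# Cosine similarity is about 0.6, below the 0.7 threshold.
print(passes_similarity([1.0, 0.0], [0.6, 0.8], "cosine", 0.7))
```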
rescore_vector (Stack)
(Optional, object)
Apply oversampling and rescoring to quantized vectors.
Rescoring only makes sense for quantized vectors; when quantization is not used, the original vectors are used for scoring. The rescore option is ignored for non-quantized dense_vector fields.
oversample
(Required, float)
Applies the specified oversample factor to k on the approximate kNN search. The approximate kNN search will:
- Retrieve num_candidates candidates per shard.
- From these candidates, rescore the top k * oversample candidates per shard using the original vectors.
- Return the top k rescored candidates.
See oversampling and rescoring quantized vectors for details.
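The steps above can be simulated in plain Python. This is a toy model, not how Elasticsearch implements the search. Each shard maps a document ID to a pair of vectors (quantized, original). A brute-force ranking with heapq stands in for the approximate graph search, and dot product is used as the score. Rounding k * oversample up is an assumption, and the case where k * oversample exceeds num_candidates is not modeled beyond slicing. All names are hypothetical.

```python
import heapq
import math
from typing import Callable, Sequence

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def knn_with_rescore(
    shards: list[dict[str, tuple[Vector, Vector]]],
    query: Vector,
    k: int,
    num_candidates: int,
    oversample: float,
    score: Callable[[Vector, Vector], float] = dot,
) -> list[tuple[str, float]]:
    """Toy model of approximate kNN with oversampling and rescoring across shards."""
    merged: list[tuple[str, float]] = []
    for shard in shards:
        # 1. Approximate step: rank by the quantized vector, keep num_candidates.
        approx = heapq.nlargest(
            num_candidates, shard.items(), key=lambda item: score(query, item[1][0])
        )
        # 2. Rescore the top k * oversample candidates with the original vectors.
        to_rescore = approx[: math.ceil(k * oversample)]
        rescored = [(doc_id, score(query, vecs[1])) for doc_id, vecs in to_rescore]
        # 3. Keep this shard's top k rescored hits.
        merged.extend(heapq.nlargest(k, rescored, key=lambda hit: hit[1]))
    # Merge the per-shard hits into the global top k.
    return heapq.nlargest(k, merged, key=lambda hit: hit[1])


# Quantization makes "b" look best, but its original vector scores lower than "a".
shard = {
    "a": ((0.5, 0.5), (0.95, 0.0)),
    "b": ((1.0, 0.0), (0.7, 0.0)),
    "c": ((0.0, 1.0), (0.0, 1.0)),
}
print(knn_with_rescore([shard], (1.0, 0.0), k=1, num_candidates=3, oversample=1.0))
print(knn_with_rescore([shard], (1.0, 0.0), k=1, num_candidates=3, oversample=2.0))
```

With oversample set to 1.0, only the quantized favorite "b" is rescored. With 2.0, "a" also gets rescored with its original vector and wins, which is why oversampling tends to recover accuracy lost to quantization.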
The parameters query_vector and query_vector_builder cannot be used together.
GET /restaurants/_search
{
"retriever": {
"knn": {
"field": "vector",
"query_vector": [10, 22, 77],
"k": 10,
"num_candidates": 10
}
}
}
- knn: Configuration for k-nearest neighbor (kNN) search, which is based on vector similarity.
- field: Specifies the field name that contains the vectors.
- query_vector: The query vector against which document vectors are compared in the kNN search.
- k: The number of nearest neighbors to return as top hits. This value must be less than or equal to num_candidates.
- num_candidates: The size of the initial candidate set from which the final k nearest neighbors are selected.
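If you assemble request bodies in code, a small helper can enforce the documented constraints before sending anything. The Python sketch below builds the same body as the request example and checks that exactly one of query_vector or query_vector_builder is given, that k is at most num_candidates, and that num_candidates does not exceed 10,000. When num_candidates is omitted, it applies the documented default Math.min(1.5 * k, 10_000); truncating that value to an integer is an assumption. The helper build_knn_retriever is hypothetical and does not send the request.

```python
import json
from typing import Optional

MAX_NUM_CANDIDATES = 10_000


def build_knn_retriever(field: str, k: int,
                        query_vector: Optional[list[float]] = None,
                        query_vector_builder: Optional[dict] = None,
                        num_candidates: Optional[int] = None) -> dict:
    """Build a search body with a knn retriever, checking the documented constraints."""
    # Exactly one of query_vector / query_vector_builder must be provided.
    if (query_vector is None) == (query_vector_builder is None):
        raise ValueError("provide exactly one of query_vector or query_vector_builder")
    if num_candidates is None:
        # Documented default Math.min(1.5 * k, 10_000); int() truncation is an assumption.
        num_candidates = int(min(1.5 * k, MAX_NUM_CANDIDATES))
    if not k <= num_candidates <= MAX_NUM_CANDIDATES:
        raise ValueError("need k <= num_candidates <= 10,000")
    knn: dict = {"field": field, "k": k, "num_candidates": num_candidates}
    if query_vector is not None:
        knn["query_vector"] = query_vector
    else:
        knn["query_vector_builder"] = query_vector_builder
    return {"retriever": {"knn": knn}}


# Same body as the GET /restaurants/_search example.
body = build_knn_retriever("vector", k=10, query_vector=[10, 22, 77], num_candidates=10)
print(json.dumps(body))
```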