Master Elasticsearch dense_vector: definition, usage, and kNN search guide

This article explains Elasticsearch's dense_vector field for storing dense vectors, covering its definition, how to define and index vectors, kNN search methods (brute‑force and approximate with HNSW), similarity options, quantization strategies, bit‑vector support, key parameters, and how to update mappings.

Mingyi World Elasticsearch

1. What is dense_vector?

In Elasticsearch, dense_vector is a field type for storing a dense vector of numeric values (e.g., [0.5, 10, 6]), typically an embedding generated by a machine‑learning model. Unlike keyword fields, it does not support aggregations or sorting. It is built specifically for vector similarity search, such as the k‑nearest‑neighbor (kNN) queries used in recommendation systems, image search, and semantic similarity in NLP.

2. Defining and using dense_vector

2.1 Basic definition

PUT my-index
{
  "mappings": {
    "properties": {
      "my_vector": {
        "type": "dense_vector",
        "dims": 3 // vector has 3 dimensions
      },
      "my_text": {
        "type": "keyword"
      }
    }
  }
}
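my_vector is a dense_vector with dims set to 3. dims cannot exceed 4096 (2048 before Elasticsearch 8.10). In recent versions dims is optional: if it is omitted, it is set from the length of the first vector indexed into the field.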

2.2 Adding data

PUT my-index/_doc/1
{
  "my_text": "text1",
  "my_vector": [0.5, 10, 6]
}

PUT my-index/_doc/2
{
  "my_text": "text2",
  "my_vector": [-0.5, 10, 10]
}

Each document can hold only one vector in the my_vector field, and the number of values must equal dims (3 here). With the default element_type, the values are stored as floats.
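When documents are prepared in application code, checking each vector's length against dims catches mismatches before Elasticsearch rejects them. The Python sketch below builds an NDJSON body for the _bulk API (the same format used in section 5.2) from the two documents above. The `bulk_body` helper is hypothetical, and it checks only vector length.

```python
import json


def bulk_body(docs, dims):
    """Build an NDJSON _bulk payload; reject vectors whose length differs from dims."""
    lines = []
    for doc_id, text, vector in docs:
        if len(vector) != dims:
            raise ValueError(f"doc {doc_id}: expected {dims} values, got {len(vector)}")
        # Action line followed by the document source, one JSON object per line.
        lines.append(json.dumps({"index": {"_id": doc_id}}))
        lines.append(json.dumps({"my_text": text, "my_vector": [float(x) for x in vector]}))
    return "\n".join(lines) + "\n"  # a _bulk body must end with a newline


print(bulk_body([("1", "text1", [0.5, 10, 6]), ("2", "text2", [-0.5, 10, 10])], dims=3), end="")
```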

3. kNN search: putting vectors to work

The core use of dense_vector is kNN search, i.e., finding the k most similar documents to a query vector. Elasticsearch supports two approaches:

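Brute‑force (exact) kNN – use a script_score query with a vector function such as cosineSimilarity to scan every matching document and compute similarity on the fly. Results are exact, but queries slow down as the dataset grows.

Approximate kNN (indexed) – index the vectors in a structure such as an HNSW graph (indexing is enabled by default), then query with the knn option for fast retrieval at the cost of slightly approximate results.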
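To make the difference concrete, here is a minimal Python sketch of what brute‑force kNN does conceptually: score every stored vector against the query and keep the top k. It is not Elasticsearch code. The in-memory `docs` dictionary and the `brute_force_knn` helper are hypothetical, and cosine similarity is assumed as the metric. An HNSW index avoids this full scan by navigating a graph of neighboring vectors, which is why approximate kNN scales better.

```python
import heapq
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def brute_force_knn(query, docs, k):
    """Exact kNN: score every document vector and return the k best (doc_id, score) pairs."""
    scored = ((doc_id, cosine(query, vec)) for doc_id, vec in docs.items())
    return heapq.nlargest(k, scored, key=lambda pair: pair[1])


# Hypothetical in-memory copy of the two documents indexed in section 2.2.
docs = {"1": [0.5, 10, 6], "2": [-0.5, 10, 10]}
print(brute_force_knn([0.5, 10, 6], docs, k=2))
```

Every query touches every vector, so the cost grows linearly with the number of documents.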

3.1 Setting similarity

Similarity is configured with the similarity parameter. The default is cosine (l2_norm for bit vectors). Example using dot product:

PUT my-index-2
{
  "mappings": {
    "properties": {
      "my_vector": {
        "type": "dense_vector",
        "dims": 3,
        "similarity": "dot_product" // use dot product for similarity
      }
    }
  }
}

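Supported similarity options:

l2_norm – Euclidean (L2) distance.

dot_product – dot product of the vectors. For float vectors, every document and query vector must be normalized to unit length.

cosine – cosine similarity (default).

max_inner_product – maximum inner product. Like dot_product, but the vectors do not need to be normalized.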

3.2 Comparison of similarity options

l2_norm: suitable for precise spatial distance calculations (e.g., geographic matching); sensitive to vector length.

dot_product: efficient for normalized vectors or byte vectors; high performance.

cosine: preferred for semantic similarity of text or image features.

max_inner_product: useful when vector length carries meaning, such as in certain recommendation systems.
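The raw measures behind these options are easy to compute by hand. The Python sketch below evaluates them for the two vectors indexed in section 2.2 and converts each into a _score. The conversions follow the formulas for float vectors given in the Elasticsearch dense_vector reference; treat them as an assumption to check against the documentation for your version. Because dot_product on float vectors expects unit-length vectors, the sketch normalizes both vectors before computing it.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def l2(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def normalize(v):
    n = math.sqrt(dot(v, v))
    return [x / n for x in v]


def scores(query, vec):
    """Map each raw similarity to a _score (assumed float-vector formulas from the ES docs)."""
    cos = dot(query, vec) / math.sqrt(dot(query, query) * dot(vec, vec))
    unit_dot = dot(normalize(query), normalize(vec))  # dot_product requires unit vectors
    mip = dot(query, vec)  # max_inner_product uses raw, unnormalized vectors
    return {
        "l2_norm": 1 / (1 + l2(query, vec) ** 2),
        "cosine": (1 + cos) / 2,
        "dot_product": (1 + unit_dot) / 2,
        "max_inner_product": mip + 1 if mip >= 0 else 1 / (1 - mip),
    }


for name, value in scores([0.5, 10, 6], [-0.5, 10, 10]).items():
    print(f"{name}: {value:.4f}")
```

For these two vectors the squared Euclidean distance is 17, so the l2_norm score is 1/18. Cosine and normalized dot_product produce the same score, which is why normalizing vectors and using dot_product is a common substitute for cosine. The max_inner_product score (159.75 + 1) grows with vector length, which illustrates why it suits cases where magnitude carries meaning.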

3.3 Disabling the index

If fast kNN is not needed, you can turn off indexing to save resources:

PUT my-index-3
{
  "mappings": {
    "properties": {
      "my_vector": {
        "type": "dense_vector",
        "dims": 3,
        "index": false // disable index
      }
    }
  }
}

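With indexing disabled, the field cannot be used for approximate kNN. Only brute‑force (script_score) search works, and it becomes slow on large indices, so use this option with caution.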

4. Quantization: balancing memory and speed

Indexing vectors accelerates search but consumes memory. Elasticsearch offers quantization to compress vectors:

int8: 1 byte per dimension, ~75% memory reduction, slight accuracy loss.

int4: 4 bits per dimension, ~87% reduction, larger accuracy loss; requires an even number of dimensions.

bbq (experimental): 1 bit per dimension, ~96% reduction, highest accuracy loss; requires dimensions > 64 and may need over‑sampling and re‑ranking.
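These percentages follow directly from the bits used per dimension compared with 32-bit floats. The sketch below does that arithmetic for a hypothetical 384-dimension embedding model. It counts only the raw vector values and ignores any per-vector metadata and the HNSW graph itself, so real-world savings will be somewhat smaller.

```python
BITS_PER_DIM = {"float": 32, "int8": 8, "int4": 4, "bbq": 1}


def raw_vector_bytes(dims, scheme):
    """Bytes needed to store one vector's values under a given quantization scheme."""
    return dims * BITS_PER_DIM[scheme] / 8


dims = 384  # hypothetical embedding size
baseline = raw_vector_bytes(dims, "float")
for scheme in ("float", "int8", "int4", "bbq"):
    size = raw_vector_bytes(dims, scheme)
    saving = 1 - size / baseline
    print(f"{scheme:>5}: {size:7.1f} bytes/vector, {saving:.1%} smaller than float")
```

At 384 dimensions, one million float vectors need about 1.5 GB of raw values, compared with about 384 MB for int8, 192 MB for int4, and 48 MB for bbq.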

Example of int8 quantization with HNSW:

PUT my-byte-quantized-index
{
  "mappings": {
    "properties": {
      "my_vector": {
        "type": "dense_vector",
        "dims": 3,
        "index": true,
        "index_options": {
          "type": "int8_hnsw" // int8 quantization + HNSW
        }
      }
    }
  }
}

Quantized vectors sacrifice some accuracy, but query‑time parameters (e.g., over‑sampling) can improve result quality.
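To see where the accuracy loss comes from, here is a deliberately simplified int8 scalar quantizer in Python. It maps each float linearly onto 256 buckets between the vector's minimum and maximum, then maps the codes back. This is not Elasticsearch's exact algorithm, which derives its quantization bounds from quantiles of the indexed data rather than from one vector's min and max. Treat it as an illustration of the principle only.

```python
def quantize_int8(vec):
    """Linearly map floats onto integer codes 0..255; return codes plus (lo, step) for decoding."""
    lo, hi = min(vec), max(vec)
    step = (hi - lo) / 255 or 1.0  # avoid division by zero for constant vectors
    codes = [round((x - lo) / step) for x in vec]
    return codes, lo, step


def dequantize_int8(codes, lo, step):
    """Approximately reconstruct the original floats from their codes."""
    return [lo + c * step for c in codes]


original = [0.12, -0.53, 0.98, 0.07, -0.31]
codes, lo, step = quantize_int8(original)
restored = dequantize_int8(codes, lo, step)
max_error = max(abs(a - b) for a, b in zip(original, restored))
print(codes)
print(f"max reconstruction error: {max_error:.4f} (bounded by step/2 = {step / 2:.4f})")
```

Each code fits in one byte, which is where the ~75% saving comes from. The rounding error of up to half a bucket per dimension is the accuracy that over‑sampling and re‑ranking try to win back.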

5. Bit vectors: a tool for ultra‑high‑dimensional scenarios

Besides float, dense_vector also supports byte and bit element types. The bit type stores each dimension as a single bit, ideal for extremely high‑dimensional data.

5.1 Defining a bit vector

PUT my-bit-vectors
{
  "mappings": {
    "properties": {
      "my_vector": {
        "type": "dense_vector",
        "dims": 40, // dimensions must be a multiple of 8
        "element_type": "bit"
      }
    }
  }
}

5.2 Adding data

POST /my-bit-vectors/_bulk?refresh
{ "index": { "_id": "1" } }
{ "my_vector": [127, -127, 0, 1, 42] }
{ "index": { "_id": "2" } }
{ "my_vector": "8100012a7f" }
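With element_type bit, each array element is one byte that packs 8 dimensions. The 40-dimension field above therefore takes 5 byte values, or 10 hex characters. The Python sketch below converts between the two input forms and expands a vector into its individual bits. The helper names are hypothetical, and the most-significant-bit-first order in `to_bits` is a choice made only for display. Document 1 encodes to 7f8100012a, so document 2 (8100012a7f) is a different vector: the same bytes rotated by one position.

```python
def bytes_to_hex(values):
    """Encode signed byte values (-128..127) as a hex string."""
    return bytes(v & 0xFF for v in values).hex()


def hex_to_bytes(hex_str):
    """Decode a hex string back into signed byte values."""
    return [b - 256 if b > 127 else b for b in bytes.fromhex(hex_str)]


def to_bits(values):
    """Expand each byte into its 8 bits, most significant bit first."""
    return [(v >> shift) & 1 for v in (b & 0xFF for b in values) for shift in range(7, -1, -1)]


doc1 = [127, -127, 0, 1, 42]
print(bytes_to_hex(doc1))          # 7f8100012a
print(hex_to_bytes("8100012a7f"))  # [-127, 0, 1, 42, 127]
print(len(to_bits(doc1)))          # 40
```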

5.3 Searching bit vectors

POST /my-bit-vectors/_search?filter_path=hits.hits
{
  "query": {
    "knn": {
      "query_vector": [127, -127, 0, 1, 42],
      "field": "my_vector"
    }
  }
}

For bit vectors, similarity is based on Hamming distance, the number of bit positions in which two vectors differ: the fewer bits differ, the higher the score.
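Hamming distance reduces to an XOR followed by a count of set bits for each byte. The sketch below computes it between the query and both documents. It then applies the score transformation (numBits - hamming) / numBits, which is an assumption taken from the dense_vector reference for bit vectors and should be checked against your Elasticsearch version.

```python
def hamming(a_hex, b_hex):
    """Number of differing bits between two equal-length hex-encoded bit vectors."""
    a, b = bytes.fromhex(a_hex), bytes.fromhex(b_hex)
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))


def bit_score(a_hex, b_hex):
    """Assumed scoring: (numBits - hamming) / numBits, so identical vectors score 1.0."""
    num_bits = len(a_hex) * 4  # each hex character encodes 4 bits
    return (num_bits - hamming(a_hex, b_hex)) / num_bits


query = "7f8100012a"  # [127, -127, 0, 1, 42] in hex
for doc_id, vec in {"1": "7f8100012a", "2": "8100012a7f"}.items():
    print(doc_id, hamming(query, vec), bit_score(query, vec))
```

Document 1 matches the query exactly (distance 0, score 1.0). Document 2 differs in 18 of 40 bits, which gives 0.55 under the assumed formula.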

6. Parameter reference

element_type: float (default), byte, or bit.

dims: number of dimensions, maximum 4096.

index: whether to build an index for approximate kNN (default true).

similarity: similarity metric (default cosine).

index_options: indexing algorithm configuration, including type (e.g., hnsw, int8_hnsw, int4_hnsw, or flat), m (the number of neighbors each node is connected to in the HNSW graph, default 16), and ef_construction (the number of candidates tracked while building each node's neighbor list, default 100).

These parameters can be tuned to balance accuracy and performance.
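When mappings are generated from application code, it can help to check the constraints mentioned in this article before sending the request. The Python sketch below builds a dense_vector mapping body as a plain dict. The `build_dense_vector_mapping` helper is hypothetical. It enforces only the rules stated here (dims at most 4096, bit dims a multiple of 8, an even number of dims for int4), not every check Elasticsearch performs.

```python
import json

MAX_DIMS = 4096


def build_dense_vector_mapping(dims, element_type="float", similarity=None,
                               index_type=None, m=None, ef_construction=None):
    """Return a mapping body for one dense_vector field named my_vector (hypothetical helper)."""
    if not 1 <= dims <= MAX_DIMS:
        raise ValueError(f"dims must be between 1 and {MAX_DIMS}")
    if element_type == "bit" and dims % 8:
        raise ValueError("bit vectors need dims to be a multiple of 8")
    # Assumption: the even-dims rule applies to every int4_* index type.
    if index_type and index_type.startswith("int4") and dims % 2:
        raise ValueError("int4 quantization needs an even number of dims")

    field = {"type": "dense_vector", "dims": dims, "element_type": element_type}
    if similarity:
        field["similarity"] = similarity
    options = {key: value for key, value in
               {"type": index_type, "m": m, "ef_construction": ef_construction}.items()
               if value is not None}
    if options:
        field["index_options"] = options
    return {"mappings": {"properties": {"my_vector": field}}}


body = build_dense_vector_mapping(384, similarity="cosine", index_type="int8_hnsw",
                                  m=16, ef_construction=100)
print(json.dumps(body, indent=2))
```

Omitting a parameter leaves it out of the body, so Elasticsearch falls back to its own default for that setting.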

7. Updating index_options

Use the Update Mapping API to change index_options.type, for example from flat to int4_hnsw. The change applies only to newly indexed vectors. Vectors indexed before it keep their original format, so re‑indexing or a force merge is required for the new type to take full effect.

PUT my-index-000001/_mapping
{
  "properties": {
    "text_embedding": {
      "type": "dense_vector",
      "dims": 384,
      "index_options": {
        "type": "int4_hnsw"
      }
    }
  }
}

8. Summary

dense_vector is a powerful Elasticsearch field for handling vector data. It supports flexible definitions, multiple similarity metrics, quantization, and indexing options, enabling both brute‑force and approximate kNN searches as well as ultra‑high‑dimensional bit‑vector scenarios.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.

Written by Mingyi World Elasticsearch

The leading WeChat public account for Elasticsearch fundamentals, advanced topics, and hands‑on practice. Join us to dive deep into the ELK Stack (Elasticsearch, Logstash, Kibana, Beats).
