Understanding Vector Storage and Optimization in Elasticsearch 8.16.1
The article explains how Elasticsearch 8.16.1 stores dense and sparse vectors across a set of Lucene file extensions, compares the flat and HNSW index formats, shows how serving vectors through doc-values lets the redundant row-store copy be trimmed, and demonstrates scalar and binary quantization, including a quantization-only mode that cuts storage to roughly 9 percent of the original size while preserving search accuracy.
This article provides an in-depth analysis of how Elasticsearch 8.16.1 stores and optimizes vector data. It starts by describing the vector field mapping types (sparse_vector and dense_vector) introduced in ES8 and explains why sparse_vector is rarely used.
1. Storage Overview
Lucene's source code defines several file extensions for vector data (e.g., .vec, .vex, .vem, .vemf, .veq, .vemq, .veb, .vemb). The article lists the enum definition from Lucene:
public enum LuceneFilesExtensions {
...
// kNN vectors format
VEC("vec", "Vector Data", false, true),
VEX("vex", "Vector Index", false, true),
VEM("vem", "Vector Metadata", true, false),
VEMF("vemf", "Flat Vector Metadata", true, false),
VEMQ("vemq", "Scalar Quantized Vector Metadata", true, false),
VEQ("veq", "Scalar Quantized Vector Data", false, true),
VEMB("vemb", "Binarized Vector Metadata", true, false),
VEB("veb", "Binarized Vector Data", false, true);
...
}

Two main index types are discussed:
Flat index – stores all vectors in a contiguous array (.vec data plus .vemf metadata).
HNSW index – builds a hierarchical graph for approximate k‑NN search (.vex and .vem files) while also keeping a flat copy of the raw vectors.
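The flat format's exhaustive search can be sketched as a brute-force cosine scan over an in-memory array. This is a simplification for illustration only; the class and method names below are made up, not Lucene's:

```java
import java.util.Arrays;
import java.util.Comparator;

public class FlatKnn {
    // Cosine similarity between two vectors of equal length.
    static double cosine(float[] a, float[] b) {
        double dot = 0, na = 0, nb = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            na += a[i] * a[i];
            nb += b[i] * b[i];
        }
        return dot / (Math.sqrt(na) * Math.sqrt(nb));
    }

    // Exhaustive top-k search over a flat array of vectors: score every
    // vector against the query and keep the k highest-scoring ids.
    static int[] search(float[][] vectors, float[] query, int k) {
        Integer[] ids = new Integer[vectors.length];
        for (int i = 0; i < ids.length; i++) ids[i] = i;
        Arrays.sort(ids, Comparator.comparingDouble((Integer i) -> -cosine(vectors[i], query)));
        int[] top = new int[k];
        for (int i = 0; i < k; i++) top[i] = ids[i];
        return top;
    }

    public static void main(String[] args) {
        float[][] vectors = {
            {1f, 0f, 0f},
            {0.9f, 0.1f, 0f},
            {0f, 1f, 0f}
        };
        int[] top = search(vectors, new float[]{1f, 0f, 0f}, 2);
        System.out.println(Arrays.toString(top)); // prints [0, 1]
    }
}
```

An HNSW index trades this linear scan for a graph walk that visits only a fraction of the vectors, which is why it needs the extra .vex graph file on top of the flat copy.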
2. Index Variants
Examples of mapping configurations are shown for each index type. For a Flat index:
{
"titleVector": {
"type": "dense_vector",
"dims": 768,
"index": true,
"similarity": "cosine",
"index_options": {"type": "flat"}
}
}

For an HNSW index:
{
"titleVector": {
"type": "dense_vector",
"dims": 768,
"index": true,
"similarity": "cosine",
"index_options": {"type": "hnsw"}
}
}

For scalar quantized (int8) HNSW:
{
"titleVector": {
"type": "dense_vector",
"dims": 768,
"index": true,
"similarity": "cosine",
"index_options": {"type": "int8_hnsw"}
}
}

For binary‑quantized HNSW (bbq_hnsw):
{
"titleVector": {
"type": "dense_vector",
"dims": 768,
"index": true,
"similarity": "cosine",
"index_options": {"type": "bbq_hnsw"}
}
}

The article shows directory listings (using tree -h) that illustrate how file sizes change with each index type, highlighting the storage impact of the different formats.
3. Row‑Store Trimming
It is observed that each vector is stored twice: once in the row‑store (.fdt), as part of the document's _source JSON, and again in the vector‑specific files (.vec, .veq, etc.). By adding doc‑value support for dense_vector fields, so that vectors can be fetched through docvalue_fields directly from the vector files, the copy in _source becomes redundant and the .fdt storage can be trimmed, saving 70‑90% of disk space.
Key code changes include overriding DenseVectorFieldType.docValueFormat to throw an exception for unsupported operations and implementing a custom DenseVectorDocValueFormat that returns the vector from the appropriate .vec or .veq file.
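A minimal sketch of that idea follows. All names here are simplified stand-ins for the actual Elasticsearch types (the real DenseVectorFieldType and DocValueFormat interfaces are far richer), assuming only the behavior described above: custom formats are rejected, and the default format reads the vector back from the vector files:

```java
public class DocValueTrimSketch {
    // Simplified stand-in for Elasticsearch's DocValueFormat concept.
    interface DocValueFormat {
        Object format(Object raw);
    }

    // Hypothetical field type mirroring the described override.
    static class DenseVectorFieldType {
        DocValueFormat docValueFormat(String format) {
            // Custom doc-value formats are unsupported for dense vectors,
            // so the override throws instead of silently misbehaving.
            if (format != null) {
                throw new UnsupportedOperationException(
                    "dense_vector fields do not support custom doc-value formats");
            }
            // Default path: return the vector as read from the .vec/.veq
            // files (stubbed here as a pass-through).
            return raw -> raw;
        }
    }

    public static void main(String[] args) {
        DenseVectorFieldType ft = new DenseVectorFieldType();
        Object v = ft.docValueFormat(null).format(new float[]{0.1f, 0.2f});
        System.out.println(((float[]) v).length);
        try {
            ft.docValueFormat("use_field_mapping");
        } catch (UnsupportedOperationException e) {
            System.out.println("custom format rejected");
        }
    }
}
```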
4. Quantization Techniques
Two quantization methods are covered:
Scalar quantization (int8, int4) – stores each dimension as an 8‑bit (int8) or 4‑bit (int4) integer in .veq files while keeping the original .vec for re‑hydration.
Binary quantization (bbq_hnsw) – reduces vectors to 1‑bit representations stored in .veb , achieving up to 97% storage reduction.
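The core of scalar quantization can be sketched as a linear min/max mapping of each float dimension to one byte. This is a deliberate simplification: Lucene's actual quantizer chooses the interval with a confidence-interval heuristic rather than the raw min/max used here:

```java
public class ScalarQuantSketch {
    // Quantize floats to bytes in [0, 255] via linear min/max scaling.
    static byte[] quantize(float[] v, float min, float max) {
        byte[] out = new byte[v.length];
        float scale = 255f / (max - min);
        for (int i = 0; i < v.length; i++) {
            int q = Math.round((v[i] - min) * scale);
            out[i] = (byte) Math.max(0, Math.min(255, q)); // clamp outliers
        }
        return out;
    }

    // Re-hydrate: map bytes back to approximate floats.
    static float[] dequantize(byte[] q, float min, float max) {
        float[] out = new float[q.length];
        float scale = (max - min) / 255f;
        for (int i = 0; i < q.length; i++) {
            out[i] = min + (q[i] & 0xFF) * scale; // & 0xFF: treat byte as unsigned
        }
        return out;
    }

    public static void main(String[] args) {
        float[] v = {-1.0f, -0.5f, 0.0f, 0.5f, 1.0f};
        byte[] q = quantize(v, -1.0f, 1.0f);
        float[] r = dequantize(q, -1.0f, 1.0f);
        float maxErr = 0;
        for (int i = 0; i < v.length; i++) maxErr = Math.max(maxErr, Math.abs(v[i] - r[i]));
        // One byte per dimension instead of four; error bounded by half a step.
        System.out.println("bytes: " + q.length + " vs " + v.length * 4 + ", max error: " + maxErr);
    }
}
```

The dequantize step is exactly the "re-hydration" the article refers to: as long as min and max are kept in the metadata, an approximate float vector can be rebuilt from the bytes alone.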
Source code snippets for the custom formats (ES814HnswScalarQuantizedVectorsFormat, ES816HnswBinaryQuantizedVectorsFormat) and their writers/readers are provided, demonstrating how the quantized data is written and read.
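The intuition behind binary quantization can be conveyed with plain sign-bit packing plus Hamming distance as a cheap angular-distance proxy. Lucene's real BBQ format is considerably more elaborate than this sketch, so treat it only as an illustration of why one bit per dimension can still rank neighbors:

```java
public class BinaryQuantSketch {
    // Pack each dimension's sign bit: 1 if the component is positive, else 0.
    static long[] binarize(float[] v) {
        long[] bits = new long[(v.length + 63) / 64];
        for (int i = 0; i < v.length; i++) {
            if (v[i] > 0) bits[i / 64] |= 1L << (i % 64);
        }
        return bits;
    }

    // Hamming distance between bit-packed vectors: a fast popcount loop.
    static int hamming(long[] a, long[] b) {
        int d = 0;
        for (int i = 0; i < a.length; i++) d += Long.bitCount(a[i] ^ b[i]);
        return d;
    }

    public static void main(String[] args) {
        float[] x = {0.9f, -0.1f, 0.4f, -0.7f};
        float[] y = {0.8f, -0.2f, 0.5f, -0.6f};  // points the same way as x
        float[] z = {-0.9f, 0.1f, -0.4f, 0.7f};  // points opposite to x
        System.out.println(hamming(binarize(x), binarize(y))); // prints 0
        System.out.println(hamming(binarize(x), binarize(z))); // prints 4
    }
}
```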
5. Quantization‑Only Storage (Re‑hydrate)
The article introduces a new format (VPackHnswScalarQuantizedOnlyVectorsFormat) that discards the original .vec file entirely, keeping only the quantized representation. This further reduces disk usage to about 9% of the original size while preserving search accuracy (recall loss is only a few percent).
Example mapping for this mode:
{
"titleVector": {
"type": "dense_vector",
"dims": 768,
"index": true,
"similarity": "cosine",
"index_options": {"type": "int8_only_hnsw"}
}
}

Benchmark results show storage savings of 91% for pure vector workloads and 90% for mixed workloads when combining row‑store trimming with quantization‑only storage.
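These headline numbers are easy to sanity-check with back-of-envelope arithmetic for a single 768-dimensional vector. The figures below cover vector data only and ignore HNSW graph, metadata, and row-store overhead, which is why the article's end-to-end savings also depend on row-store trimming:

```java
public class StorageMath {
    public static void main(String[] args) {
        int dims = 768;
        int raw  = dims * 4;  // float32: 4 bytes per dimension (.vec)
        int int8 = dims;      // scalar quantized: 1 byte per dimension (.veq)
        int bbq  = dims / 8;  // binary quantized: 1 bit per dimension (.veb)

        // Standard int8_hnsw keeps both .vec and .veq; quantization-only keeps .veq alone.
        int int8WithRaw = raw + int8;
        System.out.printf("float32:    %4d B%n", raw);
        System.out.printf("int8 + raw: %4d B (%.0f%% of float32)%n", int8WithRaw, 100.0 * int8WithRaw / raw);
        System.out.printf("int8 only:  %4d B (%.0f%% of float32)%n", int8, 100.0 * int8 / raw);
        System.out.printf("bbq bits:   %4d B (%.1f%% of float32)%n", bbq, 100.0 * bbq / raw);
    }
}
```

The arithmetic makes the motivation concrete: standard int8_hnsw actually grows vector data to 125% of float32 because the raw copy is retained, whereas quantization-only storage drops the vector data itself to 25% (int8) or about 3% (binary) before any other trimming.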
6. Contributions and Outlook
The author lists several community contributions (PRs #108470, #114484, #114407) that added doc‑value support, quantization formats, and bug fixes. Tencent Cloud Elasticsearch integrates these features in its v‑pack vector enhancement plugin, available in version 8.16.1.
Future work includes further reducing the recall impact of quantization and extending the re‑hydrate logic to other vector index types.
Tencent Technical Engineering
Official account of Tencent Technology. A platform for publishing and analyzing Tencent's technological innovations and cutting-edge developments.