Easysearch Vector Search: From Theory to Hands‑On Implementation
This article explains the principles of vector search, compares Easysearch's approximate (LSH) and exact kNN APIs, and walks through a complete hands‑on example using Stanford's 50‑dimensional GloVe embeddings to index, import, and query semantically similar words.
Vector search background
Traditional keyword matching cannot capture deeper semantics. Converting text, images, and other content into high‑dimensional vectors and measuring the similarity between them (for example, with cosine similarity) lets a search match on meaning rather than on exact terms.
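As a small illustration of the similarity measure mentioned above, the following sketch computes cosine similarity in plain Python. The function name and the toy 3‑dimensional vectors are made up for this example; they are not real embeddings.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)

# Toy vectors, invented for illustration only.
print(cosine_similarity([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]))  # same direction
print(cosine_similarity([1.0, 0.0, 0.0], [0.0, 1.0, 0.0]))  # orthogonal
```

Because the measure depends only on direction, scaling a vector does not change its similarity to other vectors.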
Easysearch kNN Retrieval API
Two search modes are provided:
Approximate Search – uses an index structure such as Locality Sensitive Hashing (LSH) to accelerate queries on large collections. Requires configuring index parameters (dimension, model type, similarity metric, L, k).
Exact Search – computes the exact similarity between the query vector and every indexed vector. Suitable for small collections but incurs higher computational cost.
Reference documentation: https://docs.infinilabs.com/easysearch/main/docs/references/search/knn_api/
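To make the idea behind approximate search more concrete, here is a toy random‑hyperplane LSH index in Python. It is a teaching sketch only: the class and function names are hypothetical, and it is not meant to reflect how Easysearch implements LSH internally. Vectors whose sign pattern matches in at least one table become candidates, so only a subset of the collection needs exact scoring.

```python
import random
from collections import defaultdict

def make_hyperplanes(dims, k, seed):
    """Draw k random hyperplanes (as normal vectors) in dims dimensions."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dims)] for _ in range(k)]

def signature(vec, planes):
    """One bit per hyperplane: which side of the plane the vector lies on."""
    return tuple(sum(p * v for p, v in zip(plane, vec)) >= 0 for plane in planes)

class ToyLshIndex:
    """Hypothetical in-memory index with num_tables tables of bits_per_table bits."""

    def __init__(self, dims, num_tables, bits_per_table, seed=0):
        self.tables = []
        for t in range(num_tables):
            planes = make_hyperplanes(dims, bits_per_table, seed + t)
            self.tables.append((planes, defaultdict(set)))

    def add(self, key, vec):
        # Store the key in the matching bucket of every table.
        for planes, buckets in self.tables:
            buckets[signature(vec, planes)].add(key)

    def candidates(self, vec):
        # Union of the buckets the query falls into, across all tables.
        found = set()
        for planes, buckets in self.tables:
            found |= buckets.get(signature(vec, planes), set())
        return found

index = ToyLshIndex(dims=3, num_tables=4, bits_per_table=2)
index.add("a", [1.0, 0.5, -0.2])
index.add("b", [-1.0, -0.5, 0.2])
print(index.candidates([2.0, 1.0, -0.4]))  # query points in the same direction as "a"
```

The candidate set would then be re-scored with the exact similarity, which is the step an exact search performs over the whole collection.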
Implementation steps
1. Index configuration
Define a mapping that declares a vector field, its dimension, the model, and the similarity measure. The following request creates an index named knn-test with a knn_dense_float_vector field, my_vec, that uses the LSH model and cosine similarity:
PUT /knn-test
{
  "mappings": {
    "properties": {
      "word": { "type": "keyword" },
      "my_vec": {
        "type": "knn_dense_float_vector",
        "knn": {
          "dims": 50,
          "model": "lsh",
          "similarity": "cosine",
          "L": 99,
          "k": 1
        }
      }
    }
  }
}
Key parameters:
word – keyword field storing the term.
my_vec – 50‑dimensional vector field.
dims – vector dimension; must match the source data (50 for GloVe 50‑d).
model – lsh, for fast approximate search.
similarity – cosine, appropriate for word embeddings.
L – number of LSH hash tables; raising it generally improves recall.
k – number of hash functions combined in each table; raising it generally improves precision.
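The effect of L and k can be estimated with the textbook analysis of random‑hyperplane LSH. The sketch below assumes that scheme (a single hyperplane separates two vectors with probability θ/π, where θ is the angle between them) and the usual "all k bits match in at least one of L tables" rule. Whether Easysearch's LSH model follows exactly this formula is an assumption, so treat the numbers as intuition rather than a prediction.

```python
import math

def collision_probability(cos_sim):
    """Chance that one random hyperplane puts both vectors on the same side."""
    cos_sim = max(-1.0, min(1.0, cos_sim))  # guard against rounding drift
    return 1.0 - math.acos(cos_sim) / math.pi

def candidate_probability(cos_sim, L, k):
    """Chance that a pair shares all k bits in at least one of L tables."""
    p = collision_probability(cos_sim)
    return 1.0 - (1.0 - p ** k) ** L

# Compare a few settings for a pair with cosine similarity 0.5.
for L, k in [(1, 1), (10, 1), (10, 4), (99, 1)]:
    print(L, k, round(candidate_probability(0.5, L, k), 4))
```

Under these assumptions, more tables (larger L) make it more likely that a true neighbour is picked up, while more bits per table (larger k) make each bucket stricter and filter out more dissimilar vectors.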
2. Import test data
The tutorial uses Stanford's pre‑trained GloVe word vectors (glove.6B.50d.txt). Each line contains a word followed by 50 space‑separated floating‑point values. After downloading and extracting the file, import the data into Easysearch so that the word field stores the term and my_vec stores the vector. The import can be performed with the bulk API or a custom script. Full import code: https://t.zsxq.com/82Ra6
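The original import script is behind the link above, so the following is only a minimal sketch of one way to prepare the data. It parses GloVe lines and builds a newline‑delimited body in the Elasticsearch‑style bulk format. Two assumptions are made here: that Easysearch accepts this body on its _bulk endpoint, and that my_vec can be sent as a plain array of floats (as the response shown later in this article suggests). The helper names are hypothetical.

```python
import json

def parse_glove_line(line, dims=50):
    """Split 'word v1 v2 ... vN' into (word, [floats]); dims is the expected N."""
    parts = line.rstrip("\n").rsplit(" ", dims)
    if len(parts) != dims + 1:
        raise ValueError(f"expected a word and {dims} values: {line[:40]!r}")
    return parts[0], [float(x) for x in parts[1:]]

def bulk_body(lines, index="knn-test", dims=50):
    """Build an NDJSON bulk body: one action line plus one document per word."""
    out = []
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        word, values = parse_glove_line(line, dims)
        out.append(json.dumps({"index": {"_index": index}}))
        # Assumption: the vector field accepts a plain float array.
        out.append(json.dumps({"word": word, "my_vec": values}))
    return "\n".join(out) + "\n" if out else ""

# Tiny made-up sample with 3 dimensions instead of 50, for illustration.
print(bulk_body(["bread 0.1 0.2 0.3"], dims=3), end="")
```

For the real file, the lines would be read from glove.6B.50d.txt in batches (for example, a few thousand words per request) and each body sent to the bulk endpoint with an HTTP client of your choice.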
3. Vector retrieval
3.1 Retrieve the vector for a query term
Obtain the vector for the word "bread" with a match query:
GET /knn-test/_search
{
  "query": {
    "match": { "word": "bread" }
  }
}
Example response (truncated):
{ "my_vec": [-0.37436, -0.11959, -0.87609, -1.1217, 1.2788, ...] }
3.2 Find similar words using the vector
Use the retrieved vector in a kNN query to get the top 10 most similar terms:
GET /knn-test/_search
{
  "size": 10,
  "_source": "word",
  "query": {
    "bool": {
      "must": [
        {
          "knn_nearest_neighbors": {
            "field": "my_vec",
            "vec": { "values": [ -0.37436, -0.11959, -0.87609, -1.1217, 1.2788, ... ] },
            "model": "lsh",
            "similarity": "cosine",
            "candidates": 50
          }
        }
      ]
    }
  }
}
Parameters:
size – number of results to return (10).
_source – return only the word field.
vec – the query vector (here, the vector of "bread").
candidates – size of the candidate set, which influences recall.
The sample response includes words such as "baked", "toast", "butter", and "soup", which are semantically close to "bread".
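When the two steps are scripted, the query body can be generated from the retrieved vector instead of being pasted by hand. The sketch below mirrors the request shown above. The function names are hypothetical, and the response shape it reads (hits.hits[]._source and _score) is the usual Elasticsearch‑style layout, which is assumed here to apply to Easysearch as well.

```python
def knn_query(vector, size=10, candidates=50):
    """Build the approximate kNN search body used in this article."""
    return {
        "size": size,
        "_source": "word",
        "query": {
            "bool": {
                "must": [
                    {
                        "knn_nearest_neighbors": {
                            "field": "my_vec",
                            "vec": {"values": list(vector)},
                            "model": "lsh",
                            "similarity": "cosine",
                            "candidates": candidates,
                        }
                    }
                ]
            }
        },
    }

def words_from_response(response):
    """Extract (word, score) pairs, assuming an Elasticsearch-style response."""
    return [(hit["_source"]["word"], hit["_score"]) for hit in response["hits"]["hits"]]
```

The dictionary returned by knn_query can be serialized with json.dumps and sent to GET /knn-test/_search; the vector passed in must be the full 50 values obtained in step 3.1.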
4. Exact search (optional)
For small datasets, create a mapping without model parameters and issue a kNN query with the "exact" model:
PUT /my-index
{
  "mappings": {
    "properties": {
      "my_vec": {
        "type": "knn_dense_float_vector",
        "knn": { "dims": 50 }
      }
    }
  }
}
GET /my-index/_search
{
  "query": {
    "knn_nearest_neighbors": {
      "field": "my_vec",
      "vec": { "values": [ -0.37436, -0.11959, ... ] },
      "model": "exact",
      "similarity": "cosine"
    }
  }
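}
Exact search does not require LSH parameters, but every query compares the query vector against every indexed vector, so the cost per query grows linearly with the collection size (O(n·d) for n vectors of dimension d). That makes it impractical for large collections.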
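Conceptually, an exact search is a brute‑force scan. The following sketch (not Easysearch code; the function name and toy data are made up) scores every stored vector against the query and keeps the best matches, which makes it visible that the work per query grows with the number of stored vectors.

```python
import heapq
import math

def exact_knn(query, vectors, top_n=10):
    """Score every stored vector against the query and keep the best top_n."""
    def cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
        # Treat a zero vector as having no similarity to anything.
        return dot / norm if norm else 0.0

    # One similarity computation per stored vector: a full scan.
    scored = ((cosine(query, vec), key) for key, vec in vectors.items())
    return heapq.nlargest(top_n, scored)

# Toy 2-dimensional data, invented for illustration.
toy = {"x": [1.0, 0.0], "y": [0.0, 1.0], "z": [1.0, 1.0]}
print(exact_knn([1.0, 0.0], toy, top_n=2))
```

The result is a list of (score, key) pairs in descending order of similarity.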
Key takeaways
Configure index mapping with vector type, dimension, and (optional) LSH model.
Import high‑dimensional embeddings such as GloVe vectors.
Execute kNN queries to retrieve similar vectors; choose between approximate (LSH) and exact modes based on data scale.
Adjust LSH parameters L and k to balance recall and precision.
Mingyi World Elasticsearch
The leading WeChat public account for Elasticsearch fundamentals, advanced topics, and hands‑on practice. Join us to dive deep into the ELK Stack (Elasticsearch, Logstash, Kibana, Beats).
