
Easysearch Vector Search: From Theory to Hands‑On Implementation

This article explains the principles of vector search, compares Easysearch's approximate (LSH) and exact kNN APIs, and walks through a complete hands‑on example using Stanford's 50‑dimensional GloVe embeddings to index, import, and query semantically similar words.

Mingyi World Elasticsearch

May 16, 2025


Vector search background

Traditional keyword matching cannot capture deeper semantics. Converting text, images, and other content into high‑dimensional vectors and measuring the similarity between them (for example, with cosine similarity) enables more accurate semantic matching.
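Cosine similarity is the dot product of two vectors divided by the product of their lengths. It ranges from -1 (opposite directions) to 1 (same direction) and ignores vector length. Here is a minimal Python sketch using toy 2‑d values, not real embeddings:

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between a and b: dot(a, b) / (|a| * |b|)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)

# Toy 2-d vectors, not real embeddings.
print(cosine_similarity([3.0, 4.0], [6.0, 8.0]))    # 1.0  (same direction, different length)
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))    # 0.0  (orthogonal)
print(cosine_similarity([3.0, 4.0], [-3.0, -4.0]))  # -1.0 (opposite)
```

The first pair shows why cosine suits embeddings: scaling a vector does not change its score.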

Easysearch kNN Retrieval API

Two search modes are provided:

Approximate Search – uses an index structure such as Locality Sensitive Hashing (LSH) to speed up queries on large collections, at the cost of possibly missing some true nearest neighbors. It requires configuring index parameters in the mapping (dimension, model type, similarity metric, and the LSH parameters L and k).

Exact Search – computes the exact similarity between the query vector and every indexed vector. Results are exact, but the cost grows with the number of indexed vectors, so it is best suited to small collections.

Reference documentation: https://docs.infinilabs.com/easysearch/main/docs/references/search/knn_api/

Implementation steps

1. Index configuration

Define a mapping that declares a vector field, its dimension, the model, and the similarity measure. The example below creates an index knn-test with a knn_dense_float_vector field my_vec that uses the LSH model and cosine similarity:

PUT /knn-test
{
  "mappings": {
    "properties": {
      "word": { "type": "keyword" },
      "my_vec": {
        "type": "knn_dense_float_vector",
        "knn": {
          "dims": 50,
          "model": "lsh",
          "similarity": "cosine",
          "L": 99,
          "k": 1
        }
      }
    }
  }
}
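The same mapping can also be created from a script. The Python sketch below uses only the standard library to build, but not send, the PUT request for the mapping above. The base URL http://localhost:9200 and the helper name build_create_index_request are assumptions for illustration. If your cluster has security enabled, you will also need HTTPS and credentials.

```python
import json
import urllib.request

def build_create_index_request(base_url: str, index: str, dims: int,
                               L: int = 99, k: int = 1) -> urllib.request.Request:
    """Build (but do not send) the PUT request that creates the LSH-backed index."""
    body = {
        "mappings": {
            "properties": {
                "word": {"type": "keyword"},
                "my_vec": {
                    "type": "knn_dense_float_vector",
                    "knn": {"dims": dims, "model": "lsh", "similarity": "cosine",
                            "L": L, "k": k},
                },
            }
        }
    }
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/{index}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )

req = build_create_index_request("http://localhost:9200", "knn-test", dims=50)
print(req.get_method(), req.full_url)
# Sending it requires a reachable cluster (hypothetical host above):
# with urllib.request.urlopen(req) as resp:
#     print(resp.read().decode("utf-8"))
```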

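Key parameters:

word – keyword field that stores the term.

my_vec – the 50‑dimensional vector field.

dims – must match the dimension of the source data (50 for GloVe 50d).

model – lsh enables fast approximate search.

similarity – cosine, which suits word embeddings because it compares direction rather than magnitude.

L and k – LSH parameters: L is the number of hash tables and k is the number of hash functions combined within each table. More tables generally raise recall at the cost of index size and query time; more hash functions per table make each bucket more selective, which improves precision but can lower recall.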
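To make L and k concrete, here is a minimal sketch of random‑hyperplane LSH, the textbook LSH family for cosine similarity. Whether Easysearch's cosine LSH works exactly this way internally is an assumption; the sketch only illustrates what the parameters control. Each of the L tables hashes a vector to k bits, one per random hyperplane, and vectors that share a bucket with the query in any table become candidates. All function names are hypothetical.

```python
import random

def make_hyperplanes(dims: int, L: int, k: int, seed: int = 0) -> list[list[list[float]]]:
    """Draw L tables of k random Gaussian hyperplanes (normal vectors of length dims)."""
    rng = random.Random(seed)
    return [[[rng.gauss(0.0, 1.0) for _ in range(dims)] for _ in range(k)]
            for _ in range(L)]

def table_keys(vec: list[float], planes: list[list[list[float]]]) -> list[tuple[int, ...]]:
    """For each table, a k-bit key: bit i says which side of hyperplane i the vector is on."""
    return [tuple(1 if sum(p * x for p, x in zip(plane, vec)) >= 0 else 0 for plane in table)
            for table in planes]

def build_buckets(vectors: dict[str, list[float]], planes) -> dict:
    """Map (table index, key) to the set of ids hashed into that bucket."""
    buckets: dict = {}
    for vid, vec in vectors.items():
        for t, key in enumerate(table_keys(vec, planes)):
            buckets.setdefault((t, key), set()).add(vid)
    return buckets

def lsh_candidates(query: list[float], planes, buckets) -> set[str]:
    """Ids that share a bucket with the query in at least one of the L tables."""
    found: set[str] = set()
    for t, key in enumerate(table_keys(query, planes)):
        found |= buckets.get((t, key), set())
    return found

# Toy 2-d vectors (made-up values): "c" points exactly opposite to "a".
vectors = {"a": [1.0, 0.1], "b": [0.9, 0.2], "c": [-1.0, -0.1]}
planes = make_hyperplanes(dims=2, L=8, k=2)
buckets = build_buckets(vectors, planes)
print(sorted(lsh_candidates([1.0, 0.1], planes, buckets)))
```

A vector pointing the opposite way lands on the other side of every hyperplane, so it never becomes a candidate. The sketch does not show the final exact ranking of the candidates.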

2. Import test data

The tutorial uses Stanford's pre‑trained GloVe word vectors (glove.6B.50d.txt). Each line contains a word followed by 50 space‑separated values. After downloading and extracting the file, import the data into Easysearch so that the word field stores the term and my_vec stores the vector. You can import with the bulk API or a custom script. Full import code: https://t.zsxq.com/82Ra6
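The author's full import code is at the link above. As an independent, hypothetical sketch, the Python below converts GloVe lines into request bodies for the bulk API. The bulk API expects newline‑delimited JSON: an action line followed by a document line, and the body ends with a newline. The function names and the batch size of 1,000 documents are assumptions.

```python
import json
from typing import Iterable, Iterator

def glove_to_ndjson(lines: Iterable[str], index: str, dims: int) -> Iterator[str]:
    """Yield an action line and a document line for every well-formed GloVe line."""
    for line in lines:
        parts = line.split()
        if len(parts) != dims + 1:
            continue  # skip lines whose vector length does not match the mapping's dims
        word, values = parts[0], [float(v) for v in parts[1:]]
        yield json.dumps({"index": {"_index": index}})
        yield json.dumps({"word": word, "my_vec": values})

def bulk_payloads(lines: Iterable[str], index: str = "knn-test",
                  dims: int = 50, batch_docs: int = 1000) -> Iterator[str]:
    """Group documents into _bulk request bodies of at most batch_docs documents each."""
    buf: list[str] = []
    for ndjson_line in glove_to_ndjson(lines, index, dims):
        buf.append(ndjson_line)
        if len(buf) == 2 * batch_docs:  # two NDJSON lines per document
            yield "\n".join(buf) + "\n"  # a _bulk body must end with a newline
            buf = []
    if buf:
        yield "\n".join(buf) + "\n"

# In-memory demo with made-up 3-d values; the last line is malformed and skipped.
demo = ["bread 0.1 0.2 0.3", "toast 0.4 0.5 0.6", "broken 1.0"]
for body in bulk_payloads(demo, dims=3, batch_docs=1):
    print(body, end="")

# With the real file, each body would be POSTed to /_bulk with
# Content-Type: application/x-ndjson (sending is omitted here):
# with open("glove.6B.50d.txt", encoding="utf-8") as f:
#     for body in bulk_payloads(f):
#         ...
```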

3. Vector retrieval

3.1 Retrieve the vector for a query term

Obtain the vector for the word "bread" with a match query:

GET /knn-test/_search
{
  "query": {
    "match": { "word": "bread" }
  }
}

Example response (truncated):

{ "my_vec": [-0.37436, -0.11959, -0.87609, -1.1217, 1.2788, ...] }
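When you script this step, the vector is in the first hit's _source. Below is a small hypothetical helper that assumes the usual hits.hits[]._source response layout. The demo response is shortened to the first 3 of the 50 values.

```python
import json

def extract_vector(response: dict, field: str = "my_vec") -> list[float]:
    """Return the vector stored in the first hit's _source, or raise if nothing matched."""
    hits = response.get("hits", {}).get("hits", [])
    if not hits:
        raise LookupError("no document matched the query")
    return hits[0]["_source"][field]

# Shortened response body for illustration (only 3 of the 50 values kept).
raw = '{"hits": {"hits": [{"_source": {"word": "bread", "my_vec": [-0.37436, -0.11959, -0.87609]}}]}}'
print(extract_vector(json.loads(raw)))  # [-0.37436, -0.11959, -0.87609]
```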

3.2 Find similar words using the vector

Use the retrieved vector in a kNN query to get the top 10 most similar terms:

GET /knn-test/_search
{
  "size": 10,
  "_source": "word",
  "query": {
    "bool": {
      "must": [
        {
          "knn_nearest_neighbors": {
            "field": "my_vec",
            "vec": { "values": [ -0.37436, -0.11959, -0.87609, -1.1217, 1.2788, ... ] },
            "model": "lsh",
            "similarity": "cosine",
            "candidates": 50
          }
        }
      ]
    }
  }
}

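Parameters:

size – number of results to return (10).

_source – return only the word field.

vec – the query vector (the vector of "bread"). The "..." stands for the remaining values; the request must supply all 50, matching dims in the mapping.

candidates – the number of approximate candidates retrieved through LSH before they are ranked by exact similarity; larger values generally improve recall at the cost of latency.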
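Building this request body in code makes it easy to check the vector length before sending, because a vector whose length differs from dims does not match the mapping. The helper below is a hypothetical sketch that reproduces the query above. For model="exact" it omits candidates, which mirrors the exact query in section 4.

```python
import json

def build_knn_query(vec: list[float], dims: int = 50, field: str = "my_vec",
                    size: int = 10, model: str = "lsh", similarity: str = "cosine",
                    candidates: int = 50) -> dict:
    """Return a search body shaped like the kNN request above."""
    if len(vec) != dims:
        raise ValueError(f"query vector has {len(vec)} values but the mapping expects {dims}")
    knn = {"field": field, "vec": {"values": list(vec)},
           "model": model, "similarity": similarity}
    if model == "lsh":
        knn["candidates"] = candidates  # the exact query in section 4 does not use it
    return {"size": size, "_source": "word",
            "query": {"bool": {"must": [{"knn_nearest_neighbors": knn}]}}}

body = build_knn_query([0.0] * 50)  # placeholder vector; use the real "bread" values
print(json.dumps(body)[:60])
```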

Sample response shows words such as "baked", "toast", "butter", and "soup", which are semantically close to "bread".

4. Exact search (optional)

For small datasets, create a mapping that declares only the vector dimension (no model parameters) and issue a kNN query with the "exact" model:

PUT /my-index
{
  "mappings": {
    "properties": {
      "my_vec": {
        "type": "knn_dense_float_vector",
        "knn": { "dims": 50 }
      }
    }
  }
}

GET /my-index/_search
{
  "query": {
    "knn_nearest_neighbors": {
      "field": "my_vec",
      "vec": { "values": [ -0.37436, -0.11959, ... ] },
      "model": "exact",
      "similarity": "cosine"
    }
  }
}

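Exact search does not require LSH parameters. However, every query scores all n indexed vectors, so its cost grows linearly with the collection size (O(n·d) for d‑dimensional vectors). This makes it unsuitable for large collections.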
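Conceptually, exact search is a brute‑force scan: it scores every stored vector against the query and keeps the best k. The Python sketch below is a hypothetical illustration, not Easysearch's implementation. It shows where the O(n·d) scoring cost comes from, and a heap keeps the top‑k selection at O(n log k).

```python
import heapq
import math

def cosine(a: list[float], b: list[float]) -> float:
    """dot(a, b) / (|a| * |b|); assumes equal length and non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def exact_knn(query: list[float], vectors: dict[str, list[float]],
              k: int = 10) -> list[tuple[str, float]]:
    """Score every stored vector (n vectors x d dimensions) and keep the k best."""
    scored = ((cosine(query, vec), word) for word, vec in vectors.items())
    return [(word, score) for score, word in heapq.nlargest(k, scored)]

vectors = {"x": [1.0, 0.0], "y": [0.0, 1.0], "z": [-1.0, 0.0]}  # toy data
print(exact_knn([1.0, 0.0], vectors, k=2))  # [('x', 1.0), ('y', 0.0)]
```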

Key takeaways

Configure the index mapping with the vector type, the dimension, and (optionally) the LSH model.

Import high‑dimensional embeddings such as GloVe vectors.

Execute kNN queries to retrieve similar vectors; choose between approximate (LSH) and exact modes based on data scale.

Adjust LSH parameters L and k to balance recall and precision.
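As a rough guide to that trade‑off, assume the cosine LSH family is random‑hyperplane hashing. This is an assumption; this article does not state Easysearch's internal hash family. For two vectors u and v at angle θ, one random hyperplane assigns them the same bit with probability p:

```latex
p = 1 - \frac{\theta}{\pi}, \qquad
\theta = \arccos\!\left(\frac{u \cdot v}{\lVert u \rVert \, \lVert v \rVert}\right)

\Pr[\text{same bucket in one table}] = p^{k}

\Pr[\text{same bucket in at least one of the } L \text{ tables}] = 1 - \left(1 - p^{k}\right)^{L}

\mathbb{E}[\text{number of tables with a shared bucket}] = L \, p^{k}
```

Worked example: for a neighbor with cosine 0.8, θ ≈ 0.6435 rad, so p ≈ 0.795. For an orthogonal vector, p = 0.5. With L = 99 and k = 1 (the values in the mapping above), the expected number of shared buckets is about 78.7 versus 49.5. With k = 2, the per‑table probabilities become about 0.632 versus 0.25. Raising k widens the gap between near and far vectors (precision), and raising L gives near vectors more chances to collide (recall).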


Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.
