Boosting Elasticsearch with Generative AI: Relevance Engine & Vector Search
This article explores the rise of generative AI, outlines popular models like ChatGPT, DALL‑E, and Google Bard, examines their limitations, and then delves into Elasticsearch’s Relevance Engine and vector capabilities, demonstrating how to store, index, and query dense embeddings with practical code examples.
Preface
In 2023, the hottest buzzword on the internet was "generative AI". Over the past year, large models have been embraced across industries. Revisiting the Elasticsearch website, the author found a refreshed style and a renewed interest in the product.
Elasticsearch, once known primarily as a full‑text search engine, is now expanding into vector search and generative AI integration.
1. What Is Generative AI
Generative AI is a branch of artificial intelligence that creates original content using large language models, neural networks, and machine learning. These models learn underlying structures from massive datasets and can generate text, images, code, music, translations, and more based on user prompts.
Popular Generative AI Models
ChatGPT – Developed by OpenAI, this chatbot built on a large language model was released in November 2022 and quickly gained popularity for its conversational abilities, code generation, and humor. The initial free version was reportedly trained on a corpus drawn from roughly 45 TB of raw text, and a GPT‑4‑based version followed in 2023.
DALL‑E 2 – Also from OpenAI, DALL‑E 2 generates images from text prompts by combining CLIP embeddings with a diffusion‑based decoder, competing with tools like Midjourney and Adobe Firefly.
Google Bard – Built on Google’s LaMDA and later PaLM 2 models, Bard offers capabilities similar to ChatGPT, including coding, math solving, writing, and search integration.
Applications in E‑commerce and Finance
Generative AI can personalize product recommendations, streamline shopping experiences, and power AI‑driven chatbots for better customer service.
In finance, it aids market trend prediction, portfolio optimization, fraud protection, algorithmic trading, and synthetic data generation for risk analysis.
Limitations of Generative AI Models
Domain Knowledge/Accuracy: Models may lack specific industry or company knowledge, so answers in specialized domains can be inaccurate.
Privacy and Security: Handling proprietary or personal data raises privacy concerns.
Scale and Cost: Large models require substantial compute resources, making them expensive for many enterprises.
Staleness: Training data is frozen at a point in time, so models may provide outdated answers.
Hallucinations: Models can fabricate plausible‑looking but false facts.
2. Elasticsearch Relevance Engine
Elasticsearch Relevance Engine (ESRE) combines AI best practices with Elastic’s text search, offering a unified API to integrate large language models (LLMs) and deliver highly relevant AI‑driven search results. Its main capabilities include:
Advanced relevance ranking such as BM25F.
Vector database capabilities for dense embeddings.
Various NLP tasks and models for text processing.
Support for custom transformer models.
Integration with third‑party models like OpenAI’s GPT‑3/4.
Built‑in Learned Sparse Encoder for ML‑supported search without training.
Reciprocal Rank Fusion (RRF) for hybrid sparse‑dense ranking.
Integration with tools like LangChain for complex pipelines.
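To make the RRF item in the list above more concrete, here is a minimal sketch of reciprocal rank fusion in plain Python. It is an illustration of the general technique, not Elastic's implementation: the document ids and the two input rankings are hypothetical, and the rank constant of 60 is assumed here as a commonly used default.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, rank_constant=60):
    """Fuse several ranked lists of document ids into one ranking.

    Each document scores sum(1 / (rank_constant + rank)) over the lists
    it appears in, with ranks starting at 1.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (rank_constant + rank)
    # Highest fused score first; ties are broken by doc id for determinism.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


bm25_hits = ["doc_a", "doc_b", "doc_c"]  # hypothetical lexical (sparse) ranking
knn_hits = ["doc_c", "doc_a", "doc_d"]   # hypothetical vector (dense) ranking

fused = reciprocal_rank_fusion([bm25_hits, knn_hits])
for doc_id, score in fused:
    print(f"{doc_id}: {score:.5f}")
```

Documents that rank well in both lists (doc_a and doc_c here) rise to the top, while documents found by only one method still appear lower down. Because only ranks are used, the raw BM25 and vector scores never need to be put on a common scale.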
3. Use Cases for Elasticsearch Vector Store
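Elasticsearch supports traditional bag‑of‑words/BM25 retrieval as well as exact k‑nearest‑neighbor (kNN) and approximate nearest‑neighbor (ANN) vector search (the examples in this article follow the kNN search API of version 8.11).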
It mitigates LLM challenges by providing contextual data, supporting third‑party models, and offering a built‑in sparse encoder.
Advantages include hybrid search, massive data storage, and high‑performance real‑time queries.
Method 1: Store Vectors in Elasticsearch and Query with LLM
Users ingest documents and their embeddings into Elasticsearch, then use KNN search to retrieve the most similar vectors, passing the results to an LLM for answer generation.
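The flow described above can be sketched without a running cluster. The example below is a simplified stand-in, under several assumptions: the documents, their three-dimensional embeddings, and the query vector are invented for illustration, retrieval is a brute-force cosine-similarity scan rather than an Elasticsearch kNN query, and the final call to an LLM is left out because it depends on the provider.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vector, docs, k):
    """Return the k documents whose embeddings are most similar to the query."""
    ranked = sorted(
        docs,
        key=lambda doc: cosine_similarity(query_vector, doc["embedding"]),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(question, passages):
    """Assemble a prompt that grounds the LLM in the retrieved passages."""
    context = "\n".join(f"- {passage['text']}" for passage in passages)
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )


# Hypothetical documents with made-up embeddings.
docs = [
    {"text": "Elasticsearch supports kNN search on dense_vector fields.",
     "embedding": [0.9, 0.1, 0.0]},
    {"text": "Moose live in northern forests.",
     "embedding": [0.0, 0.2, 0.9]},
    {"text": "RRF merges lexical and vector rankings.",
     "embedding": [0.7, 0.6, 0.1]},
]

question = "How does Elasticsearch search vectors?"
query_vector = [1.0, 0.2, 0.0]  # stand-in for an embedding of the question

passages = top_k(query_vector, docs, k=2)
prompt = build_prompt(question, passages)
print(prompt)  # this string would be sent to the LLM of your choice
```

In a real deployment the `top_k` step would be replaced by a kNN search against Elasticsearch, and the query vector would come from the same embedding model used at ingestion time.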
Method 2: Elasticsearch Relevance Engine + LLM
From version 8.8, ESRE lets you work with transformer and LLM‑based models through the familiar search API, using RRF for hybrid ranking and reducing operational complexity.
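As a rough illustration of what such a hybrid request might look like, the sketch below builds a search body that combines a lexical `match` query with a `knn` clause and asks for RRF ranking. The `rank`/`rrf` shape and its `window_size` and `rank_constant` parameters reflect the 8.x search API as understood here and should be treated as an assumption; later releases changed how RRF is requested, so check the documentation for your version. The helper name is hypothetical, and the field names are borrowed from the example index used later in this article.

```python
import json


def hybrid_search_body(text_query, query_vector, k=10, num_candidates=100):
    """Build a hybrid (lexical + vector) search body ranked with RRF.

    Assumption: the cluster accepts the 8.x-style top-level "rank" section.
    """
    return {
        # Lexical side: BM25 match on the title text field.
        "query": {"match": {"title": text_query}},
        # Vector side: approximate kNN on the dense_vector field.
        "knn": {
            "field": "image-vector",
            "query_vector": query_vector,
            "k": k,
            "num_candidates": num_candidates,
        },
        # Fuse both result lists with Reciprocal Rank Fusion.
        "rank": {"rrf": {"window_size": 50, "rank_constant": 20}},
    }


body = hybrid_search_body("moon", [-5, 9, -12])
print(json.dumps(body, indent=2))
```

The printed JSON could be sent as the body of a `POST /image-index/_search` request.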
Method 3: Built‑in Sparse Encoder Model
According to Elastic, its Learned Sparse Encoder outperforms SPLADE and resolves vocabulary mismatch, where a query and a relevant document use different words for the same idea; it can be accessed via the text_expansion query.
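A text_expansion request could be assembled along the following lines. This is a sketch under stated assumptions: the field name `ml.tokens` and the model id `.elser_model_1` are placeholders taken from typical setups and may differ in your deployment, and the helper function is hypothetical.

```python
import json


def text_expansion_body(field, model_id, text):
    """Build a search body that runs a text_expansion query.

    field: the index field holding the expanded tokens (deployment-specific).
    model_id: id of the deployed sparse encoder model (deployment-specific).
    text: the user's query text, expanded into weighted tokens at search time.
    """
    return {
        "query": {
            "text_expansion": {
                field: {
                    "model_id": model_id,
                    "model_text": text,
                }
            }
        }
    }


# Assumed field name and model id; replace with the values from your cluster.
body = text_expansion_body("ml.tokens", ".elser_model_1", "pictures of the night sky")
print(json.dumps(body, indent=2))
```

No query embedding has to be computed on the client side: the model named in the request expands the query text on the cluster.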
4. Elasticsearch Vector Retrieval
Elasticsearch provides three core capabilities as a vector database: storing embeddings, efficient nearest‑neighbor search, and converting text to vector representations.
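To show what nearest-neighbor search computes, here is a small exact (brute-force) version in Python. It uses the scoring rule documented for the l2_norm similarity, score = 1 / (1 + squared Euclidean distance), and borrows the three image vectors and the query vector from the example index in section 5 of this article. Elasticsearch's indexed search is approximate and far more efficient; this sketch only illustrates the ranking logic, and the function names are hypothetical.

```python
def l2_norm_score(query, vector):
    """Score used with l2_norm similarity: 1 / (1 + squared L2 distance)."""
    squared_distance = sum((q - v) ** 2 for q, v in zip(query, vector))
    return 1.0 / (1.0 + squared_distance)


def exact_knn(query, docs, k):
    """Score every document and return the k best (title, score) pairs."""
    scored = [(title, l2_norm_score(query, vector)) for title, vector in docs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Image vectors taken from the example index in section 5.
image_vectors = {
    "moose family": [1, 5, -20],
    "alpine lake": [42, 8, -15],
    "full moon": [15, 11, 23],
}

results = exact_knn([-5, 9, -12], image_vectors, k=3)
for title, score in results:
    print(f"{title}: {score:.6f}")
```

The squared distances are 116, 1629, and 2219 for "moose family", "full moon", and "alpine lake" respectively, so the ranking comes out in that order, matching the sample response in section 5.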
5. Elasticsearch Vector Search Example
Creating and querying a vector index:
PUT /image-index
{
  "mappings": {
    "properties": {
      "image-vector": {
        "type": "dense_vector",
        "dims": 3,
        "index": true,
        "similarity": "l2_norm"
      },
      "title-vector": {
        "type": "dense_vector",
        "dims": 5,
        "index": true,
        "similarity": "l2_norm"
      },
      "title": { "type": "text" },
      "file-type": { "type": "keyword" }
    }
  }
}
Bulk inserting data:
POST /image-index/_bulk?refresh=true
{ "index": { "_id": "1" } }
{ "image-vector": [1, 5, -20], "title-vector": [12, 50, -10, 0, 1], "title": "moose family", "file-type": "jpg" }
{ "index": { "_id": "2" } }
{ "image-vector": [42, 8, -15], "title-vector": [25, 1, 4, -12, 2], "title": "alpine lake", "file-type": "png" }
{ "index": { "_id": "3" } }
{ "image-vector": [15, 11, 23], "title-vector": [1, 5, 25, 50, 20], "title": "full moon", "file-type": "jpg" }
KNN search query:
POST /image-index/_search
{
  "knn": {
    "field": "image-vector",
    "query_vector": [-5, 9, -12],
    "k": 10,
    "num_candidates": 100
  },
  "fields": [ "title", "file-type" ]
}
Sample response (truncated):
{
  "hits": {
    "hits": [
      {
        "_source": { "title": "moose family", "file-type": "jpg" }
      },
      {
        "_source": { "title": "full moon", "file-type": "jpg" }
      },
      {
        "_source": { "title": "alpine lake", "file-type": "png" }
      }
    ]
  }
}
6. Conclusion
Elasticsearch has evolved dramatically, adopting a serverless architecture in 2024 that separates storage and compute, leverages cloud‑native services, and simplifies operations while retaining powerful search and vector capabilities.
7. Glossary
RRF : Reciprocal Rank Fusion, a hybrid ranking technique that merges results from multiple search methods.
ANN : Approximate Nearest Neighbor, a search technique that trades a small amount of accuracy for much faster retrieval of similar vectors.
KNN : k‑Nearest Neighbors, an algorithm for finding the k most similar items in a dataset.
8. References
https://github.com/elastic/elasticsearch-labs
https://www.elastic.co/guide/en/elasticsearch/reference/8.11/knn-search-api.html
https://www.elastic.co/cn/blog/demystifying-chatgpt-methods-building-ai-search#where-does-elasticsearch-fit?
https://www.elastic.co/cn/blog/may-2023-launch-announcement
https://www.elastic.co/cn/blog/what-is/generative-ai#whats-next-for-generative-ai
JD Cloud Developers
JD Cloud Developers (Developer of JD Technology) is a JD Technology Group platform offering technical sharing and communication for AI, cloud computing, IoT and related developers. It publishes JD product technical information, industry content, and tech event news. Embrace technology and partner with developers to envision the future.
