Build a Retrieval‑Augmented Generation (RAG) App with LangChain, Higress, and Elasticsearch
This tutorial walks through building a Retrieval‑Augmented Generation (RAG) system by combining LangChain for document processing, Elasticsearch’s vector store with the ELSER v2 model for semantic search, and Higress as a cloud‑native AI gateway, complete with deployment scripts, code examples, and query testing.
Retrieval‑Augmented Generation (RAG)
RAG combines traditional information retrieval with large language models (LLMs). A query first retrieves relevant documents from an external knowledge base, then feeds those documents as context to the LLM, improving answer accuracy, relevance, and timeliness.
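The retrieve-then-generate flow described above can be sketched with the standard library alone. This is a toy illustration, not the tutorial's implementation: the term-overlap retriever stands in for Elasticsearch, the sample knowledge base is made up for the example, and the helper names (`retrieve`, `build_prompt`) are hypothetical.

```python
import re


def tokenize(text: str) -> set[str]:
    """Lowercase the text and return its set of alphanumeric terms."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(query: str, documents: list[str], top_k: int = 2) -> list[str]:
    """Toy retriever: rank documents by how many query terms they share."""
    query_terms = tokenize(query)
    ranked = sorted(
        documents,
        key=lambda doc: len(query_terms & tokenize(doc)),
        reverse=True,
    )
    return ranked[:top_k]


def build_prompt(query: str, context_docs: list[str]) -> str:
    """Place the retrieved passages ahead of the question as LLM context."""
    context = "\n\n".join(f"[{i + 1}] {doc}" for i, doc in enumerate(context_docs))
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {query}"
    )


# Made-up sample data standing in for the indexed handbook chunks.
knowledge_base = [
    "Working hours are 9 AM to 6 PM, Monday through Friday.",
    "Annual leave requests must be submitted two weeks in advance.",
    "Expense reports are due by the fifth day of each month.",
]

question = "What are the working hours?"
prompt = build_prompt(question, retrieve(question, knowledge_base, top_k=1))
print(prompt)
```

In the rest of this tutorial, Elasticsearch plays the role of `retrieve` and the Higress ai-search plugin assembles the context before the request reaches the LLM.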
Key components
LangChain – an open‑source framework for building LLM‑driven applications. It provides document loaders, splitters and vector‑store integrations.
Elasticsearch – a distributed search and analytics engine that supports dense and sparse vector fields for semantic search. This guide uses the built‑in ELSER v2 sparse‑vector model.
Higress – a cloud‑native API gateway (based on Istio/Envoy) that can run Wasm plugins, including the ai‑search plugin which unifies private knowledge‑base search with online search engines.
Repository
https://github.com/cr7258/hands-on-lab/tree/main/gateway/higress/rag-langchain-es
Data preprocessing
The example uses a Markdown employee handbook. MarkdownHeaderTextSplitter parses the document by headings, preserving header metadata and producing chunks that are later indexed in Elasticsearch.
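To make header-based chunking concrete, here is a simplified standard-library stand-in. It is not LangChain's implementation and ignores cases such as headings inside code fences; the function name `split_by_headers` and the sample text are assumptions for illustration. It keeps each heading line inside its chunk, which mirrors the effect of `strip_headers=False`.

```python
import re

HEADER_RE = re.compile(r"^(#{1,3})\s+(.*)$")


def split_by_headers(markdown: str) -> list[dict]:
    """Split Markdown into one chunk per heading, recording the heading path."""
    chunks: list[dict] = []
    path: dict[int, str] = {}  # heading level -> title currently in effect
    lines: list[str] = []

    def flush() -> None:
        # Emit the lines collected so far, tagged with the current heading path.
        text = "\n".join(lines).strip()
        if text:
            metadata = {f"Header {lvl}": title for lvl, title in sorted(path.items())}
            chunks.append({"content": text, "metadata": metadata})
        lines.clear()

    for line in markdown.splitlines():
        match = HEADER_RE.match(line)
        if match:
            flush()
            level = len(match.group(1))
            # A new heading replaces any heading at the same or a deeper level.
            for lvl in [lvl for lvl in path if lvl >= level]:
                del path[lvl]
            path[level] = match.group(2).strip()
        lines.append(line)  # the heading line stays in the chunk
    flush()
    return chunks


# Made-up sample text standing in for the employee handbook.
sample = """# Employee Handbook
Welcome to the company.
## Attendance
Working hours are 9 AM to 6 PM.
## Leave
Submit leave requests in advance.
"""

chunks = split_by_headers(sample)
for chunk in chunks:
    print(chunk["metadata"], "->", chunk["content"].splitlines()[0])
```

Carrying the heading path as metadata is what lets a retrieved chunk be traced back to its section of the handbook.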
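from langchain_text_splitters import MarkdownHeaderTextSplitter

# employee_handbook is assumed to already hold the handbook's Markdown text as a str.
headers_to_split_on = [("#", "Header 1"), ("##", "Header 2"), ("###", "Header 3")]
# strip_headers=False keeps each heading line inside its chunk.
markdown_splitter = MarkdownHeaderTextSplitter(headers_to_split_on, strip_headers=False)
docs = markdown_splitter.split_text(employee_handbook)
The Elasticsearch index mapping defines two fields: content stores the chunk text and copies it into semantic_text, the field used for semantic search.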
PUT employee_handbook
{
  "mappings": {
    "properties": {
      "semantic_text": { "type": "semantic_text" },
      "content": { "type": "text", "copy_to": "semantic_text" }
    }
  }
}
Deploy Elasticsearch
Start Elasticsearch and Kibana with the provided docker-compose.yaml:
docker-compose up -d
Access Kibana at http://localhost:5601 (user elastic, password test123).
Enable embedding model
Allow automatic ML memory allocation:
PUT _cluster/settings
{
  "persistent": { "xpack.ml.use_auto_machine_memory_percent": "true" }
}
The default inference endpoint .elser-2-elasticsearch serves the ELSER v2 model.
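One way to confirm that the endpoint responds is to call the Elasticsearch inference API directly. The sketch below only builds the HTTP request. It assumes the cluster address and credentials used elsewhere in this tutorial, and the helper name `build_elser_request` is hypothetical.

```python
import base64
import json
import urllib.request


def build_elser_request(
    text: str,
    base_url: str = "https://localhost:9200",          # assumed local cluster address
    inference_id: str = ".elser-2-elasticsearch",
    username: str = "elastic",
    password: str = "test123",
) -> urllib.request.Request:
    """Build a POST to _inference/sparse_embedding/<inference_id> for one input text."""
    url = f"{base_url}/_inference/sparse_embedding/{inference_id}"
    token = base64.b64encode(f"{username}:{password}".encode()).decode()
    return urllib.request.Request(
        url,
        data=json.dumps({"input": text}).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Basic {token}",
        },
        method="POST",
    )


request = build_elser_request("What are the working hours in the company?")
print(request.get_method(), request.full_url)
```

Passing the request to `urllib.request.urlopen` (with an SSL context that trusts the cluster's certificate) should return the sparse embedding for the input as JSON. The first call can be slow while the model is being deployed.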
Index documents with LangChain
from elasticsearch import Elasticsearch
from langchain_elasticsearch import ElasticsearchStore, SparseVectorStrategy
from langchain.indexes import SQLRecordManager, index

# verify_certs=False skips TLS verification; acceptable only for a local test cluster.
es = Elasticsearch(hosts="https://localhost:9200", basic_auth=("elastic", "test123"), verify_certs=False)
vectorstore = ElasticsearchStore(es_connection=es, index_name="employee_handbook", query_field="content", strategy=SparseVectorStrategy())

# The record manager tracks document hashes so unchanged chunks are skipped on re-runs.
record_manager = SQLRecordManager("elasticsearch/employee_handbook", db_url="sqlite:///record_manager_cache.sql")
record_manager.create_schema()

# cleanup="full" also deletes indexed chunks that are no longer present in docs.
index_result = index(docs, record_manager, vectorstore, cleanup="full")
print(index_result)
The script adds 22 documents to the index and prints a summary of the operation.
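The summary returned by `index` reports `num_added`, `num_updated`, `num_skipped`, and `num_deleted`. The bookkeeping behind `cleanup="full"` can be approximated in a few lines. This is a simplified sketch under stated assumptions, not LangChain's implementation: a plain dict stands in for both the record manager and the vector store, and `sync_full` is a hypothetical name.

```python
import hashlib


def content_hash(text: str) -> str:
    """Stable fingerprint used to detect unchanged documents."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def sync_full(docs: list[str], indexed: dict[str, str]) -> dict[str, int]:
    """Make `indexed` (hash -> text) match `docs`, in the spirit of cleanup="full"."""
    result = {"num_added": 0, "num_updated": 0, "num_skipped": 0, "num_deleted": 0}
    seen: set[str] = set()
    for text in docs:
        key = content_hash(text)
        seen.add(key)
        if key in indexed:
            result["num_skipped"] += 1   # unchanged content is not re-indexed
        else:
            indexed[key] = text
            result["num_added"] += 1
    # Full cleanup: drop anything that was not part of this run.
    for key in list(indexed):
        if key not in seen:
            del indexed[key]
            result["num_deleted"] += 1
    return result


store: dict[str, str] = {}
first = sync_full(["Hours: 9 AM to 6 PM", "Leave policy"], store)
second = sync_full(["Hours: 8 AM to 5 PM", "Leave policy"], store)
print(first)
print(second)
```

In this sketch an edited chunk shows up as one addition plus one deletion, because its hash changes while the stale version is removed by the cleanup pass.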
Semantic search test
LangChain's default similarity_search does not issue the hybrid RRF (Reciprocal Rank Fusion) query that the Higress ai-search plugin uses, so a custom query is defined to reproduce it:
def custom_query(query_body: dict, query: str):
    # Ignore LangChain's generated query_body and build an RRF hybrid query instead.
    return {
        "_source": {"excludes": "semantic_text"},
        "retriever": {
            "rrf": {
                "retrievers": [
                    # Lexical (BM25) match on the raw text.
                    {"standard": {"query": {"match": {"content": query}}}},
                    # Semantic match on the ELSER-backed field.
                    {"standard": {"query": {"semantic": {"field": "semantic_text", "query": query}}}}
                ]
            }
        }
    }

results = vectorstore.similarity_search("What are the working hours in the company?", custom_query=custom_query)
print(results[0])
The response correctly returns the attendance-policy section.
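Reciprocal Rank Fusion merges the lexical and semantic result lists by rank rather than by raw score: each document receives 1 / (rank_constant + rank) from every list it appears in, and the contributions are summed. The sketch below shows the arithmetic with made-up document IDs. The constant 60 matches Elasticsearch's documented default for `rank_constant`, and `rrf_fuse` is a hypothetical helper name.

```python
def rrf_fuse(rankings: list[list[str]], rank_constant: int = 60) -> list[tuple[str, float]]:
    """Fuse several ranked lists of document IDs with Reciprocal Rank Fusion."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (rank_constant + rank)
    # Highest fused score first.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Made-up rankings: one from the lexical retriever, one from the semantic retriever.
lexical = ["attendance", "leave", "expenses"]
semantic = ["attendance", "benefits", "leave"]

fused = rrf_fuse([lexical, semantic])
for doc_id, score in fused:
    print(f"{doc_id}: {score:.5f}")
```

Documents that rank well in both lists rise to the top, and because only ranks are used, the BM25 and ELSER scores never need to be put on a common scale.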
Deploy Higress AI gateway
curl -sS https://higress.cn/ai-gateway/install.sh | bash
After installation, open http://localhost:8001 to configure the provider API token (e.g., Alibaba Cloud's Qwen-Turbo model).
Configure the ai‑search plugin
Add Elasticsearch as a service source (using the address of your own Elasticsearch host), then fill in the following fields:
searchFrom:
- type: "elasticsearch"
  serviceName: "elasticsearch.static"
  username: "elastic"
  password: "test123"
  index: "employee_handbook"
  contentField: "content"
  semanticTextField: "semantic_text"
Save the configuration; the plugin now routes queries to Elasticsearch using RRF hybrid search.
RAG query via Higress
curl 'http://localhost:8080/v1/chat/completions' \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "qwen-turbo",
    "messages": [{"role": "user", "content": "What are the working hours in the company?"}]
  }'
The gateway returns the correct hours (9 AM – 6 PM). After editing the source Markdown to change the hours to 8 AM – 5 PM and re-running the indexing script, the same query returns the updated schedule, showing that the knowledge base is refreshed simply by re-indexing.
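The same request can be sent from Python. This is a minimal sketch: it assumes the gateway address from the curl example and an OpenAI-style response body with the answer under `choices[0].message.content`. The helper names are hypothetical, and `ask` needs a running gateway, so it is defined but not called here.

```python
import json
import urllib.request

GATEWAY_URL = "http://localhost:8080/v1/chat/completions"  # from the curl example


def build_chat_request(question: str, model: str = "qwen-turbo") -> urllib.request.Request:
    """Build the chat-completions POST that the curl command sends."""
    payload = {"model": model, "messages": [{"role": "user", "content": question}]}
    return urllib.request.Request(
        GATEWAY_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def extract_answer(response_body: dict) -> str:
    """Pull the assistant text out of an OpenAI-style response (assumed format)."""
    return response_body["choices"][0]["message"]["content"]


def ask(question: str) -> str:
    """Send the question through the gateway and return the answer text."""
    with urllib.request.urlopen(build_chat_request(question), timeout=60) as resp:
        return extract_answer(json.load(resp))


request = build_chat_request("What are the working hours in the company?")
print(request.get_method(), request.full_url)
```

Calling `ask` once before and once after re-indexing is a quick way to script the refresh check described above.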
References
LangChain Elasticsearch vector store: https://python.langchain.com/docs/integrations/vectorstores/elasticsearch
How to split Markdown by headers: https://python.langchain.com/docs/how_to/markdown_header_metadata_splitter/
LangChain indexing API: https://python.langchain.com/docs/how_to/indexing/
Semantic search with native match, knn and sparse_vector: https://www.elastic.co/search-labs/blog/semantic-search-match-knn-sparse-vector
Hybrid search with semantic_text: https://www.elastic.co/docs/solutions/search/hybrid-semantic-text
Enhancing relevance with sparse vectors: https://www.elastic.co/search-labs/blog/elasticsearch-sparse-vector-boosting-personalization
What is RAG (retrieval augmented generation)?: https://www.elastic.co/what-is/retrieval-augmented-generation
Alibaba Cloud Native
