Build a Retrieval‑Augmented Generation (RAG) App with LangChain, Higress, and Elasticsearch

This tutorial walks through building a Retrieval‑Augmented Generation (RAG) system by combining LangChain for document processing, Elasticsearch’s vector store with the ELSER v2 model for semantic search, and Higress as a cloud‑native AI gateway, complete with deployment scripts, code examples, and query testing.

Alibaba Cloud Native

Retrieval‑Augmented Generation (RAG)

RAG combines traditional information retrieval with large language models (LLMs). A query first retrieves relevant documents from an external knowledge base, then feeds those documents as context to the LLM, improving answer accuracy, relevance, and timeliness.
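Here is a minimal sketch of that retrieve-then-generate flow. It assumes a toy in-memory corpus and ranks passages by simple word overlap instead of using Elasticsearch. The sample sentences and function names are hypothetical, and the LLM call itself is left out. It only shows how retrieved passages end up inside the prompt.

```python
import re

# Toy knowledge base standing in for indexed handbook chunks (invented sample text).
KNOWLEDGE_BASE = [
    "Working hours are 9 AM to 6 PM, Monday to Friday.",
    "Employees receive 15 days of annual leave.",
    "Expense reports must be filed within 30 days.",
]


def tokenize(text: str) -> set[str]:
    """Lower-case word set; a stand-in for a real lexical or semantic retriever."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(query: str, docs: list[str], k: int = 1) -> list[str]:
    """Rank documents by word overlap with the query and return the top k."""
    q = tokenize(query)
    ranked = sorted(docs, key=lambda d: len(q & tokenize(d)), reverse=True)
    return ranked[:k]


def build_prompt(query: str, context: list[str]) -> str:
    """Inject the retrieved passages into the prompt that would go to the LLM."""
    joined = "\n".join(f"- {c}" for c in context)
    return f"Answer using only this context:\n{joined}\n\nQuestion: {query}"


question = "What are the working hours?"
prompt = build_prompt(question, retrieve(question, KNOWLEDGE_BASE))
print(prompt)
```

A production system replaces `retrieve` with a search engine query and sends `prompt` to the model. The rest of this tutorial does exactly that with Elasticsearch and Higress.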

Key components

LangChain – an open‑source framework for building LLM‑driven applications. It provides document loaders, splitters and vector‑store integrations.

Elasticsearch – a distributed search and analytics engine that supports dense and sparse vector fields for semantic search. This guide uses the built‑in ELSER v2 sparse‑vector model.

Higress – a cloud-native API gateway built on Istio and Envoy that can run Wasm plugins, including the ai-search plugin, which combines private knowledge-base search with online search engines.

Repository

https://github.com/cr7258/hands-on-lab/tree/main/gateway/higress/rag-langchain-es

Data preprocessing

The example uses a Markdown employee handbook. MarkdownHeaderTextSplitter parses the document by headings, preserving header metadata and producing chunks that are later indexed in Elasticsearch.

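from langchain_text_splitters import MarkdownHeaderTextSplitter

# Load the handbook (example file name; point it at your copy of the Markdown file)
with open("employee_handbook.md", encoding="utf-8") as f:
    employee_handbook = f.read()

# Split on level-1 to level-3 headings; each chunk records its headings as metadata
headers_to_split_on = [("#", "Header 1"), ("##", "Header 2"), ("###", "Header 3")]
# strip_headers=False keeps the heading text inside each chunk's content
markdown_splitter = MarkdownHeaderTextSplitter(headers_to_split_on, strip_headers=False)

docs = markdown_splitter.split_text(employee_handbook)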
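The following sketch makes the splitter's output concrete. It is a simplified, pure-Python stand-in for the behaviour described above and is not LangChain's implementation. Among other differences, it ignores fenced code blocks, which MarkdownHeaderTextSplitter handles. The function name split_by_headers and the sample text are hypothetical.

```python
import re


def split_by_headers(markdown: str, max_level: int = 3) -> list[dict]:
    """Split Markdown at headings up to `max_level`, attaching the active
    heading at each level as metadata (simplified sketch)."""
    chunks: list[dict] = []
    active: dict[str, str] = {}
    lines: list[str] = []

    def flush() -> None:
        # Emit the lines collected so far with the metadata of their section.
        text = "\n".join(lines).strip()
        if text:
            chunks.append({"page_content": text, "metadata": dict(active)})
        lines.clear()

    for line in markdown.splitlines():
        m = re.match(r"^(#{1,6})\s+(.*)$", line)
        if m and len(m.group(1)) <= max_level:
            flush()
            level = len(m.group(1))
            # A new heading closes the current heading at this level and any deeper ones.
            for deeper in range(level, max_level + 1):
                active.pop(f"Header {deeper}", None)
            active[f"Header {level}"] = m.group(2).strip()
        lines.append(line)  # keep heading text in the chunk, like strip_headers=False
    flush()
    return chunks


sample = "# Handbook\n## Attendance\nWorking hours are 9 AM to 6 PM.\n## Leave\nAnnual leave rules.\n"
chunks = split_by_headers(sample)
for chunk in chunks:
    print(chunk["metadata"], "->", chunk["page_content"].splitlines()[0])
```

Keeping the heading path in metadata lets each small chunk carry the context of the section it came from. That context is useful both for ranking and for showing where an answer originated.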

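The Elasticsearch index mapping defines two fields. The first, content, is a regular text field that stores each chunk. The second, semantic_text, is a semantic_text field that Elasticsearch fills by running inference at ingest time. Because copy_to copies every content value into semantic_text, clients only need to write content: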

PUT employee_handbook
{
  "mappings": {
    "properties": {
      "semantic_text": { "type": "semantic_text" },
      "content": { "type": "text", "copy_to": "semantic_text" }
    }
  }
}
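The same index can be created from Python with the official client. This sketch assumes elasticsearch-py 8.x, whose indices.create accepts a mappings keyword. The helper name create_index_if_missing is hypothetical, and `es` is an Elasticsearch client like the one built in the indexing step below.

```python
# The mapping from the PUT request above, as a Python dict.
EMPLOYEE_HANDBOOK_MAPPINGS = {
    "properties": {
        "semantic_text": {"type": "semantic_text"},
        "content": {"type": "text", "copy_to": "semantic_text"},
    }
}


def create_index_if_missing(es, index_name: str = "employee_handbook") -> bool:
    """Create the index with the mapping above; return True if it was created.

    `es` is assumed to be an elasticsearch.Elasticsearch (8.x) client.
    """
    if es.indices.exists(index=index_name):
        return False
    es.indices.create(index=index_name, mappings=EMPLOYEE_HANDBOOK_MAPPINGS)
    return True
```

Checking for the index first makes the setup script safe to re-run. This matters because the indexing step later in this tutorial is meant to be repeated whenever the handbook changes.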

Deploy Elasticsearch

Start Elasticsearch and Kibana with the docker-compose.yaml provided in the repository:

docker-compose up -d

Then open Kibana at http://localhost:5601 and sign in as user elastic with password test123.
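Before indexing, it helps to confirm that the cluster is up. The following standard-library sketch calls the _cluster/health API on https://localhost:9200, the address the LangChain script uses, with the elastic/test123 credentials. It assumes the container serves a self-signed certificate, so it disables verification, which is acceptable for local experiments only. The function names are hypothetical.

```python
import base64
import json
import ssl
import urllib.request


def basic_auth_header(user: str, password: str) -> str:
    """Build an HTTP Basic Authorization header value."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return f"Basic {token}"


def cluster_health(url: str = "https://localhost:9200", user: str = "elastic",
                   password: str = "test123", timeout: float = 5.0) -> dict:
    """GET /_cluster/health and return the parsed JSON response."""
    # Assumption: self-signed certificate on the local container, so skip verification.
    ctx = ssl.create_default_context()
    ctx.check_hostname = False
    ctx.verify_mode = ssl.CERT_NONE
    req = urllib.request.Request(
        f"{url}/_cluster/health",
        headers={"Authorization": basic_auth_header(user, password)},
    )
    with urllib.request.urlopen(req, context=ctx, timeout=timeout) as resp:
        return json.load(resp)
```

Calling `cluster_health()["status"]` once the containers have started should report green or yellow. An exception at that point usually means Elasticsearch is still booting.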

Enable embedding model

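Let Elasticsearch calculate the memory available to machine learning automatically from the node size, instead of applying the fixed xpack.ml.max_machine_memory_percent limit: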

PUT _cluster/settings
{
  "persistent": { "xpack.ml.use_auto_machine_memory_percent": "true" }
}

The default inference endpoint .elser-2-elasticsearch serves the ELSER v2 model.
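To script this step instead of running it in Kibana, the following sketch applies the same persistent setting through the Python client. It assumes elasticsearch-py 8.x, where cluster.put_settings takes a persistent keyword. The helper name is hypothetical.

```python
def enable_auto_ml_memory(es) -> None:
    """Apply the PUT _cluster/settings request above through the Python client.

    `es` is assumed to be an elasticsearch.Elasticsearch (8.x) client.
    """
    es.cluster.put_settings(
        persistent={"xpack.ml.use_auto_machine_memory_percent": "true"}
    )
```

A persistent setting survives cluster restarts, so this only needs to run once per cluster.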

Index documents with LangChain

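from elasticsearch import Elasticsearch
from langchain_elasticsearch import ElasticsearchStore, SparseVectorStrategy
from langchain.indexes import SQLRecordManager, index

# verify_certs=False accepts the container's self-signed certificate; use only for local testing
es = Elasticsearch(hosts="https://localhost:9200", basic_auth=("elastic", "test123"), verify_certs=False)
# query_field="content" stores each chunk's text in the content field, which copy_to feeds into semantic_text
vectorstore = ElasticsearchStore(es_connection=es, index_name="employee_handbook", query_field="content", strategy=SparseVectorStrategy())

# The record manager tracks every indexed chunk in a local SQLite file
record_manager = SQLRecordManager("elasticsearch/employee_handbook", db_url="sqlite:///record_manager_cache.sql")
record_manager.create_schema()

# cleanup="full" deletes indexed chunks that no longer appear in docs (e.g. after the handbook is edited)
index_result = index(docs, record_manager, vectorstore, cleanup="full")
print(index_result)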

In the tutorial's run, the script indexed 22 chunks. index_result summarizes how many documents were added, updated, skipped, and deleted.
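The record manager makes re-runs incremental. As a rough illustration of the idea, LangChain's index() identifies documents by a hash of their content and metadata and compares those hashes with what earlier runs recorded. The following sketch is a simplified model of that bookkeeping and is not LangChain's implementation. The function names are hypothetical. It shows why editing one section of the handbook changes only one chunk.

```python
import hashlib


def content_hash(text: str, metadata: dict) -> str:
    """Stable ID for a chunk: SHA-256 of its text plus sorted metadata."""
    meta = "|".join(f"{k}={v}" for k, v in sorted(metadata.items()))
    return hashlib.sha256(f"{text}|{meta}".encode("utf-8")).hexdigest()


def plan_sync(docs: list[dict], known_ids: set[str]) -> dict:
    """Compare current chunks with IDs recorded by a previous run.

    Models the effect of cleanup="full": unchanged chunks are skipped,
    new or edited chunks are added, and chunks that disappeared are deleted.
    """
    current = {content_hash(d["page_content"], d["metadata"]) for d in docs}
    return {
        "to_add": sorted(current - known_ids),
        "to_skip": sorted(current & known_ids),
        "to_delete": sorted(known_ids - current),
    }


v1 = [
    {"page_content": "Working hours: 9 AM - 6 PM", "metadata": {"Header 2": "Attendance"}},
    {"page_content": "Annual leave rules", "metadata": {"Header 2": "Leave"}},
]
first = plan_sync(v1, set())
v2 = [{"page_content": "Working hours: 8 AM - 5 PM", "metadata": {"Header 2": "Attendance"}}, v1[1]]
second = plan_sync(v2, set(first["to_add"]))
print({k: len(v) for k, v in second.items()})
```

In this model, an edited chunk counts as a new document plus a deleted old one. That is why cleanup="full" is needed to keep stale text out of the index.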

Semantic search test

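LangChain's default similarity_search does not combine lexical and semantic retrieval with RRF (Reciprocal Rank Fusion), which is how the Higress ai-search plugin queries Elasticsearch. To test the same kind of hybrid search locally, pass a custom query: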

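def custom_query(query_body: dict, query: str):
    # Ignore LangChain's generated query_body and send an RRF retriever instead
    return {
        # Do not return the inference output stored in semantic_text
        "_source": {"excludes": "semantic_text"},
        "retriever": {
            "rrf": {
                "retrievers": [
                    # Lexical (BM25) match on the raw chunk text
                    {"standard": {"query": {"match": {"content": query}}}},
                    # Semantic match through the ELSER-backed semantic_text field
                    {"standard": {"query": {"semantic": {"field": "semantic_text", "query": query}}}}
                ]
            }
        }
    }

results = vectorstore.similarity_search("What are the working hours in the company?", custom_query=custom_query)
print(results[0])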

The top result is the attendance-policy section of the handbook, which contains the answer.
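RRF merges the two result lists using only rank positions, so BM25 scores and semantic scores never need to be on the same scale. Each document receives the sum of 1 / (rank_constant + rank) over every list it appears in. Elasticsearch's rrf retriever uses rank_constant = 60 by default. The following sketch implements that formula. The document IDs in the example are hypothetical.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]],
                           rank_constant: int = 60) -> list[tuple[str, float]]:
    """Fuse ranked lists of document IDs with Reciprocal Rank Fusion.

    Score(d) = sum over lists of 1 / (rank_constant + rank), ranks starting at 1.
    """
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (rank_constant + rank)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


lexical = ["attendance", "overtime", "leave"]   # e.g. the match retriever's order
semantic = ["attendance", "leave", "benefits"]  # e.g. the semantic retriever's order
fused = reciprocal_rank_fusion([lexical, semantic])
print([doc for doc, _ in fused])  # ['attendance', 'leave', 'overtime', 'benefits']
```

Documents found by both retrievers ("attendance", "leave") rise above documents that only one retriever ranked, even when that retriever placed them higher.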

Deploy Higress AI gateway

curl -sS https://higress.cn/ai-gateway/install.sh | bash

After installation, open the Higress console at http://localhost:8001 and configure the API token for your LLM provider (this example uses Alibaba Cloud's Qwen-Turbo model).

Configure the ai-search plugin

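In the Higress console, add Elasticsearch as a static service source pointing to the IP address and port of your own Elasticsearch host (the plugin refers to it as elasticsearch.static). Then configure the ai-search plugin with the following fields: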

searchFrom:
- type: "elasticsearch"
  serviceName: "elasticsearch.static"
  username: "elastic"
  password: "test123"
  index: "employee_handbook"
  contentField: "content"
  semanticTextField: "semantic_text"

Save the configuration. The plugin then queries Elasticsearch with RRF hybrid search over contentField and semanticTextField and passes the retrieved chunks to the model as context.

RAG query via Higress

curl 'http://localhost:8080/v1/chat/completions' \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "qwen-turbo",
    "messages": [{"role": "user", "content": "What are the working hours in the company?"}]
  }'
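The same request can be sent from Python with only the standard library. This sketch assumes the gateway listens on localhost:8080, as in the curl command. It also assumes an OpenAI-style response body in which the answer sits at choices[0].message.content. The function names are hypothetical.

```python
import json
import urllib.request

GATEWAY_URL = "http://localhost:8080/v1/chat/completions"


def build_chat_request(question: str, model: str = "qwen-turbo") -> urllib.request.Request:
    """Build the same POST request as the curl command above."""
    body = {"model": model, "messages": [{"role": "user", "content": question}]}
    return urllib.request.Request(
        GATEWAY_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(question: str, timeout: float = 60.0) -> str:
    """Send the question through Higress and return the assistant's reply.

    Assumes an OpenAI-compatible response: choices[0].message.content.
    """
    with urllib.request.urlopen(build_chat_request(question), timeout=timeout) as resp:
        payload = json.load(resp)
    return payload["choices"][0]["message"]["content"]
```

The client sends a plain chat request and adds no retrieval logic of its own. The ai-search plugin performs retrieval inside the gateway, so any OpenAI-compatible client can benefit from the knowledge base unchanged.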

The model answers with the working hours stated in the handbook (9 AM – 6 PM). To check that updates propagate, edit the source Markdown to change the hours to 8 AM – 5 PM and re-run the indexing script. The record manager replaces the changed chunk, and the same query now returns the updated schedule without any change to the gateway configuration.
