Smart Q&A Knowledge Base Powered by Qwen2.5‑14B and Elasticsearch RAG

This article details a smart Q&A knowledge‑base system that integrates the Qwen2.5‑14B large language model with Elasticsearch vector search via RAG, covering data ingestion with FSCrawler, Chinese sentence embedding, Gradio UI, performance tests on a 483‑page book, architecture diagrams, code walkthroughs, and suggested enhancements.

Mingyi World Elasticsearch

Mar 6, 2025

1. Test Results

The system imports the entire 483‑page Chinese book "一本书讲透 Elasticsearch" (≈638 k characters) into the knowledge base. Repeated tests show the pipeline can handle diverse user queries, quickly locate relevant passages, and generate coherent, accurate answers. For example, when a user asks a specific question about the book, the system retrieves the exact paragraph and produces a natural‑language response.

These results stem from Elasticsearch’s efficient retrieval combined with the contextual understanding of Qwen2.5‑14B.

2. Environment Requirements

Ollama : manages and runs the Qwen2.5‑14B model.

C:\Users\Administrator>ollama list
NAME           ID              SIZE      MODIFIED
qwen2.5:14b    7cdf5a0187d5    9.0 GB    3 months ago
qwen2:72b      14066dfa503f    41 GB     7 months ago
qwen2:7b       e0d4e1163c58    4.4 GB    7 months ago

FSCrawler 2.10 : crawls local files (PDF, DOC, XLS, PPT, TXT) and indexes them into Elasticsearch.

Elasticsearch 8.15.3 : core search engine that stores vectorized document data.

Kibana 8.15.3 : visual monitoring and management of Elasticsearch indices.

SentenceModel('shibing624/text2vec-base-chinese') : Chinese sentence‑embedding model that converts queries and documents into vectors for semantic search.

Gradio : provides a web‑based interactive UI for users to submit queries and view answers.

3. System Architecture

The architecture consists of five vertical layers:

Gradio Web Interface : top‑level entry point where users type questions.

Qwen2.5‑14B : the large language model that receives the query (or the query combined with retrieved context) and generates the final answer. The model can be swapped for a DeepSeek variant.

Vectorization Layer : uses shibing624/text2vec-base-chinese to embed text into dense vectors.

Elasticsearch Search : stores the vectors and performs similarity search to retrieve relevant documents.

FSCrawler Data Ingestion : scans local documents and pushes them into Elasticsearch.

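A query enters through the Gradio UI, is embedded by the vectorization layer, and is matched against the vectors stored in Elasticsearch. The retrieved passages are then passed to the LLM together with the query, and the generated answer travels back up to the UI. The original article illustrates these component connections with a diagram (image not reproduced here).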
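To make the "similarity search" in the Elasticsearch layer concrete, here is a minimal pure-Python sketch of cosine-similarity ranking. It is only a toy illustration of the scoring idea: Elasticsearch performs this at scale with its own vector index, and the function names below are hypothetical, not part of the original project.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention chosen for this sketch: a zero vector matches nothing
    return dot / (norm_a * norm_b)


def rank_by_similarity(query_vec, doc_vecs):
    """Return document indices ordered from most to least similar to the query."""
    scores = [cosine_similarity(query_vec, v) for v in doc_vecs]
    return sorted(range(len(doc_vecs)), key=lambda i: scores[i], reverse=True)
```

In the real system the vectors would come from the embedding model rather than being written by hand.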

3.1 Data Processing Flow

1) Input : user query and private local documents (PDF, DOC, etc.).

2) Elasticsearch : core module containing a vector database and retrieval engine.

3) Qwen2.5 LLM : receives the query and retrieved passages, then generates a natural answer.

4) Output & Validation : the system returns the precise answer and optionally validates it; a public API is also exposed.
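The four steps above can be sketched as a small pipeline object whose stages are plain callables. This is an illustrative structure under stated assumptions, not the author's code: the class name `RagPipeline` and the stage signatures are hypothetical, and the optional validation step is left out.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class RagPipeline:
    # Each stage is injected, so the embedding model, the Elasticsearch
    # client, and the LLM can be swapped or faked independently.
    embed: Callable[[str], Sequence[float]]
    retrieve: Callable[[str, Sequence[float]], List[str]]
    generate: Callable[[str, List[str]], str]

    def answer(self, query: str) -> str:
        vector = self.embed(query)               # step 1: vectorize the query
        passages = self.retrieve(query, vector)  # step 2: search Elasticsearch
        return self.generate(query, passages)    # step 3: let the LLM answer
```

Because every stage is injected, the flow can be exercised with stand-in functions before any model or cluster is available.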

4. Code Walkthrough

4.1 Document Ingestion

FSCrawler indexes local files into Elasticsearch:

fscrawler --config_dir /path/to/config job_name

The configuration specifies the document paths and the target Elasticsearch index.
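The article does not show the job configuration itself. As a rough sketch, FSCrawler reads a `_settings.yaml` file from a folder named after the job inside the config directory. The helper below renders a minimal version of that file. All values passed in are placeholders, the helper names are hypothetical, and credentials and TLS options (normally required for a secured Elasticsearch 8.x cluster) are deliberately omitted, so check the FSCrawler documentation before relying on it.

```python
from pathlib import Path

# Minimal job settings: where to crawl and where to index.
SETTINGS_TEMPLATE = """\
name: "{job_name}"
fs:
  url: "{docs_dir}"
elasticsearch:
  nodes:
    - url: "{es_url}"
  index: "{index}"
"""


def render_job_settings(job_name, docs_dir, es_url, index):
    """Return the YAML text for a minimal FSCrawler job."""
    return SETTINGS_TEMPLATE.format(
        job_name=job_name, docs_dir=docs_dir, es_url=es_url, index=index
    )


def write_job_settings(config_dir, job_name, docs_dir, es_url, index):
    """Write <config_dir>/<job_name>/_settings.yaml and return its path."""
    job_dir = Path(config_dir) / job_name
    job_dir.mkdir(parents=True, exist_ok=True)
    settings_path = job_dir / "_settings.yaml"
    settings_path.write_text(
        render_job_settings(job_name, docs_dir, es_url, index), encoding="utf-8"
    )
    return settings_path
```

On Windows, forward slashes in `docs_dir` avoid backslash-escape problems inside the double-quoted YAML strings.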

4.2 Vectorization

Python code uses the sentence‑embedding model to encode each document and store the vector:

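from elasticsearch import Elasticsearch
from text2vec import SentenceModel

es = Elasticsearch('http://localhost:9200')  # assumed local node; adjust URL and credentials
model = SentenceModel('shibing624/text2vec-base-chinese')
for doc in documents:  # documents: iterable of dicts with a 'text' field
    # encode() returns a NumPy array; convert it to a list so it serializes to JSON
    vector = model.encode(doc['text']).tolist()
    es.index(index='knowledge_base', document={'text': doc['text'], 'vector': vector})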
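For vector search to work, the `vector` field needs a `dense_vector` mapping before documents are indexed. The article does not show its mapping, so the following is a sketch under assumptions: the dimension of 768 is what `shibing624/text2vec-base-chinese` produces to the best of my knowledge (verify with `len(model.encode('test'))`), cosine similarity is an assumed choice, and the helper names are hypothetical.

```python
EMBEDDING_DIMS = 768  # assumed output size of text2vec-base-chinese; verify locally


def build_mappings(dims=EMBEDDING_DIMS):
    """Index mappings with a full-text field and an indexed dense vector."""
    return {
        "properties": {
            "text": {"type": "text"},
            "vector": {
                "type": "dense_vector",
                "dims": dims,
                "index": True,           # enable kNN search on this field
                "similarity": "cosine",  # assumed metric for this sketch
            },
        }
    }


def create_index(es, index="knowledge_base"):
    """Create the index if it is missing; `es` is an Elasticsearch 8.x client."""
    if not es.indices.exists(index=index):
        es.indices.create(index=index, mappings=build_mappings())
```

Calling `create_index(es)` once before the indexing loop is enough; subsequent calls leave an existing index untouched.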

4.3 Retrieval & Search

The user query is embedded with the same model, and Elasticsearch combines vector similarity search with keyword matching to retrieve the most relevant documents.
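The article gives no code for this step. One way to express it with the Elasticsearch 8.x Python client is to send a `match` query and a `knn` clause in the same search request, so that both contribute to the score. The sketch below assumes the `text` and `vector` field names used in the indexing code; the values of `k` and `num_candidates` are illustrative, and the function names are hypothetical.

```python
def build_hybrid_search(query_text, query_vector, k=5, num_candidates=50):
    """Build keyword + kNN search arguments for Elasticsearch.search()."""
    if num_candidates < k:
        raise ValueError("num_candidates must be at least k")
    return {
        # Keyword side: full-text match on the stored passage text.
        "query": {"match": {"text": query_text}},
        # Vector side: approximate nearest neighbours on the embedding field.
        "knn": {
            "field": "vector",
            "query_vector": list(query_vector),
            "k": k,
            "num_candidates": num_candidates,
        },
        "size": k,
        "source": ["text"],  # return only the text, not the stored vectors
    }


def search(es, model, query_text, index="knowledge_base"):
    """Embed the query with the same model used for indexing, then search."""
    vector = model.encode(query_text).tolist()
    return es.search(index=index, **build_hybrid_search(query_text, vector))
```

The response has the usual `hits.hits` structure, which is what the answer-generation step reads from `results`.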

4.4 Answer Generation

The retrieved documents are concatenated into a prompt and fed to the LLM via Ollama:

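from ollama import Client

ollama = Client()  # connects to the local Ollama server by default
# Join the text of every retrieved hit into one context block
context = "\n".join(doc['_source']['text'] for doc in results['hits']['hits'])
# Prompt (Chinese): "Answer the question based on the following content: ... Question: ..."
prompt = f"根据以下内容回答问题：\n{context}\n问题：{query}"
response = ollama.generate(model='qwen2.5:14b', prompt=prompt)
answer = response['response']  # generate() returns the generated text under 'response'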
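Concatenating every hit can overflow the model's context window when the indexed documents are long. A simple guard, sketched below, keeps passages in rank order until a character budget is reached. The budget of 4000 characters is an arbitrary placeholder, not a value from the article, and `build_prompt` is a hypothetical helper that reuses the article's prompt template.

```python
def build_prompt(query, passages, max_context_chars=4000):
    """Concatenate passages in rank order until the character budget is used up."""
    kept, used = [], 0
    for passage in passages:
        if used + len(passage) > max_context_chars:
            break  # the next passage would exceed the budget; stop here
        kept.append(passage)
        used += len(passage)
    context = "\n".join(kept)
    # Same template as above: "Answer the question based on the following content"
    return f"根据以下内容回答问题：\n{context}\n问题：{query}"
```

A token-based budget would be more precise than counting characters, but it requires the model's tokenizer.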

4.5 Gradio Interface

A minimal Gradio app wraps the pipeline:

import gradio as gr

def qa_system(query):
    # query processing, retrieval, generation logic
    return answer

interface = gr.Interface(fn=qa_system, inputs="text", outputs="text")
interface.launch()
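The stub above leaves the body of `qa_system` open. One possible way to fill it, sketched here, is to wrap whatever function implements retrieval and generation and to add basic input checking and error handling around it. The names `make_qa_fn`, `build_interface`, and `answer_fn`, as well as the user-facing messages and the title, are hypothetical.

```python
def make_qa_fn(answer_fn):
    """Wrap a query -> answer function with input checks for use in a UI."""
    def qa_system(query):
        query = (query or "").strip()
        if not query:
            return "Please enter a question."
        try:
            return answer_fn(query)
        except Exception as exc:  # show backend failures instead of crashing the UI
            return f"Error while answering: {exc}"
    return qa_system


def build_interface(answer_fn):
    """Create the Gradio app; Gradio is imported lazily so make_qa_fn works without it."""
    import gradio as gr
    return gr.Interface(
        fn=make_qa_fn(answer_fn),
        inputs="text",
        outputs="text",
        title="Knowledge Base Q&A",
    )
```

Calling `build_interface(answer_fn).launch()` starts the web UI, where `answer_fn` is the function that runs retrieval and generation for a query.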

5. Future Improvements

Model Optimization : fine‑tune Qwen2.5‑14B on domain‑specific data or compare with a DeepSeek model.

Vectorization Enhancements : experiment with alternative Chinese embedding models or further fine‑tune text2vec for better semantic accuracy.

Document Granularity : split the source book into smaller sections (e.g., per chapter or subsection) before indexing to potentially improve retrieval precision.
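As a starting point for the granularity idea, the sketch below splits text into fixed-size character windows with a small overlap. Note that this is a simpler stand-in for the per-chapter or per-subsection splitting suggested above, which would need the book's heading structure. The sizes are placeholders and `split_into_chunks` is a hypothetical helper.

```python
def split_into_chunks(text, chunk_size=500, overlap=50):
    """Split text into overlapping character windows of at most chunk_size."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reaches the end of the text
    return chunks
```

Each chunk would then be embedded and indexed as its own document, so a query can match a short passage instead of a whole file.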

Conclusion

The Qwen2.5‑14B + Elasticsearch RAG pipeline shows how retrieval‑augmented generation can answer questions quickly and accurately over large private document collections. Its modular design, from data ingestion through to the UI, and the results on the 483‑page test book point to the strong potential of RAG in knowledge‑management scenarios.
