Smart Q&A Knowledge Base Powered by Qwen2.5‑14B and Elasticsearch RAG
This article details a smart Q&A knowledge‑base system that integrates the Qwen2.5‑14B large language model with Elasticsearch vector search via RAG, covering data ingestion with FSCrawler, Chinese sentence embedding, Gradio UI, performance tests on a 483‑page book, architecture diagrams, code walkthroughs, and suggested enhancements.
1. Test Results
The system imports the entire 483‑page Chinese book "一本书讲透 Elasticsearch" (≈638 k characters) into the knowledge base. Repeated tests show the pipeline can handle diverse user queries, quickly locate relevant passages, and generate coherent, accurate answers. For example, when a user asks a specific question about the book, the system retrieves the exact paragraph and produces a natural‑language response.
These results stem from Elasticsearch’s efficient retrieval combined with the contextual understanding of Qwen2.5‑14B.
2. Environment Requirements
Ollama : manages and runs the Qwen2.5‑14B model.
C:\Users\Administrator>ollama list
NAME           ID              SIZE      MODIFIED
qwen2.5:14b    7cdf5a0187d5    9.0 GB    3 months ago
qwen2:72b      14066dfa503f    41 GB     7 months ago
qwen2:7b       e0d4e1163c58    4.4 GB    7 months ago
FSCrawler 2.10 : crawls local files (PDF, DOC, XLS, PPT, TXT) and indexes them into Elasticsearch.
Elasticsearch 8.15.3 : core search engine that stores vectorized document data.
Kibana 8.15.3 : visual monitoring and management of Elasticsearch indices.
SentenceModel('shibing624/text2vec-base-chinese') : Chinese sentence‑embedding model that converts queries and documents into vectors for semantic search.
Gradio : provides a web‑based interactive UI for users to submit queries and view answers.
3. System Architecture
The architecture consists of five vertical layers:
Gradio Web Interface : top‑level entry point where users type questions.
Qwen2.5‑14B : the large language model that receives the query (or the query combined with retrieved context) and generates the final answer. The model can be swapped for a DeepSeek variant.
Vectorization Layer : uses shibing624/text2vec-base-chinese to embed text into dense vectors.
Elasticsearch Search : stores the vectors and performs similarity search to retrieve relevant documents.
FSCrawler Data Ingestion : scans local documents and pushes them into Elasticsearch.
Data flows from the Gradio UI down through the LLM, vectorization, and Elasticsearch, then back up as a generated answer. An architecture diagram in the original article illustrates the component connections.
3.1 Data Processing Flow
1) Input : user query and private local documents (PDF, DOC, etc.).
2) Elasticsearch : core module containing a vector database and retrieval engine.
3) Qwen2.5 LLM : receives the query and retrieved passages, then generates a natural answer.
4) Output & Validation : the system returns the precise answer and optionally validates it; a public API is also exposed.
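The four steps above can be sketched as one function that wires the stages together. This is a minimal illustration, not the article's actual code: `answer_query`, `build_prompt`, and the three callables `embed`, `search`, and `generate` are hypothetical names standing in for the embedding model, the Elasticsearch lookup, and the Ollama call described in section 4. The prompt template matches the one used in section 4.4.

```python
from typing import Callable, List, Sequence


def build_prompt(query: str, passages: Sequence[str]) -> str:
    """Join the retrieved passages and the user question into one prompt."""
    context = "\n".join(passages)
    # Template text means: "Answer the question based on the following content: ... Question: ..."
    return f"根据以下内容回答问题:\n{context}\n问题:{query}"


def answer_query(
    query: str,
    embed: Callable[[str], List[float]],
    search: Callable[[str, List[float], int], List[str]],
    generate: Callable[[str], str],
    top_k: int = 3,
) -> str:
    """Run one question through embed -> search -> generate.

    All three callables are injected (hypothetical interfaces), so each
    stage can be swapped or stubbed without touching the others.
    """
    query_vector = embed(query)                     # vectorization layer
    passages = search(query, query_vector, top_k)   # Elasticsearch retrieval
    return generate(build_prompt(query, passages))  # LLM answer generation
```

Passing the stages in as arguments keeps the sketch independent of any running service, which also makes it easy to try a different model, as the architecture section suggests.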
4. Code Walkthrough
4.1 Document Ingestion
FSCrawler indexes local files into Elasticsearch:
fscrawler --config_dir /path/to/config job_name
The configuration specifies the document paths and the target Elasticsearch index.
4.2 Vectorization
Python code uses the sentence‑embedding model to encode each document and store the vector:
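from elasticsearch import Elasticsearch
from text2vec import SentenceModel

es = Elasticsearch('http://localhost:9200')  # assumed local cluster; adjust URL and credentials
model = SentenceModel('shibing624/text2vec-base-chinese')
# `documents` is an iterable of dicts that each carry a 'text' field
for doc in documents:
    vector = model.encode(doc['text']).tolist()  # NumPy array -> plain list for JSON
    es.index(index='knowledge_base', document={'text': doc['text'], 'vector': vector})
4.3 Retrieval & Search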
User queries are vectorized and combined with keyword matching for similarity search in Elasticsearch.
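The article does not show the search request itself, so the following is only a sketch of how vector similarity and keyword matching might be combined. It assumes the `knowledge_base` index has a `text` field and a `vector` field of type `dense_vector`, and that the embedding model outputs 768 dimensions (worth confirming with `len(model.encode("test"))`). The helper names `knowledge_base_mapping` and `build_hybrid_search` are hypothetical.

```python
from typing import Any, Dict, Sequence

# Assumed embedding size for shibing624/text2vec-base-chinese; verify before use.
EMBEDDING_DIMS = 768


def knowledge_base_mapping(dims: int = EMBEDDING_DIMS) -> Dict[str, Any]:
    """Index mapping with a full-text field and an indexed dense vector field."""
    return {
        "properties": {
            "text": {"type": "text"},
            "vector": {
                "type": "dense_vector",
                "dims": dims,
                "index": True,           # required for approximate kNN search
                "similarity": "cosine",
            },
        }
    }


def build_hybrid_search(
    query_text: str,
    query_vector: Sequence[float],
    k: int = 5,
    num_candidates: int = 50,
) -> Dict[str, Any]:
    """Build search arguments that combine kNN on `vector` with a match on `text`."""
    if num_candidates < k:
        raise ValueError("num_candidates must be at least k")
    return {
        "knn": {
            "field": "vector",
            "query_vector": [float(x) for x in query_vector],  # plain floats for JSON
            "k": k,
            "num_candidates": num_candidates,
        },
        "query": {"match": {"text": query_text}},
        "size": k,
    }
```

With the Elasticsearch 8.x Python client, these dictionaries would typically be passed as `es.indices.create(index='knowledge_base', mappings=knowledge_base_mapping())` and `es.search(index='knowledge_base', **build_hybrid_search(query, model.encode(query)))`. When both `knn` and `query` are present, Elasticsearch combines the scores of the two result sets.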
4.4 Answer Generation
The retrieved documents are concatenated into a prompt and fed to the LLM via Ollama:
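from ollama import Client

client = Client()  # connects to the local Ollama server by default
# `results` is the Elasticsearch search response from step 4.3
context = "\n".join(doc['_source']['text'] for doc in results['hits']['hits'])
# The prompt reads: "Answer the question based on the following content: ... Question: ..."
prompt = f"根据以下内容回答问题:\n{context}\n问题:{query}"
response = client.generate(model='qwen2.5:14b', prompt=prompt)
answer = response['response']  # generate() returns the text under 'response'
4.5 Gradio Interface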
A minimal Gradio app wraps the pipeline:
import gradio as gr

def qa_system(query):
    # Query processing, retrieval (4.3), and generation (4.4) go here and set `answer`.
    answer = ""  # placeholder until the steps above are wired in
    return answer

interface = gr.Interface(fn=qa_system, inputs="text", outputs="text")
interface.launch()
5. Future Improvements
Model Optimization : fine‑tune Qwen2.5‑14B on domain‑specific data or compare with a DeepSeek model.
Vectorization Enhancements : experiment with alternative Chinese embedding models or further fine‑tune text2vec for better semantic accuracy.
Document Granularity : split the source book into smaller sections (e.g., per chapter or subsection) before indexing to potentially improve retrieval precision.
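As a rough illustration of the document-granularity idea, the sketch below splits text into fixed-size, overlapping character windows before indexing. This is a simpler stand-in for the per-chapter or per-subsection split suggested above, which would need the book's heading structure. The function name `split_into_chunks` and the default sizes are assumptions chosen for illustration, not values from the original system.

```python
from typing import List


def split_into_chunks(text: str, max_chars: int = 500, overlap: int = 50) -> List[str]:
    """Split text into windows of at most max_chars characters.

    Consecutive windows share `overlap` characters so that a sentence cut
    at a boundary still appears whole in at least one chunk (when it is
    shorter than the overlap).
    """
    if max_chars <= 0:
        raise ValueError("max_chars must be positive")
    if not 0 <= overlap < max_chars:
        raise ValueError("overlap must be in the range [0, max_chars)")

    chunks: List[str] = []
    step = max_chars - overlap
    start = 0
    while start < len(text):
        chunks.append(text[start:start + max_chars])
        if start + max_chars >= len(text):
            break  # the last window already reached the end of the text
        start += step
    return chunks
```

Each chunk would then be embedded and indexed as its own document, so that retrieval returns a focused passage instead of a whole file.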
Conclusion
The Qwen2.5‑14B + Elasticsearch RAG pipeline demonstrates how retrieval‑augmented generation can provide fast, accurate answers from large private document collections. Its modular design, from data ingestion to UI, and the test results on the 483‑page book highlight the strong potential of RAG in knowledge‑management scenarios.
Mingyi World Elasticsearch