Unlock AI-Powered Document Search with WeKnora: A Hands‑On Guide

WeKnora is an open‑source, LLM‑driven framework that turns complex, multi‑format documents into searchable semantic knowledge. It offers an Agent mode, hybrid retrieval, secure private deployment, and an easy‑to‑use web UI; this guide walks through installation step by step with demo screenshots.

Architecture Digest

Introduction

WeKnora is an open‑source LLM‑based document understanding and semantic search framework designed for complex, multi‑format documents such as PDFs, Word files, and images. It combines multimodal segmentation, semantic indexing, intelligent perception and LLM generation to provide high‑quality question answering via a Retrieval‑Augmented Generation (RAG) pipeline.

Key Features

Agent mode: Supports a ReAct agent that can call the built‑in knowledge base, MCP tools, and web search, iteratively refining its answers.

Precise understanding: Extracts structured content from PDFs, Word files, and images, and builds a unified semantic view.

Intelligent reasoning: Leverages LLM to grasp document context and user intent for accurate QA and multi‑turn dialogue.

Multiple knowledge‑base types: Supports FAQ and document KBs, with folder and URL import, tagging, and manual online entry.

Flexible extension: Decoupled pipeline (parsing, embedding, retrieval, generation) enables easy integration and customization.

Hybrid retrieval: Combines keyword, vector and knowledge‑graph search, supporting cross‑KB retrieval.

Web search: Built‑in DuckDuckGo engine for extensible internet search.

MCP tool integration: Extends agent capabilities with MCP tools launched via uvx or npx, over various transport methods.

Dialogue strategy: Configurable Agent model, normal model, retrieval thresholds and prompts for precise multi‑turn control.

Simple to use: An intuitive web UI and standard API mean there is no technical barrier to getting started.

Secure and controllable: Supports on‑premise and private‑cloud deployment; data remains fully under user control.
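To make the hybrid‑retrieval idea concrete, here is a minimal sketch of reciprocal rank fusion (RRF), a common way to merge keyword and vector result lists. This is an illustration of the general technique, not WeKnora's actual fusion code; all names below are made up for the example.

```python
# Illustrative reciprocal rank fusion (RRF) over ranked result lists.
# Not WeKnora's real implementation; shows the general hybrid-retrieval idea.

def rrf_fuse(rankings, k=60):
    """Merge several ranked lists of doc IDs into one fused ranking."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            # Each list contributes 1/(k + rank); documents ranked highly
            # in multiple lists accumulate the largest fused scores.
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

keyword_hits = ["doc3", "doc1", "doc7"]   # e.g. from keyword (BM25) search
vector_hits = ["doc1", "doc5", "doc3"]    # e.g. from embedding similarity

fused = rrf_fuse([keyword_hits, vector_hits])
print(fused)  # "doc1" ranks first: it appears high in both lists
```

A knowledge‑graph result list could be fused the same way by passing it as a third ranking.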

Technical Architecture

Document processing layer: Parses and preprocesses multi‑format documents (PDF, Word, images).

Knowledge modeling layer: Vectorizes, chunks and builds knowledge graphs to create deep semantic representations.

Retrieval engine layer: Hybrid of keyword, vector and knowledge‑graph strategies for efficient, accurate recall.

Reasoning & generation layer: Uses LLM for deep understanding and answer generation, with optional Agent reasoning.

Interaction layer: Provides a web UI and standard REST API.

The design allows flexible combination of retrieval strategies, LLMs (served via Ollama, with models such as Qwen and DeepSeek interchangeable), and vector databases, while ensuring controllability for private deployments.
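The layered, decoupled design described above can be sketched as a toy pipeline: each stage (parse, embed, retrieve, generate) is an independent function that can be swapped out. All names and the character‑histogram "embedding" here are illustrative stand‑ins, not WeKnora's real interfaces.

```python
# Minimal sketch of a decoupled RAG pipeline: parse -> embed -> retrieve -> generate.
# Every name below is illustrative; WeKnora's actual interfaces differ.
from dataclasses import dataclass
from typing import List

@dataclass
class Chunk:
    text: str
    embedding: List[float]

def parse(document: str, size: int = 40) -> List[str]:
    """Document processing layer: split raw text into fixed-size chunks."""
    return [document[i:i + size] for i in range(0, len(document), size)]

def embed(text: str) -> List[float]:
    """Knowledge modeling layer: toy embedding (letter-frequency vector)."""
    vec = [0.0] * 26
    for ch in text.lower():
        if "a" <= ch <= "z":
            vec[ord(ch) - ord("a")] += 1.0
    return vec

def retrieve(query: str, index: List[Chunk], top_k: int = 1) -> List[Chunk]:
    """Retrieval engine layer: rank chunks by dot-product similarity."""
    q = embed(query)
    return sorted(index,
                  key=lambda c: sum(a * b for a, b in zip(q, c.embedding)),
                  reverse=True)[:top_k]

def generate(query: str, context: List[Chunk]) -> str:
    """Reasoning & generation layer: stand-in for the LLM call."""
    return f"Answer to {query!r} based on: " + " | ".join(c.text for c in context)

# Wire the stages together; each stage can be replaced independently.
doc = "WeKnora parses documents, builds embeddings, retrieves chunks, and generates answers."
index = [Chunk(t, embed(t)) for t in parse(doc)]
print(generate("how are answers generated", retrieve("generates answers", index)))
```

Swapping the embedding model or vector store only touches one stage, which is what makes the decoupled layout easy to extend.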

Quick Start

Environment requirements

Docker

Docker Compose

Git

Installation steps

1. Clone the repository:

# Clone the main repo
git clone https://github.com/Tencent/WeKnora.git
cd WeKnora

2. Configure environment variables:

# Copy example env file
cp .env.example .env
# Edit .env and fill in required values (see comments in .env.example)

3. Start the services (including Ollama):

# Start all services
./scripts/start_all.sh
# or
make start-all

4. Stop the services:

./scripts/start_all.sh --stop
# or
make stop-all

Service access URLs

Web UI: http://localhost
Backend API: http://localhost:8080
Jaeger tracing: http://localhost:16686
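Once the backend is up on port 8080, queries go through its REST API. The sketch below builds (but does not send) a JSON request; the `/api/qa` path and payload field names are assumptions for illustration only, so check WeKnora's API documentation for the real contract.

```python
# Sketch of preparing a QA request for the backend at http://localhost:8080.
# Endpoint path and field names are hypothetical, not WeKnora's real API.
import json
from urllib import request

BASE_URL = "http://localhost:8080"

def build_query_request(question: str, knowledge_base_id: str) -> request.Request:
    """Build (without sending) a JSON POST request for a QA query."""
    payload = json.dumps({
        "knowledge_base_id": knowledge_base_id,  # hypothetical field name
        "question": question,                    # hypothetical field name
    }).encode("utf-8")
    return request.Request(
        f"{BASE_URL}/api/qa",                    # hypothetical endpoint
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_query_request("What does WeKnora do?", "kb-demo")
print(req.get_method(), req.full_url)
# With the services running, send it via request.urlopen(req).
```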

Feature Demonstration

Web UI screenshots illustrate knowledge‑base management, dialogue settings and the Agent tool‑calling process.

Open‑source Repository

https://github.com/Tencent/WeKnora

Tags: AI · LLM · open-source · semantic search · Document Retrieval · WeKnora
Written by Architecture Digest

Focusing on Java backend development, covering application architecture from top-tier internet companies (high availability, high performance, high stability), big data, machine learning, Java architecture, and other popular fields.