Building a CPU‑Only Poetry Retrieval Engine with Qwen Embeddings and Redis Vector Search
This article details a lightweight, CPU‑only knowledge‑base retrieval experiment that uses Qwen3‑Embedding‑0.6B to vectorize Chinese poetry, stores vectors in Redis with HNSW indexing, and implements a hybrid keyword‑plus‑vector search pipeline with configurable weighting and performance optimizations.
Project Background and Scope
This is a personal, small‑but‑complete knowledge‑base retrieval experiment that runs on two 8‑CPU‑16‑GB Linux hosts without GPU. The goal is to build a usable retrieval stack (embedding → Redis storage → hybrid search → Top‑K ranking) before adding a generative RAG component.
Resource prerequisites: two 8C‑16G Linux machines, no GPU.
Goal trade‑offs: prioritize the retrieval stack (embedding, Redis storage, hybrid search, Top‑K ranking).
Model selection: Qwen/Qwen3-Embedding-0.6B (lightweight, CPU‑friendly).
Corpus selection: public Tang poetry data.
Current focus is on retrieval and engineering; a full LLM conversational system is not included. The service returns interactive Top‑K poem lines/paragraphs.
Overall Pipeline
The system consists of two main chains: offline construction and online query.
Key Modules and Responsibilities
Embedding (Vectorization)
Model: Qwen/Qwen3-Embedding-0.6B
Vector dimension: 1024
Query prefix: query: (prepended to queries only, never to stored documents)
CPU optimization: num_threads=8 and batch_size=2 to control peak memory.
Implemented as a standalone module that loads the model, performs batch inference, and outputs float32 vectors.
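The embedding module described above might look like the following sketch, assuming the sentence-transformers library. The class name PoemEmbedder and its exact defaults are illustrative assumptions, not the project's actual code:

```python
import numpy as np

def with_query_prefix(query: str) -> str:
    # Only queries get the "query: " prefix; documents are embedded as-is.
    return f"query: {query}"

class PoemEmbedder:
    def __init__(self, model_name="Qwen/Qwen3-Embedding-0.6B",
                 num_threads=8, batch_size=2, max_length=512):
        # Heavy deps imported lazily so this module loads without them installed.
        import torch
        from sentence_transformers import SentenceTransformer
        torch.set_num_threads(num_threads)       # cap CPU threads on the 8C host
        self.model = SentenceTransformer(model_name, device="cpu")
        self.model.max_seq_length = max_length   # poems are short; 512 is ample
        self.batch_size = batch_size

    def embed_documents(self, texts):
        # Small batches keep peak memory manageable on a 16 GB machine.
        vecs = self.model.encode(texts, batch_size=self.batch_size,
                                 normalize_embeddings=True)
        return np.asarray(vecs, dtype=np.float32)   # 1024-dim float32 vectors

    def embed_query(self, query):
        return self.embed_documents([with_query_prefix(query)])[0]
```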
Corpus Cleaning and Chunking Strategy (Poetry‑Specific)
No chunking: Tang poems are short (well below max_length=512), so each whole poem is used as the minimal index unit.
Optional sentence splitting: a split_sentences switch is kept for future higher‑density recall.
Cleaning principle: lightweight normalization (remove extra whitespace/newlines) to avoid duplicate entries and reduce Redis deduplication cost.
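The cleaning and optional splitting rules above can be sketched as follows; the function names and the exact sentence delimiters are illustrative assumptions:

```python
import re

def clean_text(text: str) -> str:
    # Collapse runs of whitespace/newlines so identical poems
    # normalize to one canonical string for deduplication.
    return re.sub(r"\s+", " ", text).strip()

def flatten_corpus(poems, split_sentences=False):
    # Whole-poem units by default; optional sentence split for denser recall.
    docs = []
    for poem in poems:
        cleaned = clean_text(poem)
        if not cleaned:
            continue
        if split_sentences:
            # Split on Chinese sentence-final punctuation.
            docs.extend(s for s in re.split(r"[。！？]", cleaned) if s.strip())
        else:
            docs.append(cleaned)
    return docs
```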
Redis Vector Store and Full‑Text Search (Keyword + Vector Recall)
Relies on FT.SEARCH with vector KNN and WITHSCORES. These capabilities are built into Redis 8 (this project used 8.4.0); older versions require the RediSearch module.
Index uses HNSW; parameters such as m and ef_construction can be tuned in the config file.
Data structure: one Redis Hash per document, key prefixed with doc:, containing three fields:
text – TEXT field for full‑text search and keyword filtering.
vector – VECTOR field storing the float32 embedding bytes for KNN.
metadata – TEXT field storing a JSON string (source, author, title, dynasty, etc.).
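Index creation and ingestion for this Hash layout might look like the sketch below, using redis-py's search helpers. The index name "poems" and the HNSW values are illustrative assumptions, not the project's actual settings:

```python
import hashlib
import json
import numpy as np

def doc_key(text: str) -> str:
    # doc:{md5} keys double as the deduplication mechanism.
    return "doc:" + hashlib.md5(text.encode("utf-8")).hexdigest()

def create_index(r):
    # redis-py imported lazily so this module loads without it installed.
    from redis.commands.search.field import TextField, VectorField
    from redis.commands.search.indexDefinition import IndexDefinition, IndexType
    r.ft("poems").create_index(
        fields=[
            TextField("text"),                     # keyword / full-text search
            TextField("metadata"),                 # JSON string: author, title...
            VectorField("vector", "HNSW", {
                "TYPE": "FLOAT32", "DIM": 1024,
                "DISTANCE_METRIC": "COSINE",
                "M": 16, "EF_CONSTRUCTION": 200,   # tune to corpus size
            }),
        ],
        definition=IndexDefinition(prefix=["doc:"], index_type=IndexType.HASH),
    )

def store_doc(r, text: str, vec: np.ndarray, meta: dict):
    r.hset(doc_key(text), mapping={
        "text": text,
        "vector": vec.astype(np.float32).tobytes(),  # raw float32 bytes for KNN
        "metadata": json.dumps(meta, ensure_ascii=False),
    })
```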
Embedding is performed on the application side; Redis handles indexing, recall, and hybrid search via FT.SEARCH (keyword) and KNN (vector). Future work may move embedding/re‑ranking to a separate service.
Client library: redis-py can call FT.SEARCH directly, but the project uses redisvl (Redis Vector Library) for convenient indexing, writing, and vector queries.
Implemented as a "vector storage and retrieval" module responsible for index creation, data ingestion, KNN/hybrid search, and re‑ranking.
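The hybrid KNN-plus-keyword query this module performs could be sketched with redis-py's FT.SEARCH wrapper as follows; the index name "poems" and the fallback logic mirror the patterns described in this article, but the function names are assumptions:

```python
import numpy as np

def hybrid_query_string(query_text, k):
    # "(@text:...)=>[KNN ...]" with a keyword, "*=>[KNN ...]" without one.
    base = f"(@text:{query_text})" if query_text else "*"
    return f"{base}=>[KNN {k} @vector $vec AS vector_score]"

def hybrid_search(r, query_text, query_vec, top_k=5):
    # redis-py imported lazily so this module loads without it installed.
    from redis.commands.search.query import Query
    q = (Query(hybrid_query_string(query_text, top_k * 3))  # widened candidate set
         .return_fields("text", "metadata", "vector_score")
         .sort_by("vector_score")
         .dialect(2)
         .with_scores())  # also return full-text relevance (WITHSCORES)
    params = {"vec": np.asarray(query_vec, dtype=np.float32).tobytes()}
    res = r.ft("poems").search(q, query_params=params)
    if query_text and res.total == 0:
        # Fallback: pure vector KNN when the keyword filter finds nothing.
        return hybrid_search(r, None, query_vec, top_k)
    return res.docs
```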
Service Orchestration (Knowledge‑Base Construction)
Initialize the embedder (load the model, set the thread count, configure the model download mirror).
Load the corpus (plain‑text files).
Clean and flatten the corpus (default split_sentences=False, i.e., whole poem per document) with lightweight cleaning for deduplication.
Connect to Redis and create the index (Hash with TEXT and VECTOR fields).
Deduplicate: batch‑check if a text already exists to avoid redundant embedding/writing.
Vectorize new texts and write each batch to Redis as it completes ("vectorize‑while‑writing") to reduce peak memory and total latency.
Enter interactive query mode.
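The construction steps above can be tied together in a compact driver. Every helper here (PoemEmbedder, load_corpus, flatten_corpus, create_index, doc_key, store_doc) is a hypothetical name standing in for the project's actual modules:

```python
def chunked(items, n):
    # Yield fixed-size batches; a small n keeps embedding memory low.
    for i in range(0, len(items), n):
        yield items[i:i + n]

def build_knowledge_base(r, corpus_path, batch_size=2):
    embedder = PoemEmbedder()                  # 1. load model, set threads
    poems = load_corpus(corpus_path)           # 2. plain-text files
    docs = flatten_corpus(poems)               # 3. clean, whole-poem units
    create_index(r)                            # 4. Hash index: TEXT + VECTOR
    fresh = [d for d in docs if not r.exists(doc_key(d))]   # 5. dedup check
    for batch in chunked(fresh, batch_size):   # 6. vectorize-while-writing
        vecs = embedder.embed_documents(batch)
        for text, vec in zip(batch, vecs):
            store_doc(r, text, vec, meta={"source": corpus_path})
```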
Retrieval Strategy: Hybrid Recall + Re‑ranking
Redis query pattern:
If a text query is present: (@text:xxx)=>[KNN N @vector $vec AS vector_score] with WITHSCORES.
If there is no text query, or the keyword filter returns no hits: fall back to *=>[KNN K ...].
Text score: the WITHSCORES value from Redis full‑text relevance (typically BM25).
Vector score: Redis returns cosine distance (smaller = more similar), which is converted to a similarity. Since cosine distance lies in [0, 2], the naive similarity 1 − dist can be negative, so max(0, 1 − dist) is used to keep scores non‑negative.
Re‑ranking normalizes both scores and applies default weights w_v=0.7 (vector) and w_t=0.3 (text).
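The fusion step above can be sketched as follows, using min‑max normalization for both score lists; the normalization method and function names are illustrative assumptions, while the weights and the max(0, 1 − dist) conversion come from the article:

```python
def minmax(scores):
    # Min-max normalize a score list into [0, 1].
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [1.0 for _ in scores]   # all candidates tie
    return [(s - lo) / (hi - lo) for s in scores]

def rerank(candidates, w_v=0.7, w_t=0.3):
    # candidates: list of (doc_id, text_score, cosine_distance)
    text_n = minmax([c[1] for c in candidates])
    # Convert cosine distance in [0, 2] to a non-negative similarity.
    vec_sim = [max(0.0, 1.0 - c[2]) for c in candidates]
    vec_n = minmax(vec_sim)
    fused = [(c[0], w_v * v + w_t * t)
             for c, v, t in zip(candidates, vec_n, text_n)]
    return sorted(fused, key=lambda x: x[1], reverse=True)
```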
Performance and Stability Tips Under Resource Constraints
CPU Inference and Memory Control
Small batch size: batch_size=2 for stable CPU inference on 8C‑16G machines.
Limit token length: max_length=512 (poetry is short, saves memory).
Use float32 vectors (rather than float64) to halve Redis storage size and network overhead.
Optional Intel acceleration: intel-extension-for-pytorch (e.g., 2.8.0) can speed up Transformer inference on CPU‑only setups.
Redis‑Side Optimizations
Connection pool: max_connections=10 to reduce connection overhead.
HNSW parameters (m, ef_construction) affect recall, speed, and memory; tune according to data size.
Expand hybrid candidate set: retrieve top_k * 3 candidates before re‑ranking to improve final Top‑K quality.
Engineering Details (Reducing Implicit Costs)
Deduplication: use MD5/SHA256 of the text as part of the Redis key (e.g., doc:{hash}) for efficient duplicate detection.
Query escaping: clean special characters -(){}[]^"~*?:\ to avoid parsing errors.
Fallback strategy: if keyword filtering yields no matches, fall back to pure vector KNN to guarantee a result.
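The query‑escaping rule above could be implemented as a small sanitizer; the exact replacement policy (substituting a space) is an illustrative assumption:

```python
import re

# RediSearch query-syntax characters listed in the article.
REDIS_SPECIAL = re.compile(r'[-(){}\[\]^"~*?:\\]')

def sanitize_query(q: str) -> str:
    # Replace special characters with spaces so user input
    # cannot break FT.SEARCH query parsing.
    return REDIS_SPECIAL.sub(" ", q).strip()
```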
End‑to‑End Knowledge Query Flow (One‑Step Version)
User inputs a query (keyword or short phrase).
The retrieval service sanitizes the query (normalization, illegal‑character removal).
The embedding module prefixes the query with query: and generates an embedding.
Redis search module performs a hybrid search (KNN + TEXT + WITHSCORES).
Candidate results are normalized, weighted, and re‑ranked.
Top‑K results are returned to the user (currently via CLI; can later be fed to a RAG model).
