
Building a CPU‑Only Poetry Retrieval Engine with Qwen Embeddings and Redis Vector Search

This article details a lightweight, CPU‑only knowledge‑base retrieval experiment that uses Qwen3‑Embedding‑0.6B to vectorize Chinese poetry, stores vectors in Redis with HNSW indexing, and implements a hybrid keyword‑plus‑vector search pipeline with configurable weighting and performance optimizations.


Project Background and Scope

This is a personal, small‑but‑complete knowledge‑base retrieval experiment that runs on two 8‑CPU‑16‑GB Linux hosts without GPU. The goal is to build a usable retrieval stack (embedding → Redis storage → hybrid search → Top‑K ranking) before adding a generative RAG component.

Resource prerequisites: two 8C‑16G Linux machines, no GPU.

Goal trade‑offs: prioritize the retrieval stack (embedding, Redis storage, hybrid search, Top‑K ranking).

Model selection: Qwen/Qwen3-Embedding-0.6B (lightweight, CPU‑friendly).

Corpus selection: public Tang poetry data.

The current focus is on retrieval and engineering; a full LLM conversational system is not included. The service runs interactively and returns Top‑K poem lines/passages.

Overall Pipeline

The system consists of two main pipelines: offline index construction and online querying.

Key Modules and Responsibilities

Embedding (Vectorization)

Model: Qwen/Qwen3-Embedding-0.6B

Vector dimension: 1024

Query prefix: query: (prepended to queries only, not to documents).

CPU optimization: num_threads=8, batch size 2 to control peak memory.

Implemented as a standalone module that loads the model, performs batch inference, and outputs float32 vectors.
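As a hedged sketch (not the project's actual code), such a module could look like the following, assuming the model is loaded through sentence-transformers; the class and method names are illustrative.

```python
import numpy as np
import torch
from sentence_transformers import SentenceTransformer


class PoemEmbedder:
    """Illustrative CPU-only embedder for Qwen3-Embedding-0.6B."""

    def __init__(self, model_name="Qwen/Qwen3-Embedding-0.6B",
                 num_threads=8, batch_size=2):
        torch.set_num_threads(num_threads)      # cap CPU threads as described above
        self.model = SentenceTransformer(model_name, device="cpu")
        self.batch_size = batch_size

    def embed_documents(self, texts):
        # Small batches keep peak memory manageable on an 8C-16G host.
        vecs = self.model.encode(texts, batch_size=self.batch_size,
                                 convert_to_numpy=True)
        return vecs.astype(np.float32)          # 1024-dim float32 vectors

    def embed_query(self, query):
        # The "query: " prefix is concatenated to queries only.
        return self.embed_documents(["query: " + query])[0]
```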

Corpus Cleaning and Chunking Strategy (Poetry‑Specific)

No chunking: Tang poems are short (well below max_length=512), so each whole poem is used as the minimal index unit.

Optional sentence splitting: a split_sentences switch is kept for future higher‑density recall.

Cleaning principle: lightweight normalization (remove extra whitespace/newlines) to avoid duplicate entries and reduce Redis deduplication cost.
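A minimal sketch of this cleaning step; the sentence-splitting behaviour shown (splitting on Chinese end punctuation) is an assumption about how the split_sentences switch might work.

```python
import re


def clean_text(text: str) -> str:
    # Lightweight normalization: collapse whitespace/newlines so identical
    # poems can be deduplicated cheaply in Redis.
    return re.sub(r"\s+", " ", text).strip()


def flatten_corpus(poems, split_sentences=False):
    docs = []
    for poem in poems:
        cleaned = clean_text(poem)
        if not cleaned:
            continue
        if split_sentences:
            # Optional higher-density recall: one line/sentence per document.
            docs.extend(s for s in re.split(r"[。！？]", cleaned) if s)
        else:
            docs.append(cleaned)  # whole poem as the minimal index unit
    return docs
```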

Redis Vector Store and Full‑Text Search (Keyword + Vector Recall)

The store relies on FT.SEARCH and vector KNN with WITHSCORES. These capabilities are built into Redis 8.4.0; older versions require the RediSearch module.

Index uses HNSW; parameters such as m and ef_construction can be tuned in the config file.

Data structure: one Redis Hash per document, key prefixed with doc:, containing three fields:

text – TEXT field for full‑text search and keyword filtering.

vector – VECTOR field storing the float32 embedding bytes for KNN.

metadata – TEXT field storing a JSON string (source, author, title, dynasty, etc.).
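Writing one such Hash with plain redis-py might look like this (the project itself goes through redisvl, see below); the doc_hash key layout follows the deduplication scheme described later.

```python
import hashlib
import json

import numpy as np
import redis

r = redis.Redis(host="localhost", port=6379)


def write_doc(text: str, vector: np.ndarray, metadata: dict) -> None:
    doc_hash = hashlib.md5(text.encode("utf-8")).hexdigest()
    r.hset(f"doc:{doc_hash}", mapping={
        "text": text,                                   # TEXT field for keyword search
        "vector": vector.astype(np.float32).tobytes(),  # raw float32 bytes for KNN
        "metadata": json.dumps(metadata, ensure_ascii=False),
    })
```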

Embedding is performed on the application side; Redis handles indexing, recall, and hybrid search via FT.SEARCH (keyword) and KNN (vector). Future work may move embedding/re‑ranking to a separate service.

Client library: redis-py can call FT.SEARCH directly, but the project uses redisvl (Redis Vector Library) for convenient indexing, writing, and vector queries.

Implemented as a "vector storage and retrieval" module responsible for index creation, data ingestion, KNN/hybrid search, and re‑ranking.
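A hedged sketch of index creation and a pure KNN query via redisvl; the schema keys (including the HNSW attrs m and ef_construction) are written from memory and may differ slightly between redisvl versions.

```python
import numpy as np
from redisvl.index import SearchIndex
from redisvl.query import VectorQuery

schema = {
    "index": {"name": "poems", "prefix": "doc"},
    "fields": [
        {"name": "text", "type": "text"},
        {"name": "metadata", "type": "text"},
        {"name": "vector", "type": "vector", "attrs": {
            "dims": 1024,
            "algorithm": "hnsw",
            "datatype": "float32",
            "distance_metric": "cosine",
            "m": 16,                 # HNSW graph connectivity (tunable)
            "ef_construction": 200,  # build-time accuracy/speed trade-off
        }},
    ],
}

index = SearchIndex.from_dict(schema, redis_url="redis://localhost:6379")
index.create(overwrite=False)

# Placeholder query vector; in practice this comes from the embedder.
query_vec = np.zeros(1024, dtype=np.float32)
knn = VectorQuery(vector=query_vec.tolist(), vector_field_name="vector",
                  return_fields=["text", "metadata"], num_results=10)
results = index.query(knn)
```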

Service Orchestration (Knowledge‑Base Construction)

Initialize the embedder (load the model, set the thread count, configure the model mirror).

Load the corpus (plain‑text files).

Clean and flatten the corpus (default split_sentences=False, i.e., whole poem per document) with lightweight cleaning for deduplication.

Connect to Redis and create the index (Hash with TEXT and VECTOR fields).

Deduplicate: batch‑check if a text already exists to avoid redundant embedding/writing.

Vectorize new texts and write them to Redis as they are embedded ("vectorize‑while‑writing") to reduce peak memory usage and total latency.

Enter interactive query mode.
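Condensed into code, the construction flow could look roughly like the sketch below, reusing the PoemEmbedder, flatten_corpus, and write_doc sketches from earlier sections; load_corpus, the dedup check, and the metadata value are illustrative.

```python
import hashlib
from pathlib import Path


def load_corpus(corpus_dir: str) -> list[str]:
    # One poem per plain-text file.
    return [p.read_text(encoding="utf-8") for p in Path(corpus_dir).glob("*.txt")]


def build_knowledge_base(corpus_dir: str) -> None:
    embedder = PoemEmbedder(num_threads=8, batch_size=2)
    docs = flatten_corpus(load_corpus(corpus_dir), split_sentences=False)
    for doc in docs:
        key = "doc:" + hashlib.md5(doc.encode("utf-8")).hexdigest()
        if r.exists(key):               # deduplicate before embedding
            continue
        vec = embedder.embed_documents([doc])[0]
        write_doc(doc, vec, metadata={"source": "tang-poetry"})  # vectorize-while-writing
```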

Retrieval Strategy: Hybrid Recall + Re‑ranking

Redis query pattern:

If a text query is present: (@text:xxx)=>[KNN N @vector $vec AS vector_score] WITHSCORES

If there is no text query, or the keyword filter returns no hits: fall back to *=>[KNN K ...]

Text score: the WITHSCORES value from Redis full‑text relevance (typically BM25).

Vector score: cosine distance (smaller distance = more similar) converted to similarity via max(0, 1 - dist) to keep scores non‑negative.

Cosine distance ∈ [0, 2]; similarity = 1 - distance may be negative, so max(0, 1 - dist) ensures non‑negative scores.
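Expressed with plain redis-py (the project routes this through redisvl), a hedged sketch of the hybrid query and its fallback:

```python
from redis.commands.search.query import Query


def hybrid_search(r, index_name, text_query, query_vec, k=10):
    # (@text:...)=>[KNN k @vector $vec AS vector_score], or pure KNN when no text filter.
    base = f"(@text:{text_query})" if text_query else "*"
    q = (
        Query(f"{base}=>[KNN {k} @vector $vec AS vector_score]")
        .with_scores()                  # full-text relevance score (typically BM25)
        .sort_by("vector_score")        # ascending cosine distance
        .return_fields("text", "metadata", "vector_score")
        .dialect(2)
    )
    res = r.ft(index_name).search(q, query_params={"vec": query_vec.astype("float32").tobytes()})
    if text_query and res.total == 0:
        # Fallback: keyword filter produced no hits, retry with pure vector KNN.
        return hybrid_search(r, index_name, None, query_vec, k)
    return res.docs
```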

Re‑ranking normalizes both scores and applies default weights w_v=0.7 (vector) and w_t=0.3 (text).
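A sketch of this fusion step; min-max normalization is an assumption about how "normalizes both scores" is realized, while the weights and the max(0, 1 - dist) conversion follow the text.

```python
def rerank(candidates, w_v=0.7, w_t=0.3, top_k=5):
    """candidates: dicts with 'text', 'text_score' (BM25-like) and 'vector_distance' (cosine)."""
    if not candidates:
        return []
    # Cosine distance in [0, 2] -> non-negative similarity.
    for c in candidates:
        c["vector_sim"] = max(0.0, 1.0 - float(c["vector_distance"]))

    def normalize(values):
        lo, hi = min(values), max(values)
        return [0.0 if hi == lo else (v - lo) / (hi - lo) for v in values]

    t_norm = normalize([c["text_score"] for c in candidates])
    v_norm = normalize([c["vector_sim"] for c in candidates])
    for c, t, v in zip(candidates, t_norm, v_norm):
        c["score"] = w_v * v + w_t * t      # default weights: vector 0.7, text 0.3
    return sorted(candidates, key=lambda c: c["score"], reverse=True)[:top_k]
```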

Performance and Stability Tips Under Resource Constraints

CPU Inference and Memory Control

Small batch size: batch_size=2 for stable CPU inference on 8C‑16G machines.

Limit token length: max_length=512 (poetry is short, saves memory).

Use float32 vectors to reduce Redis storage size and network overhead.

Optional Intel acceleration: intel-extension-for-pytorch (e.g., 2.8.0) can speed up Transformer inference on CPU‑only setups.
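A hedged sketch of wiring in that optional dependency on the raw Hugging Face model; whether it actually speeds things up depends on the CPU, and this uses the library's generic optimize entry point rather than anything project-specific.

```python
import intel_extension_for_pytorch as ipex  # optional CPU acceleration
import torch
from transformers import AutoModel

model = AutoModel.from_pretrained("Qwen/Qwen3-Embedding-0.6B").eval()
model = ipex.optimize(model, dtype=torch.float32)  # CPU kernel/graph optimizations
```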

Redis‑Side Optimizations

Connection pool: max_connections=10 to reduce connection overhead.

HNSW parameters (m, ef_construction) affect recall, speed, and memory; tune them according to data size.

Expand hybrid candidate set: retrieve top_k * 3 candidates before re‑ranking to improve final Top‑K quality.
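A sketch combining the pooled connection with the widened candidate set, reusing the hybrid_search and rerank sketches from the retrieval section:

```python
import redis

# Reuse a bounded connection pool instead of opening a connection per request.
pool = redis.ConnectionPool.from_url("redis://localhost:6379", max_connections=10)
r = redis.Redis(connection_pool=pool)


def search_top_k(text_query, query_vec, top_k=5):
    # Recall top_k * 3 candidates, then let re-ranking pick the final Top-K.
    docs = hybrid_search(r, "poems", text_query, query_vec, k=top_k * 3)
    candidates = [{"text": d.text,
                   "text_score": float(getattr(d, "score", 0.0)),
                   "vector_distance": float(d.vector_score)} for d in docs]
    return rerank(candidates, top_k=top_k)
```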

Engineering Details (Reducing Implicit Costs)

Deduplication: use MD5/SHA256 of the text as part of the Redis key (e.g., doc:{hash}) for efficient duplicate detection.

Query escaping: strip or escape the special characters -(){}[]^"~*?:\ to avoid FT.SEARCH parsing errors.

Fallback strategy: if keyword filtering yields no matches, fall back to pure vector KNN to guarantee a result.
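Small sketches of the dedup key and the query sanitizer; the character set mirrors the list above.

```python
import hashlib
import re

REDISEARCH_SPECIALS = re.compile(r'[\-(){}\[\]^"~*?:\\]')


def doc_key(text: str) -> str:
    # Content-addressed key: identical poems map to the same doc:{hash} entry.
    return "doc:" + hashlib.md5(text.encode("utf-8")).hexdigest()


def sanitize_query(text_query: str) -> str:
    # Strip RediSearch special characters so FT.SEARCH does not hit parsing errors.
    return REDISEARCH_SPECIALS.sub(" ", text_query).strip()
```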

End‑to‑End Knowledge Query Flow (One‑Step Version)

User inputs a query (keyword or short phrase).

The retrieval service sanitizes the query (normalization, illegal‑character removal).

The embedding module prefixes the query with query: and generates an embedding.

The Redis search module performs a hybrid search (KNN + TEXT + WITHSCORES).

Candidate results are normalized, weighted, and re‑ranked.

Top‑K results are returned to the user (currently via CLI; can later be fed to a RAG model).
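Tying the steps together, a hedged end-to-end sketch that reuses the helpers from the earlier sketches (sanitize_query, PoemEmbedder, search_top_k); embedder is assumed to be a module-level instance.

```python
embedder = PoemEmbedder(num_threads=8, batch_size=2)


def answer_query(raw_query: str, top_k: int = 5) -> None:
    text_query = sanitize_query(raw_query)             # step 2: normalization + escaping
    query_vec = embedder.embed_query(text_query)       # step 3: "query: " prefix + embedding
    ranked = search_top_k(text_query, query_vec, top_k=top_k)  # steps 4-5: hybrid recall + re-rank
    for i, item in enumerate(ranked, 1):               # step 6: Top-K back to the user (CLI)
        print(f"{i}. {item['text']}  (score={item['score']:.3f})")
```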
