Why Using MySQL for RAG Leads to a Brutal Search Pitfall—and How Vector DB + ANN Saves You
The article explains why RAG systems cannot rely on MySQL for embedding storage, shows the O(n) brute‑force search latency for hundreds of thousands of chunks, and demonstrates how vector databases with ANN indexes such as HNSW or IVFFLAT provide millisecond‑level response, high recall, and scalable storage.
During a recent interview, the author was asked how their RAG system performed vector retrieval. The answer was to store embeddings in MySQL and scan the whole table. With a knowledge base of over 500,000 chunks, that approach came with a five-second retrieval pause, and it cost the author the interview.
RAG (Retrieval-Augmented Generation) requires semantic search: documents and queries are turned into high-dimensional embeddings (typically 768–3072 dimensions) and the most similar vectors must be found quickly. Out of the box, traditional relational databases (MySQL, or PostgreSQL without a vector extension) are built around exact and pattern matching with = or LIKE. They have no index that accelerates cosine similarity, inner product, or Euclidean distance, so similarity has to be computed row by row.
Brute-force search using SQL incurs O(n) complexity in the number of stored vectors. For 1,000,000 vectors of 1,024 dimensions, a single query requires 1,000,000 × 1,024 (about one billion) multiplications for the dot products alone, which puts latency in the range of seconds and is unacceptable for real-time Q&A.
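To make the cost concrete, here is a minimal sketch of a brute-force scan in plain Python. The function names and the small random dataset are illustrative assumptions, not the author's code; the point is only that every stored vector is scored against the query.

```python
import math
import random

def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def brute_force_top_k(query, vectors, k):
    """Score every stored vector against the query: O(n * d) work per query."""
    scored = [(cosine_similarity(query, v), i) for i, v in enumerate(vectors)]
    scored.sort(reverse=True)
    return [i for _, i in scored[:k]]

def dot_product_multiplications(n_vectors, dims):
    """Multiplications needed for the dot products alone in one full scan."""
    return n_vectors * dims

# Hypothetical toy data: 1,000 random 8-dimensional vectors.
rng = random.Random(0)
vectors = [[rng.gauss(0, 1) for _ in range(8)] for _ in range(1000)]
query = vectors[42]
top = brute_force_top_k(query, vectors, k=3)
print(top)
print(dot_product_multiplications(1_000_000, 1_024))  # 1,024,000,000
```

The work grows linearly with both the number of vectors and their dimensionality, which is why a full scan stops being viable long before a million chunks.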
Approximate Nearest Neighbor (ANN) search implemented by vector databases reduces distance calculations dramatically, bringing latency down to the millisecond range. Benchmarks show a speed‑up of 100–200× compared with brute force, while recall drops only from 100% to 95–99%.
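Recall here is usually measured as recall@k: the share of the true top-k neighbors (from an exact scan) that the ANN index also returns. A minimal sketch, with made-up ID lists standing in for real query results:

```python
def recall_at_k(approx_ids, exact_ids):
    """Fraction of the true top-k neighbors that the ANN result also returned."""
    exact = set(exact_ids)
    if not exact:
        raise ValueError("exact_ids must not be empty")
    return len(exact & set(approx_ids)) / len(exact)

# Hypothetical results for one query: the ANN index missed one true neighbor.
exact_top10 = [3, 7, 8, 15, 21, 34, 55, 89, 144, 233]
approx_top10 = [3, 7, 8, 15, 21, 34, 55, 89, 144, 610]
print(recall_at_k(approx_top10, exact_top10))  # 0.9
```

In practice this value is averaged over a set of sample queries to get a figure such as the 95–99% quoted above.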
The article then enumerates vector index algorithms:
Exact Nearest Neighbor (ENN) : KD‑Tree, VP‑Tree – 100% recall but suffers from the “curse of dimensionality” in high‑dimensional AI embeddings.
ANN algorithms (the mainstream choice):
Graph‑based (e.g., HNSW) – builds a multi‑layer small‑world graph; offers the fastest queries and highest recall but consumes a lot of memory and has a slow build time.
Quantization‑based (e.g., IVF_PQ) – clusters vectors and compresses them; memory‑efficient and fast to build, but incurs larger accuracy loss.
Hashing‑based (e.g., LSH) – uses hash buckets to limit search space.
In the author's project, PostgreSQL with the pgvector extension and an HNSW index was chosen. HNSW delivers millisecond-level response for million-scale data with a good balance of speed, recall (>99% with proper tuning), and memory usage. During index construction, each node's top layer is drawn as level = floor(-ln(random()) * mL), which leaves exponentially fewer nodes at higher layers. Search proceeds greedily from the sparse top layer down to the bottom layer, where a wider candidate search returns the approximate nearest neighbors.
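The level formula can be checked with a short simulation. This is a sketch under assumptions: M = 16 connections per node and mL = 1 / ln(M), the normalization suggested in the HNSW paper; the source does not state which values the author used.

```python
import math
import random
from collections import Counter

def assign_level(m_l, rng):
    """Draw a node's top layer: floor(-ln(u) * mL) with u uniform in (0, 1]."""
    u = 1.0 - rng.random()  # rng.random() is in [0, 1), so u never hits ln(0)
    return math.floor(-math.log(u) * m_l)

M = 16                     # assumed connections per node
m_l = 1.0 / math.log(M)    # assumed normalization factor mL
rng = random.Random(0)
levels = Counter(assign_level(m_l, rng) for _ in range(100_000))
for level in sorted(levels):
    print(level, levels[level])
```

With these assumed values, each layer holds roughly 1/16 of the nodes of the layer below it, which is what keeps the upper layers sparse enough for fast coarse navigation.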
Key HNSW parameters:
m : maximum connections per node – larger values improve recall but increase memory and build time.
ef_construction : search breadth during index build – higher values improve index quality at the cost of slower construction.
ef_search : search breadth at query time – the most critical runtime knob that balances speed and recall.
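As a sketch of where these knobs live in pgvector, the helpers below render the corresponding SQL statements. The helper names are hypothetical; the table and column names are taken from the query later in this article, and the default values shown (m = 16, ef_construction = 64, ef_search = 40) are pgvector's documented defaults rather than the author's settings.

```python
def hnsw_index_sql(table, column, m=16, ef_construction=64):
    """Render a pgvector CREATE INDEX statement for a cosine-distance HNSW index.

    Identifiers are interpolated directly, so pass trusted names only.
    """
    return (
        f"CREATE INDEX ON {table} USING hnsw ({column} vector_cosine_ops) "
        f"WITH (m = {int(m)}, ef_construction = {int(ef_construction)});"
    )

def ef_search_sql(ef_search=40):
    """Render the session setting that controls query-time search breadth."""
    return f"SET hnsw.ef_search = {int(ef_search)};"

print(hnsw_index_sql("vector_store", "embedding"))
print(ef_search_sql(100))
```

m and ef_construction are fixed when the index is built, while hnsw.ef_search can be changed per session, which is why it is the practical runtime knob.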
Dynamic updates are supported, but deleted vectors remain as “dead nodes” and can degrade recall; periodic REINDEX or vacuuming is required.
When data grows to tens of millions or billions of vectors, HNSW's memory consumption becomes a bottleneck. In such cases, the article recommends switching to IVFFLAT (an inverted file index over flat, uncompressed vectors): it uses K-Means clustering and inverted lists to narrow the search space, offering lower memory usage and 4–32× faster index builds at the expense of slightly higher latency and a modest recall loss.
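The inverted-file idea can be sketched in a few dozen lines of plain Python. This is a toy illustration, not pgvector's implementation: the function names, the random data, and the parameters n_lists and n_probe are assumptions made for the example.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest(point, centroids):
    """Index of the centroid closest to the point."""
    return min(range(len(centroids)), key=lambda c: sq_dist(point, centroids[c]))

def train_centroids(vectors, n_lists, iterations, rng):
    """Plain Lloyd's k-means; an empty cluster keeps its previous centroid."""
    centroids = rng.sample(vectors, n_lists)
    for _ in range(iterations):
        groups = [[] for _ in range(n_lists)]
        for v in vectors:
            groups[nearest(v, centroids)].append(v)
        centroids = [
            [sum(col) / len(g) for col in zip(*g)] if g else centroids[i]
            for i, g in enumerate(groups)
        ]
    return centroids

def build_inverted_lists(vectors, centroids):
    """Assign each vector ID to the list of its nearest centroid."""
    lists = [[] for _ in centroids]
    for idx, v in enumerate(vectors):
        lists[nearest(v, centroids)].append(idx)
    return lists

def ivf_search(query, vectors, centroids, lists, k, n_probe):
    """Scan only the n_probe lists whose centroids are closest to the query."""
    order = sorted(range(len(centroids)), key=lambda c: sq_dist(query, centroids[c]))
    candidates = [i for c in order[:n_probe] for i in lists[c]]
    candidates.sort(key=lambda i: sq_dist(query, vectors[i]))
    return candidates[:k]

rng = random.Random(0)
vectors = [[rng.gauss(0, 1) for _ in range(4)] for _ in range(500)]
centroids = train_centroids(vectors, n_lists=8, iterations=5, rng=rng)
lists = build_inverted_lists(vectors, centroids)
query = vectors[0]
exact = ivf_search(query, vectors, centroids, lists, k=5, n_probe=8)   # all lists: full scan
approx = ivf_search(query, vectors, centroids, lists, k=5, n_probe=2)  # only two lists scanned
print(exact, approx)
```

Probing fewer lists means fewer distance computations but a higher chance of missing a true neighbor that sits in an unprobed list, which is the recall trade-off described above.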
Hybrid search (vector + BM25) is also discussed: combining semantic similarity with keyword matching improves relevance in many production scenarios.
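The source does not say how the two result lists are merged. One common option is Reciprocal Rank Fusion (RRF), sketched below as an assumption rather than the article's method; the document IDs and the constant k = 60 are illustrative.

```python
from collections import defaultdict

def reciprocal_rank_fusion(rankings, k=60):
    """Fuse ranked ID lists: each list adds 1 / (k + rank) to a document's score."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID for a stable order.
    return sorted(scores, key=lambda d: (-scores[d], d))

vector_hits = ["doc_a", "doc_b", "doc_c"]  # hypothetical semantic ranking
bm25_hits = ["doc_c", "doc_a", "doc_d"]    # hypothetical keyword ranking
print(reciprocal_rank_fusion([vector_hits, bm25_hits]))
```

Because RRF uses only ranks, it sidesteps the problem that cosine similarities and BM25 scores live on different scales.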
Vector database options are grouped into four categories:
Traditional DB extensions : PostgreSQL + pgvector, MongoDB Atlas Vector Search – same stack, ACID transactions, low learning curve.
Search‑engine evolution : Elasticsearch/OpenSearch – strong hybrid search, mature distributed architecture.
Native vector databases : Milvus, Weaviate, Qdrant – built for billions of vectors, specialized indexing, high performance.
Managed cloud services : Pinecone, Zilliz Cloud, Weaviate Cloud – fully hosted, auto‑scaling, but higher cost and data resides with a third party.
Why was PostgreSQL + pgvector chosen over MySQL?
PostgreSQL’s extensibility allows installing pgvector (ANN support) without altering the core.
MySQL 8.x lacks native vector types; MySQL 9.0 introduces a VECTOR type but only supports brute‑force calculations, no ANN indexes, and has far fewer production‑grade examples.
Using a single PostgreSQL instance simplifies operations, provides transaction consistency between structured and vector data, and supports metadata filtering (e.g., WHERE category='Java') alongside vector similarity.
Example of a pgvector cosine‑similarity query (must use the same operator class as the index):
-- pgvector cosine similarity search example
-- <=> is the cosine distance operator (0 = identical, 2 = opposite)
SELECT content,
1 - (embedding <=> $1) AS cosine_similarity
FROM vector_store
WHERE metadata->>'category' = 'Java'
ORDER BY embedding <=> $1 -- ascending distance = more similar
LIMIT 5;
-- ⚠️ The distance operator used at query time must match the operator class
-- defined when creating the HNSW index (e.g., vector_cosine_ops), otherwise
-- the query cannot use the index and falls back to a full table scan.
In summary, relying on MySQL for RAG vector retrieval leads to prohibitive latency and scalability issues. Switching to a vector-aware database with ANN indexes (HNSW for million-scale workloads, IVFFLAT for tens of millions of vectors and beyond) delivers the performance, recall, and operational benefits required for production-grade RAG systems.
dbaplus Community
Enterprise-level professional community for Database, BigData, and AIOps. Daily original articles, weekly online tech talks, monthly offline salons, and quarterly XCOPS&DAMS conferences—delivered by industry experts.
