May 20, 2026 · Artificial Intelligence

Why Chunk‑Based RAG Fails and How IdeaBlocks Improve Retrieval

The article argues that treating text chunks as the knowledge unit in RAG pipelines is a flawed assumption that leads to versioning, metadata, and redundancy problems, and shows that replacing chunks with structured IdeaBlocks dramatically reduces corpus size and token usage while improving vector relevance.

IdeaBlock · LLM · Metadata

10 min read

Sohu Tech Products

Nov 1, 2023 · Databases

Engineering Practices of Douyin's Vector Database: From Retrieval Challenges to Cloud‑Native Solutions

Douyin tackled vector‑retrieval challenges by optimizing HNSW, creating a high‑performance IVF algorithm, and adding custom scalar quantization, SIMD acceleration, and a DSL‑driven engine that merges filtering with search. It then built a cloud‑native vector database with separated storage and compute (VikingDB) that delivers sub‑10 ms latency, real‑time updates, multi‑tenant support, and secure, scalable retrieval for LLM‑driven applications.

ANN · LLM integration · Storage Compute Separation

18 min read

retrieval optimization
