Engineering Practices of Douyin's Vector Database: From Retrieval Challenges to Cloud‑Native Solutions
Douyin tackled vector‑retrieval challenges by optimizing HNSW, developing a high‑performance IVF algorithm in‑house, and adding custom scalar quantization, SIMD acceleration, and a DSL‑driven engine that merges filtering with search. On this foundation it built VikingDB, a cloud‑native vector database with separated storage and compute that delivers sub‑10 ms latency, real‑time updates, multi‑tenant support, and secure, scalable retrieval for LLM‑driven applications.
With the widespread adoption of deep learning, the industry consensus is that everything can be represented by embeddings, which creates a strong demand for vector retrieval. Unlike traditional structured data retrieval, vector retrieval faces distinct challenges.
The article introduces Douyin's step‑by‑step engineering experience in building a vector database, covering three main parts: the background of vector databases, their technical evolution, and future application prospects.
1. Background of Vector Databases
Unstructured data such as text, images, and videos dominate Douyin's data volume (well over 80%). Traditional text retrieval relies on inverted indexes with BM25 or TF‑IDF, which suffer from limited semantic capability, difficulty extending to multimodal scenarios, and performance degradation as data grows.
Deep learning models (doc2vec, BERT, LLMs) enable converting unstructured data into vectors, turning the retrieval problem into approximate nearest‑neighbor (ANN) search.
2. Core Concepts of Vector Retrieval
Vector retrieval involves measuring similarity (Euclidean distance, inner product, cosine), selecting a top‑K result set, and balancing precision with efficiency. Common solutions include:
ANN algorithms (e.g., HNSW, IVF) that use auxiliary structures for pruning.
Quantization techniques (e.g., Product Quantization, scalar quantization) to reduce computation cost.
Implementation optimizations such as SIMD instructions and cache‑friendly memory layout.
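Before any of these optimizations, the baseline is exact brute‑force top‑K search under one of the similarity measures above. A minimal numpy sketch (synthetic data, illustrative only — the ANN and SIMD techniques in the list exist precisely to avoid this full scan at scale):

```python
# Exact top-K retrieval over a small corpus, supporting the three
# similarity measures mentioned above. Synthetic data; illustrative only.
import numpy as np

def top_k(query: np.ndarray, corpus: np.ndarray, k: int, metric: str = "cosine") -> np.ndarray:
    """Return indices of the k most similar corpus vectors to `query`."""
    if metric == "l2":
        # Smaller Euclidean distance = more similar, so negate for a score.
        scores = -np.linalg.norm(corpus - query, axis=1)
    elif metric == "ip":
        # Inner product: larger = more similar.
        scores = corpus @ query
    elif metric == "cosine":
        # Cosine: normalize both sides, then inner product.
        q = query / np.linalg.norm(query)
        c = corpus / np.linalg.norm(corpus, axis=1, keepdims=True)
        scores = c @ q
    else:
        raise ValueError(f"unknown metric: {metric}")
    # argpartition selects the k winners in O(n); sort only those k.
    idx = np.argpartition(-scores, k)[:k]
    return idx[np.argsort(-scores[idx])]

rng = np.random.default_rng(0)
corpus = rng.normal(size=(1000, 64)).astype(np.float32)
# A slightly perturbed copy of row 42 should come back as the best match.
query = corpus[42] + 0.01 * rng.normal(size=64).astype(np.float32)
neighbors = top_k(query, corpus, k=5)
```

Every line of this scan touches every vector, which is exactly what pruning structures like HNSW graphs and IVF cell lists trade a little recall to avoid.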
Douyin's practice:
Optimized the open‑source HNSW algorithm and independently developed an IVF algorithm, achieving higher performance without sacrificing accuracy.
Created a custom scalar quantization supporting int16, int8, and int4, enabling retrieval of 200 million candidates on a single T4 GPU.
Applied SIMD‑based acceleration and memory‑layout tuning.
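The article does not describe Douyin's quantizer internals; as a rough illustration of the idea, here is a minimal per‑dimension int8 scalar quantizer (learn a min and scale per dimension, round to a byte) — a sketch of the general technique, not Douyin's production scheme:

```python
# Minimal per-dimension int8 scalar quantization sketch — illustrative
# only, not Douyin's custom production quantizer.
import numpy as np

def train_quantizer(vectors: np.ndarray):
    """Learn a per-dimension offset and scale from a training sample."""
    lo = vectors.min(axis=0)
    hi = vectors.max(axis=0)
    # Map each dimension's [lo, hi] range onto the 256 uint8 codes.
    scale = np.where(hi > lo, (hi - lo) / 255.0, 1.0)
    return lo, scale

def quantize(vectors: np.ndarray, lo, scale) -> np.ndarray:
    codes = np.round((vectors - lo) / scale)
    return np.clip(codes, 0, 255).astype(np.uint8)

def dequantize(codes: np.ndarray, lo, scale) -> np.ndarray:
    return codes.astype(np.float32) * scale + lo

rng = np.random.default_rng(1)
x = rng.normal(size=(10_000, 128)).astype(np.float32)
lo, scale = train_quantizer(x)
codes = quantize(x, lo, scale)                     # 4x smaller than float32
err = np.abs(dequantize(codes, lo, scale) - x).max()  # bounded by ~scale/2
```

Storing uint8 codes instead of float32 cuts memory 4x (int4 would cut it 8x), which is how hundreds of millions of candidates can fit the memory budget of a single accelerator.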
3. From Retrieval Algorithms to a Vector Database
Integrating storage and retrieval functions yields a vector database that must provide high availability, performance, and ease of use. The system supports storage, search, and analytics as an online service.
4. Technical Evolution
Douyin observed that vector data often needs to be combined with structured attributes for permission filtering. Two filtering strategies are used:
Post‑filtering: retrieve a larger candidate set, then apply structured filters.
Pre‑filtering: apply DSL (domain‑specific language) filters before vector search.
To address the performance issues of both strategies, Douyin built a DSL‑driven engine that performs vector search and DSL filtering simultaneously, offering high performance, logical completeness, early termination, and execution‑plan optimization.
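The trade‑off between the two baseline strategies can be made concrete with a small sketch over a brute‑force index (attribute values and the `allowed` permission set are hypothetical; Douyin's fused DSL engine interleaves the predicate inside the index traversal rather than running two separate passes like this):

```python
# Post-filtering vs. pre-filtering over a brute-force index.
# Illustrative sketch; attribute values are hypothetical.
import numpy as np

def brute_search(query, corpus, k):
    """Exact inner-product top-k over a small corpus."""
    scores = corpus @ query
    idx = np.argpartition(-scores, k)[:k]
    return idx[np.argsort(-scores[idx])]

def post_filter(query, corpus, attrs, allowed, k, overfetch=4):
    # Strategy 1: over-fetch a larger candidate set, then apply the
    # structured predicate to the results.
    cand = brute_search(query, corpus, k * overfetch)
    kept = [int(i) for i in cand if attrs[i] in allowed]
    return kept[:k]  # may come up short when the filter is selective

def pre_filter(query, corpus, attrs, allowed, k):
    # Strategy 2: apply the predicate first, then search only survivors.
    subset = np.flatnonzero([a in allowed for a in attrs])
    if len(subset) <= k:
        return [int(i) for i in subset]
    local = brute_search(query, corpus[subset], k)
    return [int(i) for i in subset[local]]
```

Post‑filtering wastes search work (and can return fewer than k results) when the predicate is selective; pre‑filtering pays to materialize the filtered subset up front. A fused engine avoids both costs by pruning on the predicate during the search itself.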
Additional evolutions include:
Separation of storage and compute: vector storage cluster, batch index‑building cluster, and online search service.
Benefits: reduced index‑building resources, faster indexing (full CPU utilization), improved online stability, and easier auto‑tuning.
Streaming updates: a two‑stage index‑building pipeline that merges batch and streaming events, and a double‑buffer design for lock‑free online queries, achieving sub‑second update latency.
Cloud‑native migration: multi‑tenant orchestration, slot‑based scheduling, automated resource dispatch, and fine‑grained cost accounting.
5. VikingDB on Volcano Engine
VikingDB, a cloud‑native vector database mirroring Douyin's internal system, has been launched on Volcano Engine. It offers:
Extreme performance (sub‑10 ms latency for billions of vectors) with proprietary indexing algorithms.
Real‑time ingestion, updates, and automatic indexing.
High stability via storage‑compute separation and multi‑tenant support.
Broad applicability across over 20 internal services (e.g., Feishu Q&A, e‑commerce search).
Clients can use multi‑language SDKs or HTTP APIs to write unstructured data, which is automatically transformed into vectors, indexed, and served with end‑to‑end monitoring and auto‑tuning.
6. Application Outlook
Vector databases can complement large language models (LLMs) by providing:
Long‑term memory for multi‑turn interactions.
Domain‑specific knowledge injection.
Timely information retrieval for up‑to‑date answers.
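The knowledge‑injection pattern above reduces to: embed the question, retrieve the closest stored chunks, and prepend them to the LLM prompt. A toy end‑to‑end sketch (the bag‑of‑words `embed` is a stand‑in for a real embedding model, and the store is an in‑memory stand‑in for a vector database):

```python
# Retrieval-augmented prompting sketch. The embedding and the store are
# toy stand-ins for a real embedding model and a real vector database.
import numpy as np

VOCAB = ["index", "vector", "latency", "filter", "memory",
         "update", "quantization", "gpu"]

def embed(text: str) -> np.ndarray:
    # Toy bag-of-words embedding over a tiny vocabulary; a production
    # system would call a learned embedding model instead.
    words = text.lower().split()
    v = np.array([words.count(w) for w in VOCAB], dtype=np.float32)
    n = np.linalg.norm(v)
    return v / n if n else v

class KnowledgeStore:
    def __init__(self):
        self.texts, self.vecs = [], []

    def add(self, text: str):
        self.texts.append(text)
        self.vecs.append(embed(text))

    def retrieve(self, question: str, k: int = 1):
        # Cosine similarity (vectors are already normalized).
        sims = np.stack(self.vecs) @ embed(question)
        order = np.argsort(-sims)[:k]
        return [self.texts[i] for i in order]

def build_prompt(question: str, store: KnowledgeStore, k: int = 1) -> str:
    # Inject the retrieved chunks as context ahead of the user question.
    context = "\n".join(store.retrieve(question, k))
    return f"Context:\n{context}\n\nQuestion: {question}"

store = KnowledgeStore()
store.add("quantization reduces vector memory and gpu cost")
store.add("pre filter applies structured predicates before search")
prompt = build_prompt("how does quantization cut gpu memory", store)
```

The same retrieval call, keyed by conversation or user ID, serves the long‑term‑memory use case; refreshing the store serves the timeliness one.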
Security considerations include protecting user queries from leakage and preventing cross‑user data contamination. Vector databases enable fine‑grained permission control, ensuring that users only retrieve knowledge they are authorized to access.
Overall, the article argues that vector databases will become foundational infrastructure for the LLM ecosystem, supporting scalable, secure, and real‑time retrieval across diverse business scenarios.
Sohu Tech Products
A knowledge-sharing platform for Sohu's technology products. As a leading Chinese internet brand with media, video, search, and gaming services and over 700 million users, Sohu continuously drives tech innovation and practice. We’ll share practical insights and tech news here.