
How Vector Search Powers LLMs: Inside ByteHouse’s High‑Performance Vector Database

With the rise of LLMs, vector search and vector databases have become essential for extending model memory. This article explains the principles, algorithms, design choices, implementation details, and performance results of ByteHouse's cloud‑native vector retrieval engine.


The State of Vector Retrieval

As LLM technology matures, databases are expected to add vector analysis and AI support. Vector databases and retrieval capabilities have therefore become a hot topic: they give LLMs an external memory that improves answer accuracy.

Vector Retrieval Definition

Traditional databases cannot directly process unstructured data such as images, video, or audio. The common approach is to convert these items into vector embeddings via models, store the vectors, and, at query time, convert the query into a vector and perform similarity matching.

Technically, vector retrieval performs a K‑Nearest Neighbors (KNN) search to find the k most similar vectors among N D‑dimensional vectors.

Because exact KNN is costly at large scale, many systems adopt Approximate Nearest Neighbor (ANN) search, sacrificing some accuracy for faster response.
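A brute-force exact-KNN sketch in NumPy makes the cost concrete (array names and sizes here are illustrative, not from the article): every query touches all N stored vectors, which is exactly the linear scan that ANN indexes avoid.

```python
import numpy as np

# Exact KNN over N D-dimensional vectors: O(N * D) work per query.
N, D, k = 100_000, 128, 10
base = np.random.rand(N, D).astype(np.float32)   # stored embeddings
query = np.random.rand(D).astype(np.float32)     # query embedding

dists = np.linalg.norm(base - query, axis=1)     # L2 distance to every vector
topk = np.argsort(dists)[:k]                     # indices of the k nearest
print(topk, dists[topk])
```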

LLM and Vector Retrieval

LLMs trained on limited data may return inaccurate answers for recent or domain‑specific queries. By embedding relevant documents into a vector database and converting the question into a vector, similarity search yields related knowledge that can be used as a prompt, improving the final answer.
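A minimal sketch of that retrieval-augmented flow, with a toy `embed` function standing in for a real embedding model (everything below is illustrative, not a specific library's API):

```python
import numpy as np

def embed(text: str) -> np.ndarray:
    # Toy stand-in for a real embedding model: hash words into a fixed
    # vector. In practice this would call an actual embedding model.
    vec = np.zeros(64, dtype=np.float32)
    for word in text.lower().split():
        vec[hash(word) % 64] += 1.0
    return vec / (np.linalg.norm(vec) or 1.0)

documents = [
    "Release 2.1 adds vector index preloading.",
    "The billing page moved under account settings.",
    "HNSW build concurrency is now configurable.",
]
doc_vecs = np.stack([embed(d) for d in documents])

question = "What changed about index preloading?"
q = embed(question)

# Cosine similarity against every stored document, then stuff the best
# matches into the prompt as external memory for the LLM.
sims = doc_vecs @ q  # vectors are already L2-normalized
context = "\n".join(documents[i] for i in np.argsort(-sims)[:2])
prompt = f"Answer using this context:\n{context}\n\nQuestion: {question}"
```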

Four Vector Retrieval Algorithms

Vector retrieval algorithms can be grouped into four categories based on storage structure:

Table‑based, e.g., LSH.

Tree‑based, which organizes vectors into a similarity tree.

Cluster‑based (IVF): vectors are clustered first; queries search nearest clusters then nearest vectors, often combined with quantization (SQ, PQ).

Graph‑based, e.g., HNSW, which builds a graph of vectors for fast traversal; offers high query speed and concurrency but higher memory and build cost.

In practice, Cluster‑based and Graph‑based methods are the most widely used; both are sketched below.
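As a concrete illustration of these two families, here is a sketch using the open-source Faiss library (this is not ByteHouse's internal implementation, and all parameters are illustrative):

```python
import numpy as np
import faiss  # open-source ANN library implementing both families

d, nb = 64, 10_000
xb = np.random.rand(nb, d).astype(np.float32)   # base vectors
xq = np.random.rand(5, d).astype(np.float32)    # query vectors

# Cluster-based: IVF with product quantization (PQ) to compress vectors.
nlist, m, nbits = 64, 8, 8           # clusters / PQ subvectors / bits per code
quantizer = faiss.IndexFlatL2(d)
ivfpq = faiss.IndexIVFPQ(quantizer, d, nlist, m, nbits)
ivfpq.train(xb)                       # learn cluster centroids + PQ codebooks
ivfpq.add(xb)
ivfpq.nprobe = 8                      # search only the 8 nearest clusters
D1, I1 = ivfpq.search(xq, 10)

# Graph-based: HNSW, fast traversal at the cost of memory and build time.
hnsw = faiss.IndexHNSWFlat(d, 32)     # 32 = graph out-degree per node
hnsw.add(xb)                          # no train step; graph is built on insert
hnsw.hnsw.efSearch = 64               # wider beam -> higher recall, slower
D2, I2 = hnsw.search(xq, 10)
```

Note the characteristic knobs: `nprobe` trades recall for speed on the cluster side, and `efSearch` does the same on the graph side.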

How to Build a Vector Database

A vector database must support storage and management of vector data and indexes, offering CRUD operations and high‑performance retrieval, often combined with attribute filtering and multimodal queries.

Two design approaches exist:

Build a dedicated vector‑oriented database from the ground up. This optimizes for retrieval but lacks complex data management, so it must be integrated with other databases.

Extend an existing database with vector‑search capabilities. This provides all‑in‑one data management, but performance is constrained by the underlying architecture.

Current Progress of Vector Databases

Vector databases are rapidly evolving, with trends toward dedicated vector engines that add complex data types, storage‑compute separation, consistency, real‑time ingestion, and advanced filtering capabilities.

Another trend is enhancing traditional databases with vector extensions, adding more algorithms and specialized filtering strategies.

ByteHouse Vector Retrieval

ByteHouse, a cloud‑native data warehouse built on ClickHouse, adds vector, full‑text, and geospatial search. It leverages a complete SQL syntax, high‑performance engine, and rich data management to support diverse scenarios.

Its vector retrieval design includes a dedicated execution path, multiple algorithms, and integration with scalar queries.
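The article does not show ByteHouse's actual SQL, so the schema below and the use of `cosineDistance` (a standard ClickHouse function) are assumptions about what such a query could look like; the point is the shape of it, with vector similarity expressed in ordinary SQL and combined freely with scalar filters.

```python
# Hypothetical illustration only -- not ByteHouse's documented syntax.
vector_query = """
SELECT id, title, cosineDistance(embedding, {q:Array(Float32)}) AS dist
FROM docs
WHERE category = 'release-notes'   -- scalar (attribute) filter
ORDER BY dist ASC
LIMIT 10
"""
```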

Main Design Ideas

Short‑circuit vector query processing and introduce specialized operators to reduce compute and I/O.

Add a dedicated Vector Index management module (library, executor, cache, metadata).

Extend the storage layer to persist a Vector Index per data part.

Challenges and Optimizations

Read Amplification: the smallest read unit is a mark, so even with vector‑index results in hand, the system still read entire marks. The optimization moves vector computation ahead of data‑part reads, so only the marks containing result rows are read; this cut I/O dramatically and doubled performance.
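A sketch of that two-phase read, with hypothetical helper names (the real implementation lives in ByteHouse's storage layer and is not shown in the article):

```python
def search_part(part, query_vec, k):
    # Phase 1: consult only the vector index; no column data is read yet.
    row_ids = part.vector_index.search(query_vec, k)

    # Phase 2: read just the marks (granules) that contain the result rows,
    # instead of every mark the old pipeline would have scanned.
    marks = sorted({rid // part.rows_per_mark for rid in row_ids})
    return part.read_columns(marks, row_ids)
```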

Resource‑Intensive Index Builds: vector index construction is slower and far more memory‑hungry than building conventional indexes. The optimizations cap the number of concurrent build threads, introduce disk‑based computation for IVF indexes, and add memory buffers.
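A minimal sketch of the thread-capping idea, assuming a hypothetical `build_vector_index` helper and an assumed (not actual) tunable:

```python
from concurrent.futures import ThreadPoolExecutor

MAX_BUILD_THREADS = 2  # assumed tunable; caps CPU and memory used by builds

def build_vector_index(data_part):
    """Hypothetical helper: train and persist the index for one data part."""
    ...

new_data_parts = []  # parts awaiting an index

# Queue every new part, but never run more than MAX_BUILD_THREADS builds
# at once, so ingestion and queries keep enough headroom.
with ThreadPoolExecutor(max_workers=MAX_BUILD_THREADS) as pool:
    for part in new_data_parts:
        pool.submit(build_vector_index, part)
```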

Cold‑Read Overhead: loading index structures into memory on first access adds latency. ByteHouse adds cache‑preload support that loads new indexes at startup, during writes, and after merges, with automatic garbage collection of stale entries.
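A toy sketch of a preloadable index cache under those assumptions (all names hypothetical; ByteHouse's cache module is internal):

```python
class VectorIndexCache:
    """Toy sketch, not ByteHouse code: cache loaded indexes per data part."""

    def __init__(self):
        self._indexes = {}  # part_id -> deserialized index

    def preload(self, part_ids, load_index):
        # Invoked at startup, after writes, and after merges, so the first
        # query against a new part never pays the cold-read latency.
        for pid in part_ids:
            self._indexes.setdefault(pid, load_index(pid))

    def gc(self, live_part_ids):
        # Automatic GC: drop entries for parts that merges made obsolete.
        for pid in list(self._indexes):
            if pid not in live_part_ids:
                del self._indexes[pid]
```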

Performance Evaluation

Using VectorDBBench on the cohere‑1M dataset, ByteHouse achieves 98% recall with QPS above 2,600 and p99 latency of roughly 15 ms, comparable to dedicated vector databases.

QPS scales with concurrency, surpassing Milvus with HNSW.

98% recall is used as the baseline so that QPS comparisons across systems are meaningful.

Load duration (data ingest + index build) is shorter than Milvus.

p99 serial latency over 10k queries is slightly higher, due to I/O and parsing overhead.

Future Plans

Develop lighter, higher‑performance index structures (on‑disk indices, better compression).

Deepen integration of vector search with scalar filters, UDF‑based embeddings, and full‑text search.

Combine vector optimization with the query optimizer for point‑lookup scenarios.

Improve usability and ecosystem support, e.g., tighter integration with LangChain and other LLM frameworks.

Written by

Volcano Engine Developer Services

The Volcano Engine Developer Community (Volcano Engine's TOD community) connects the platform with developers, offering cutting-edge technical content and diverse events, nurturing a vibrant developer culture, and co-building an open-source ecosystem.
