
BES Engineering Practices for Large‑Scale Vector Database Scenarios

At QCon 2023, Baidu’s BES team detailed how their cloud‑native Elasticsearch service has been engineered for large‑scale vector search, describing architecture, C++ plugin integration, memory‑saving storage tricks, HNSW/IVF optimizations, filter strategies, and real‑world multimodal video and LLM knowledge‑base deployments.


This article, originally presented at the QCon Global Software Development Conference 2023 (Beijing), introduces vector databases and shares the engineering practice of Baidu Intelligent Cloud's BES (Baidu ElasticSearch) in large‑scale vector database scenarios.

Vector databases store and query high‑dimensional vectors generated by embedding techniques from images, audio, text, etc. By measuring distances between vectors, they enable similarity search such as image‑by‑image retrieval.
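The distance-based similarity search described above can be sketched in a few lines. This is a toy illustration, not BES's implementation: embeddings and the query vector are invented, and cosine similarity stands in for whichever metric a real deployment configures.

```python
# Toy similarity search: stored embeddings are rows of a matrix, and the
# nearest neighbours of a query are found by cosine similarity.
import numpy as np

def cosine_top_k(query, vectors, k=2):
    """Return indices of the k stored vectors most similar to the query."""
    q = query / np.linalg.norm(query)
    v = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
    sims = v @ q                      # cosine similarity per stored vector
    return np.argsort(-sims)[:k]      # highest similarity first

vectors = np.array([[1.0, 0.0], [0.0, 1.0], [0.9, 0.1]])
query = np.array([1.0, 0.05])
print(cosine_top_k(query, vectors))   # nearest embeddings first
```

A production system replaces the exhaustive scan with an approximate index (HNSW, IVF), but the contract is the same: vectors in, nearest neighbours out.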

The talk first outlines the rapid development of vector retrieval before large language models (LLMs) and its wide adoption in multimodal search, recommendation, semantic retrieval, QA, and face recognition. It then discusses the limitations of LLMs—knowledge gaps, hallucinations, high training cost, and privacy concerns—and proposes retrieval‑augmented generation (RAG), a form of prompt engineering that injects retrieved context into the prompt, as a mitigation.
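The retrieval-augmented flow amounts to: retrieve relevant passages, splice them into the prompt, then generate. A minimal sketch, with a hypothetical keyword-overlap retriever standing in for a vector-database query and no actual LLM call:

```python
# Minimal RAG sketch. retrieve() is a stand-in for a vector-database query
# (e.g. against BES); a real system would embed the question and run k-NN.
def retrieve(question, docs, top_k=2):
    """Rank docs by naive keyword overlap with the question (illustrative only)."""
    q_tokens = set(question.split())
    return sorted(docs, key=lambda d: -len(q_tokens & set(d.split())))[:top_k]

def build_prompt(question, passages):
    """Splice retrieved passages into the prompt ahead of the question."""
    context = "\n".join(passages)
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"

docs = ["BES is Baidu ElasticSearch", "HNSW is a graph-based ANN index"]
prompt = build_prompt("What is BES", retrieve("What is BES", docs))
print(prompt)   # the most relevant passage appears first in the context
```

Because the model answers from retrieved private documents rather than its training data, this addresses the knowledge-gap and privacy limitations at inference time instead of through retraining.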

Next, the speaker presents the architecture of Baidu ElasticSearch (BES). BES consists of a control plane for cluster management, scaling, and hot‑cold data scheduling, and a data plane with Elasticsearch clusters deployed on cloud VMs and disks, fronted by a Layer‑4 load balancer (BLB). Data can be off‑loaded to Baidu Object Storage (BOS) to reduce storage costs.

The vector engine is implemented as a self‑developed C++ plugin accessed via JNI. The engine builds on open‑source vector libraries (e.g., nmslib) and supports HNSW, IVF, and other ANN algorithms. HNSW offers high recall but incurs high memory and construction cost; BES mitigates this by asynchronous background index building, optimized segment merging, and bitmap‑based filtering to combine scalar filters with vector search.
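The bitmap-based filtering mentioned above can be illustrated without the C++ engine: a scalar predicate is first evaluated into a bitmap over document IDs, and the vector scan then scores only documents whose bit is set. This sketch uses a brute-force L2 scan for clarity; BES applies the same bitmap idea inside its ANN traversal.

```python
# Combining a scalar filter with vector search via a bitmap: only documents
# passing the filter (bit set) are considered by the distance computation.
import numpy as np

def filtered_knn(query, vectors, bitmap, k=2):
    """k-NN (L2 distance) restricted to the documents allowed by the bitmap."""
    allowed = np.flatnonzero(bitmap)              # doc IDs passing the filter
    dists = np.linalg.norm(vectors[allowed] - query, axis=1)
    return allowed[np.argsort(dists)[:k]]         # original doc IDs, nearest first

vectors = np.array([[0.0, 0.0], [1.0, 1.0], [0.1, 0.0], [2.0, 2.0]])
bitmap = np.array([True, True, False, True])      # e.g. docs matching a tag
print(filtered_knn(np.array([0.0, 0.0]), vectors, bitmap))
```

Note that doc 2 is the true nearest neighbour but is excluded by the filter; the bitmap guarantees results always satisfy the scalar predicate.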

Key optimizations include:

Using mmap‑based columnar storage for level‑0 vectors to reduce memory footprint.

Replacing the default multi‑round segment merge with a single‑round merge for vector indexes.

Supporting pre‑filter and post‑filter strategies, and falling back to brute‑force search when filter selectivity exceeds 90%.

Accelerating brute‑force search with SIMD instructions.
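The last two points fit together: with HNSW, a highly selective filter leaves too few reachable neighbours in the graph, so past a selectivity threshold (90% in the talk) a brute-force scan over the surviving documents is both cheaper and more accurate. A hedged sketch of that decision, with the threshold and strategy names invented for illustration:

```python
# Sketch of the filter-strategy decision: beyond a selectivity threshold,
# fall back from graph search to a brute-force scan of the filtered subset.
import numpy as np

def choose_strategy(filtered_ratio, threshold=0.9):
    """Pick a plan from the fraction of documents the filter removes."""
    return "brute_force" if filtered_ratio > threshold else "hnsw_post_filter"

def brute_force(query, vectors, k):
    # NumPy's vectorised arithmetic compiles down to SIMD instructions,
    # mirroring the SIMD-accelerated distance computation described above.
    dists = np.linalg.norm(vectors - query, axis=1)
    return np.argsort(dists)[:k]

print(choose_strategy(0.95))   # → brute_force
print(choose_strategy(0.30))   # → hnsw_post_filter
```

The intuition: if only 5% of documents survive the filter, scanning them exactly costs little, whereas an HNSW traversal would discard most candidates it visits and risk returning fewer than k valid results.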

The roadmap focuses on improving usability (e.g., SQL‑based k‑NN), expanding supported index and distance algorithms (e.g., DiskANN, Puck and Tinker), reducing overhead of the JVM‑C++ bridge, and offering more elastic resource plans.

Case studies demonstrate BES in multimodal video retrieval, where video frames are embedded and stored for tasks like tagging, short‑video generation, and recommendation, as well as in Baidu’s Qianfan large‑model platform, which uses BES as a knowledge‑base backend to provide private, secure LLM‑driven Q&A.

Overall, the presentation showcases how a cloud‑native Elasticsearch service can be extended to meet the demanding performance, scalability, and privacy requirements of modern AI‑driven vector search workloads.

Tags: AI, Elasticsearch, vector database, cloud, large scale, search, BES
Written by Baidu Geek Talk. Follow us to discover more Baidu tech insights.