How BES Powers Large-Scale Vector Search for AI Applications
This article explains the principles of vector databases, outlines the engineering practices of Baidu Intelligent Cloud BES for large‑scale vector retrieval, discusses optimization techniques such as HNSW, IVF and filter integration, and presents real‑world AI use cases and future development directions.
1. Introduction to Vector Databases
Vector databases store and query high‑dimensional vector representations of images, audio, text and other data extracted via embedding techniques. Similarity between vectors reflects feature similarity of the original data, enabling applications such as image‑by‑image search.
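The core operation — exact nearest-neighbor search over embedding vectors — can be sketched in a few lines. The `embed_placeholder` function below is a hypothetical stand-in for a real embedding model, used only so the example is self-contained:

```python
import math

def embed_placeholder(text):
    # Hypothetical stand-in for a real embedding model: folds character
    # codes into a fixed-size vector so the example runs without any
    # external model or service.
    vec = [0.0] * 8
    for i, ch in enumerate(text):
        vec[i % 8] += ord(ch)
    return vec

def cosine_similarity(a, b):
    # Similarity between two vectors reflects similarity of the
    # original data they were embedded from.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def knn(query, corpus, k=2):
    # Exact (brute-force) k-NN: score every stored vector against the
    # query and keep the k most similar. Approximate indexes (HNSW, IVF)
    # exist to avoid this full scan at scale.
    scored = sorted(corpus,
                    key=lambda item: cosine_similarity(query, item[1]),
                    reverse=True)
    return [doc_id for doc_id, _ in scored[:k]]

corpus = [(name, embed_placeholder(name))
          for name in ["red apple", "green apple", "blue car"]]
query = embed_placeholder("red apple photo")
top = knn(query, corpus, k=2)
```

Brute-force search is exact but scans every vector; the index structures discussed below trade a little recall for much lower query cost.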
2. BES Engineering Practice
Elasticsearch, built on Apache Lucene, is a distributed search and analytics engine. Baidu Intelligent Cloud Elasticsearch (BES) extends the open‑source project with cloud‑native features, NLP plugins, snapshot support, hot‑cold storage, and vector search capabilities optimized for large‑model scenarios.
The BES architecture consists of a control plane for cluster management and a BES cluster instance that runs on cloud VMs with load‑balancing via BLB. Data can be tiered to Baidu Object Storage (BOS) to reduce storage costs.
Indexing follows a Shared‑Nothing + MPP model, reusing Elasticsearch’s data flow. Vector data is managed like scalar data, allowing bulk ingestion and parallel query execution across shards, improving QPS by adding replicas and scaling nodes.
Vector indexing is implemented via a custom C++ plugin accessed through JNI, enabling low‑level SIMD optimizations and flexible execution plans. BES evaluated both nmslib and Faiss, and chose HNSW for its high recall despite its higher memory usage.
To mitigate HNSW’s slow graph construction, BES builds indexes asynchronously, decoupling write latency from index building, and optimizes segment merging by performing a single‑pass merge for vector shards.
IVF (inverted file) indexing is also supported: vectors are clustered with k‑means, cluster centroids become “keywords,” and a two‑stage search first retrieves relevant centroids then performs fine‑grained vector search.
Filter integration is achieved by generating a bitmap of IDs that satisfy scalar filters, passing it to the vector engine, and pruning the HNSW graph during traversal. When filter selectivity exceeds 90%, BES falls back to brute‑force search accelerated by SIMD.
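The selectivity-based fallback can be sketched as below. The 90% threshold comes from the text; the function names are hypothetical, and the "graph" path is stubbed with a pruned scan since a full ANN index is out of scope here:

```python
import math

def l2(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def filtered_search(query, vectors, allowed_bitmap, k=1):
    # allowed_bitmap: the set of doc ids that passed the scalar filter.
    selectivity = 1 - len(allowed_bitmap) / len(vectors)
    if selectivity > 0.9:
        # The filter removes >90% of docs: graph traversal would mostly
        # visit pruned nodes, so exactly scan the few survivors instead
        # (BES accelerates this path with SIMD in native code).
        path = "brute-force"
        candidates = [(i, vectors[i]) for i in allowed_bitmap]
    else:
        # Permissive filter: traverse the ANN index, skipping any
        # candidate whose id is not set in the bitmap (stubbed here as
        # a scan with the same pruning check).
        path = "graph+prune"
        candidates = [(i, v) for i, v in vectors.items() if i in allowed_bitmap]
    candidates.sort(key=lambda item: l2(item[1], query))
    return path, [i for i, _ in candidates[:k]]

vectors = {i: (float(i), 0.0) for i in range(100)}
path, hits = filtered_search((6.0, 0.0), vectors, {3, 7})
```

The design point: pruning a graph only pays off while most candidates survive the filter; once almost everything is filtered out, an exact scan over the survivors is both cheaper and guarantees full recall.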
3. Case Studies
3.1 Multimodal Video Retrieval – Video frames are embedded into vectors, stored in BES, and queried to support tagging, short‑to‑long video matching, and personalized recommendation.
3.2 Qianfan Large‑Model Platform – BES powers a knowledge‑base service for large language models, offering secure, private knowledge retrieval and supporting both standalone deployment and plugin integration on the Qianfan platform.
4. Future Directions
Planned improvements include SQL‑based k‑NN queries for easier use, support for additional index and distance algorithms (e.g., DiskANN, Puck & Tinker), stronger heterogeneous computing support, reduced JVM–C++ interop overhead, and more elastic resource provisioning to lower cost and operational barriers.
Baidu Intelligent Cloud Tech Hub