Distributed High‑Performance Vector Retrieval with gpdb‑faiss‑vector Plugin on Dolphin Engine
The gpdb‑faiss‑vector plugin embeds Facebook's Faiss library into the Dolphin (Greenplum‑compatible) engine. It exposes SQL functions for distributed, high‑performance approximate nearest‑neighbor retrieval, with index caching, parallel search, and configurable index types, cutting interactive query latency to the low‑millisecond range and enabling scalable recommendation and advertising workloads.
With the rise of deep learning, almost any kind of data (text, images, users, items) can be represented as a vector, making vector recall a fundamental capability for search, recommendation, and advertising scenarios. The Alimama Intelligent Analysis Engine team built the gpdb‑faiss‑vector plugin for the Dolphin engine (compatible with Greenplum) to provide distributed, high‑performance vector recall.
The plugin encapsulates Facebook's Faiss library as a Greenplum extension, deploying it on each Dolphin node. It now serves multiple Alimama business cases, including audience recommendation, keyword recommendation, campaign effectiveness estimation, and ad‑tool recommendation.
Vector recall solves K‑Nearest Neighbor (KNN) or Radius Nearest Neighbor (RNN) problems. Because exact search over large corpora is too slow, in practice the problem is relaxed to ANN (Approximate Nearest Neighbor), trading a small amount of accuracy for much lower latency. ANN solutions fall into three categories: space partitioning (e.g., KD‑Tree), space encoding (e.g., LSH), and neighbor graphs (e.g., HNSW).
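To make the problem concrete, here is a minimal exact‑KNN baseline in NumPy (an illustrative sketch, not part of the plugin); ANN methods approximate exactly this computation at a fraction of the cost:

```python
import numpy as np

def knn_search(base: np.ndarray, query: np.ndarray, k: int):
    """Exact KNN by brute force: the ground truth that ANN approximates."""
    # Squared L2 distance from the query to every base vector.
    dists = np.sum((base - query) ** 2, axis=1)
    # Indices of the k smallest distances, in ascending order.
    idx = np.argsort(dists)[:k]
    return idx, dists[idx]

base = np.array([[0.0, 0.0], [1.0, 0.0], [5.0, 5.0], [0.9, 0.1]])
query = np.array([1.0, 0.0])
idx, d = knn_search(base, query, k=2)  # nearest two base vectors to the query
```

Exact search scans every vector, so cost grows linearly with corpus size; the three ANN families above each avoid that full scan in a different way.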
Industry solutions include open‑source libraries such as Faiss and SPTAG, distributed engines like Milvus, Vearch, and SimSvr, and database plugins such as pgvector and PASE. Libraries offer rich algorithms but lack storage and management features; engines provide full‑stack services but require separate deployment and non‑SQL APIs; database plugins give low‑cost SQL access but often miss distributed capabilities.
The gpdb‑faiss‑vector plugin brings Faiss into the database layer, exposing a complete set of SQL functions for index creation, training, adding vectors, parameter tuning, searching, and optional top‑K merging. This enables fully SQL‑driven workflows, distributed parallel retrieval, and offline batch processing.
Architecture: the Master node receives a vector‑recall SQL statement and dispatches it to Segment nodes. Each Segment checks a local Faiss cache; if absent, it loads the index bytes from storage, deserializes them, and caches the in‑memory object. The Segment then performs the search, returns local Top‑K results, and the Master merges them into a global Top‑K list.
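The master/segment flow above can be sketched in Python. `Segment`, `global_topk`, and the pickle‑based (de)serialization are illustrative stand‑ins only, not the plugin's actual code (Faiss serializes indexes in its own binary format):

```python
import heapq
import pickle  # stand-in here for Faiss index (de)serialization

def l2(a, b):
    """Squared L2 distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

class Segment:
    """One segment: index bytes live in storage; the deserialized
    object is cached in memory after the first search."""
    def __init__(self, stored_bytes):
        self.stored_bytes = stored_bytes
        self.cached = None  # in-memory object, None until first use

    def search(self, query, k):
        if self.cached is None:  # cache miss: deserialize once, then reuse
            self.cached = pickle.loads(self.stored_bytes)
        hits = [(l2(vec, query), vid) for vid, vec in self.cached.items()]
        return heapq.nsmallest(k, hits)  # local Top-K, sorted ascending

def global_topk(segments, query, k):
    """Master side: gather each segment's local Top-K, merge globally."""
    local = [seg.search(query, k) for seg in segments]
    return heapq.nsmallest(k, heapq.merge(*local))

# Two segments, each holding a serialized shard of the vector set.
seg_a = Segment(pickle.dumps({0: (0.0, 0.0), 1: (3.0, 3.0)}))
seg_b = Segment(pickle.dumps({10: (1.0, 0.0), 11: (9.0, 9.0)}))
result = global_topk([seg_a, seg_b], query=(1.0, 0.0), k=2)
```

The key property the cache buys is that deserialization happens once per segment process, not once per query, which is where the latency reduction described below comes from.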
Key advantages are the SQL‑based interface, high performance through distributed parallelism, rich index support, and flexible configuration via Faiss’s index_factory syntax. Runtime parameters can be tuned without rebuilding indexes, and a cache of Faiss objects reduces deserialization latency, bringing interactive query latency from >100 ms down to ~2 ms.
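For illustration, a few strings that Faiss's index_factory accepts, which the plugin presumably passes through when an index is created (the annotations are from Faiss's own conventions, not this plugin's documentation):

```text
Flat              exact brute-force search, the accuracy baseline
IVF1024,Flat      inverted file with 1024 clusters, exact distances within a cluster
IVF1024,PQ16      inverted file plus 16-byte product-quantization codes
HNSW32            HNSW neighbor graph with 32 links per node
```

Because the factory string fixes only the index structure, runtime knobs such as the number of clusters probed can still be tuned per query without rebuilding, as noted above.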
Additional features include atomic updates via database transactions, multi‑index management, combined tag‑and‑vector queries, batch‑query acceleration using multithreading and SIMD, and array‑based output to minimize row count. A dedicated topk_merge function handles global Top‑K aggregation in distributed queries.
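A sketch of what a topk_merge‑style aggregation does, assuming each segment emits its local result as parallel arrays of ids and scores (the function below is a hypothetical illustration, not the plugin's implementation; lower score means closer):

```python
import heapq

def topk_merge(partials, k):
    """Merge per-segment (ids, scores) array pairs into a global Top-K.
    Illustrative stand-in for a topk_merge-style aggregate."""
    merged = []
    for ids, scores in partials:
        merged.extend(zip(scores, ids))      # (score, id) pairs across segments
    best = heapq.nsmallest(k, merged)        # globally best k, sorted by score
    top_ids = [i for _, i in best]
    top_scores = [s for s, _ in best]
    return top_ids, top_scores               # array-shaped, like the input

# Two segments' local Top-3 results as (ids, distances) arrays:
seg_a = ([7, 2, 9], [0.10, 0.40, 0.55])
seg_b = ([4, 8, 1], [0.05, 0.30, 0.70])
ids, scores = topk_merge([seg_a, seg_b], k=3)
```

Keeping results in array form at every stage matches the row‑count‑minimizing output format described above: one row per segment in, one row out.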
Future work plans to separate the vector‑recall service from Dolphin, keeping a single shared Faiss service per cluster node to lower memory pressure and eliminate warm‑up delays, while continuing to optimize performance and expand use cases.
Alimama Tech
Official Alimama tech channel, showcasing all of Alimama's technical innovations.