Puck: Baidu’s Open‑Source High‑Performance ANN Retrieval Engine
Puck is Baidu’s open‑source Approximate Nearest Neighbor (ANN) retrieval engine, built on the Baidu‑developed Puck and Tinker algorithms. It targets high recall, accuracy, and throughput on datasets from small to trillion‑scale, and it outperforms competing solutions in benchmarks, including a first‑place finish at BIGANN 2021. It offers a simple, extensible API, has proven reliable in dozens of Baidu services, and is released under the Apache 2.0 license to encourage community contributions.
Puck is Baidu’s self‑developed open‑source Approximate Nearest Neighbor (ANN) retrieval engine, named after the agile DOTA hero. It is designed to achieve high recall, high accuracy, and high throughput on small, medium, and large datasets alike.
ANN search aims to find the top‑K vectors closest to a query in a massive vector space while balancing retrieval quality against computational cost. Since the breakthrough of AlexNet in 2012 and the introduction of Transformers in 2017, deep models have increasingly represented content as embedding vectors, and ANN has become a foundational technology for search, recommendation, and many other AI‑driven applications.
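To make the quality/cost trade‑off concrete, here is a minimal sketch in plain Python. It is not Puck’s algorithm or API: `exact_top_k` is a brute‑force baseline, `sampled_top_k` is a deliberately naive approximate search that scans only a random subset, and `recall_at_k` measures how much of the exact answer the cheaper search recovered. All function names and the random test data are assumptions made for illustration.

```python
import heapq
import math
import random


def l2_distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def exact_top_k(query, vectors, k):
    """Brute force: ids of the k vectors nearest to the query, nearest first."""
    return heapq.nsmallest(
        k, range(len(vectors)), key=lambda i: l2_distance(query, vectors[i])
    )


def sampled_top_k(query, vectors, k, sample_size, rng):
    """Toy approximate search (illustration only): scan a random subset."""
    candidates = rng.sample(range(len(vectors)), sample_size)
    return heapq.nsmallest(
        k, candidates, key=lambda i: l2_distance(query, vectors[i])
    )


def recall_at_k(approx_ids, exact_ids):
    """Fraction of the exact top-K that the approximate result contains."""
    exact = set(exact_ids)
    return len(exact.intersection(approx_ids)) / len(exact)


if __name__ == "__main__":
    rng = random.Random(0)
    vectors = [[rng.random() for _ in range(16)] for _ in range(2000)]
    query = [rng.random() for _ in range(16)]

    truth = exact_top_k(query, vectors, 10)
    approx = sampled_top_k(query, vectors, 10, sample_size=500, rng=rng)
    # Scanning a quarter of the data costs roughly a quarter of the distance
    # computations, at the price of missing some true neighbors.
    print("recall@10:", recall_at_k(approx, truth))
```

A real ANN engine replaces the random subset with an index that steers the scan toward promising regions, which is how it keeps recall high while still visiting only a small fraction of the vectors.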
The Puck project comprises two Baidu‑invented algorithms, Puck and Tinker. Shared as an internal open‑source project within Baidu in 2019, it now powers dozens of Baidu product lines, handling trillion‑scale indexes and massive query volumes.
Benchmark tests on datasets ranging from ten million to one billion vectors show clear performance advantages over competing solutions. In the 2021 BIGANN competition, Puck took first place in all four tracks it entered.
Key advantages include:
Ease of use – a simple API with minimal required parameters, most of which have sensible defaults.
Extensibility – a fully self‑designed index structure that supports a variety of functional extensions and modular redesign.
High performance – consistently superior QPS and recall on benchmark datasets.
Reliability – proven stability in large‑scale production across more than thirty Baidu services.
Additional functional extensions provide real‑time lock‑free insertion, conditional query filtering during index traversal, distributed index construction via map‑reduce, and adaptive parameter tuning that works well out‑of‑the‑box.
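The value of filtering during traversal, rather than after it, can be illustrated with a small sketch. This is not Puck’s implementation: a flat scan stands in for the index traversal, and the `tags` list, the `wanted` value, and both function names are hypothetical. The point is only that a post‑filter can return fewer than K results, while a filter applied as each candidate is visited lets only matching items compete for the top‑K slots.

```python
import heapq


def sq_l2(a, b):
    """Squared Euclidean distance (ranks candidates the same as L2)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def post_filter_search(query, vectors, tags, wanted, k):
    """Pick the k nearest first, then drop non-matching ids.

    May return fewer than k results when near neighbors fail the condition.
    """
    nearest = heapq.nsmallest(
        k, range(len(vectors)), key=lambda i: sq_l2(query, vectors[i])
    )
    return [i for i in nearest if tags[i] == wanted]


def in_traversal_filter_search(query, vectors, tags, wanted, k):
    """Check the condition as each candidate is visited.

    Only matching candidates enter the bounded heap, so the result holds
    the k nearest matching ids (nearest first) whenever k matches exist.
    """
    heap = []  # max-heap via negated distance, never larger than k
    for i, vec in enumerate(vectors):
        if tags[i] != wanted:
            continue  # skipped before any distance is computed
        d = sq_l2(query, vec)
        if len(heap) < k:
            heapq.heappush(heap, (-d, i))
        elif -d > heap[0][0]:
            heapq.heapreplace(heap, (-d, i))
    return [i for _, i in sorted(heap, reverse=True)]


if __name__ == "__main__":
    vectors = [[0.0], [1.0], [2.0], [3.0], [4.0]]
    tags = ["a", "b", "b", "a", "a"]
    print(post_filter_search([0.0], vectors, tags, "a", 2))
    print(in_traversal_filter_search([0.0], vectors, tags, "a", 2))
```

In this toy data the post‑filter keeps only one of the two nearest vectors, whereas the in‑traversal variant still fills both slots with matching items.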
Puck is released under the Apache 2.0 license, encouraging community collaboration and knowledge sharing. More details, benchmark results, and the source code are available at the GitHub repository (https://github.com/baidu/puck) and the BIGANN benchmark page.
The community is invited to join the QQ group for support, contribute to the project, and help shape the future of open‑source ANN retrieval.
Baidu Geek Talk