Kuaishou and Tsinghua University Win NeurIPS'21 Billion-Scale ANN Challenge with FAISS‑Optimized KST_ANN Solution
On December 6, Kuaishou and Tsinghua University’s joint team secured first place in the NeurIPS'21 Billion‑Scale Approximate Nearest Neighbor Search Challenge by leveraging a FAISS‑optimized, memory‑efficient KST_ANN algorithm that achieved over 6% higher recall on multiple billion‑scale datasets, showcasing the practical impact of large‑scale vector retrieval in AI‑driven services.
The highly anticipated NeurIPS'21 vector retrieval competition concluded on December 6, with the joint Kuaishou–Tsinghua University team winning the standard-memory track on the strength of a FAISS-optimized algorithm.
In the era of big data, neural networks encode massive unstructured data such as speech, images, and video into vectors; vector retrieval enables efficient similarity search across large vector collections, forming the foundation for multimodal content understanding. Kuaishou, with 320 million daily active users and AI compute nearing 10 EFLOPS, applies vector retrieval across numerous scenarios including video duplicate detection, content recommendation, face detection, and product search.
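At its core, the vector retrieval task described above is nearest-neighbor search: given a query embedding, find the most similar vectors in a large collection. The article does not show the team's code, so the following is only a minimal, illustrative NumPy sketch of exact (brute-force) search on a toy database; production systems like the FAISS-based one mentioned here replace this with approximate indexes to scale to billions of vectors.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "database" of 10,000 embedding vectors (e.g. video or image features).
dim = 64
database = rng.standard_normal((10_000, dim)).astype(np.float32)

def exact_search(query: np.ndarray, vectors: np.ndarray, k: int = 5) -> np.ndarray:
    """Return indices of the k nearest vectors by L2 distance (brute force)."""
    dists = np.linalg.norm(vectors - query, axis=1)
    return np.argsort(dists)[:k]

# A query that is a slightly perturbed copy of database vector 42:
query = database[42] + 0.01 * rng.standard_normal(dim).astype(np.float32)
top_k = exact_search(query, database, k=5)
print(top_k[0])  # index 42 ranks first, since the query is a near-duplicate
```

Brute force scans every vector per query, which is linear in collection size; that cost is exactly why billion-scale search requires the approximate methods this competition evaluates.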
Historically, academic evaluations of vector retrieval algorithms have focused on relatively small datasets of around one million points, but real‑world deployment now confronts challenges at the hundred‑million to billion‑scale.
The NeurIPS'21 Billion‑Scale Approximate Nearest Neighbor Search Challenge, held as part of the NeurIPS 2021 competition track, invited novel research on large‑scale ANN recall. Leading experts from Microsoft Research, Facebook AI, Carnegie Mellon University, Yandex, and others contributed four new billion‑scale datasets, raising the competition’s difficulty.
Kuaishou‑Tsinghua’s self‑developed KST_ANN, a standard‑memory approximate search solution, combined hardware‑software co‑optimization, data quantization, and error correction techniques. It delivered balanced, efficient performance on four datasets (bigann‑1B, deep‑1B, msspacev‑1B, msturing‑1B), improving average benchmark recall by over 6% and achieving up to an 8% gain on bigann‑1B. In the disk‑based track, their submissions also surpassed baseline recall across multiple datasets.
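The article names data quantization as one ingredient of KST_ANN but gives no internals, so the sketch below shows only the general idea in its simplest form: per-dimension scalar quantization of float32 vectors to 8-bit codes, which cuts memory 4x at the cost of a small, correctable reconstruction error. This is a generic illustration, not the team's actual scheme (FAISS-style systems typically use more sophisticated product quantization).

```python
import numpy as np

rng = np.random.default_rng(1)
vectors = rng.standard_normal((1_000, 64)).astype(np.float32)

# Per-dimension min/max scalar quantization to 8 bits: 4x memory reduction
# versus float32, traded against a bounded reconstruction error.
lo = vectors.min(axis=0)
hi = vectors.max(axis=0)

def quantize(x: np.ndarray) -> np.ndarray:
    """Map each float32 value to a uint8 code in [0, 255]."""
    return np.round((x - lo) / (hi - lo) * 255).astype(np.uint8)

def dequantize(codes: np.ndarray) -> np.ndarray:
    """Approximately reconstruct the original floats from the codes."""
    return codes.astype(np.float32) / 255 * (hi - lo) + lo

codes = quantize(vectors)
recon = dequantize(codes)

print(codes.nbytes / vectors.nbytes)         # 0.25, i.e. 4x smaller
print(float(np.abs(vectors - recon).max()))  # worst-case error: half a step
```

The residual error introduced here is what error-correction techniques, such as re-ranking candidates with full-precision vectors, are designed to recover, trading a little extra compute for recall.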
As big‑data technologies evolve, large‑scale vector retrieval will increasingly appear in production‑grade, high‑availability data centers, supporting enterprise‑critical and web‑scale search applications where cost, preprocessing time, and power consumption become as crucial as recall and latency. The techniques demonstrated by Kuaishou and Tsinghua provide new directions for boosting recall on massive datasets and lay essential groundwork for future AI‑driven, multimodal interactive experiences.
Kuaishou Tech
Official Kuaishou tech account, providing real-time updates on the latest Kuaishou technology practices.