Kuaishou and Tsinghua University Win NeurIPS'21 Billion-Scale ANN Challenge with FAISS‑Optimized KST_ANN Solution
On December 6, Kuaishou and Tsinghua University’s joint team secured first place in the NeurIPS'21 Billion‑Scale Approximate Nearest Neighbor Search Challenge by leveraging a FAISS‑optimized, memory‑efficient KST_ANN algorithm that achieved over 6% higher recall on multiple billion‑scale datasets, showcasing the practical impact of large‑scale vector retrieval in AI‑driven services.
The highly anticipated NeurIPS'21 vector retrieval competition concluded on December 6, with the joint Kuaishou–Tsinghua University team winning the standard-memory track on the strength of a FAISS-optimized algorithm.
In the era of big data, neural networks encode massive unstructured data such as speech, images, and video into vectors; vector retrieval enables efficient similarity search across large vector collections, forming the foundation for multimodal content understanding. Kuaishou, with 320 million daily active users and AI compute nearing 10 EFLOPS, applies vector retrieval across numerous scenarios including video duplicate detection, content recommendation, face detection, and product search.
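At its core, the vector retrieval task described above is nearest-neighbor search: given a query embedding, find the most similar vectors in a large collection. The article does not show the team's code, so the following is only a minimal, illustrative NumPy sketch of exact (brute-force) search on a toy database; production systems like the FAISS-based one mentioned here replace this with approximate indexes to scale to billions of vectors.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "database" of 10,000 embedding vectors (e.g. video or image features).
dim = 64
database = rng.standard_normal((10_000, dim)).astype(np.float32)

def exact_search(query: np.ndarray, vectors: np.ndarray, k: int = 5) -> np.ndarray:
    """Return indices of the k nearest vectors by L2 distance (brute force)."""
    dists = np.linalg.norm(vectors - query, axis=1)
    return np.argsort(dists)[:k]

# A query that is a slightly perturbed copy of database vector 42:
query = database[42] + 0.01 * rng.standard_normal(dim).astype(np.float32)
top_k = exact_search(query, database, k=5)
print(top_k[0])  # index 42 ranks first, since the query is a near-duplicate
```

Brute force scans every vector per query, which is linear in collection size; that cost is exactly why billion-scale search requires the approximate methods this competition evaluates.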
Historically, academic evaluations of vector retrieval algorithms have focused on relatively small datasets of around one million points, but real‑world deployment now confronts challenges at the hundred‑million to billion‑scale.
The NeurIPS'21 Billion‑Scale Approximate Nearest Neighbor Search Challenge, held as part of the NeurIPS 2021 competition track, invited novel research on large‑scale ANN recall. Leading experts from Microsoft Research, Facebook AI, Carnegie Mellon University, Yandex, and others contributed four new billion‑scale datasets, raising the competition’s difficulty.
Kuaishou‑Tsinghua’s self‑developed KST_ANN, a standard‑memory approximate search solution, combined hardware‑software co‑optimization, data quantization, and error correction techniques. It delivered balanced, efficient performance on four datasets (bigann‑1B, deep‑1B, msspacev‑1B, msturing‑1B), improving average benchmark recall by over 6% and achieving up to an 8% gain on bigann‑1B. In the disk‑based track, their submissions also surpassed baseline recall across multiple datasets.
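The article names data quantization as one ingredient of KST_ANN but gives no internals, so the sketch below shows only the general idea in its simplest form: per-dimension scalar quantization of float32 vectors to 8-bit codes, which cuts memory 4x at the cost of a small, correctable reconstruction error. This is a generic illustration, not the team's actual scheme (FAISS-style systems typically use more sophisticated product quantization).

```python
import numpy as np

rng = np.random.default_rng(1)
vectors = rng.standard_normal((1_000, 64)).astype(np.float32)

# Per-dimension min/max scalar quantization to 8 bits: 4x memory reduction
# versus float32, traded against a bounded reconstruction error.
lo = vectors.min(axis=0)
hi = vectors.max(axis=0)

def quantize(x: np.ndarray) -> np.ndarray:
    """Map each float32 value to a uint8 code in [0, 255]."""
    return np.round((x - lo) / (hi - lo) * 255).astype(np.uint8)

def dequantize(codes: np.ndarray) -> np.ndarray:
    """Approximately reconstruct the original floats from the codes."""
    return codes.astype(np.float32) / 255 * (hi - lo) + lo

codes = quantize(vectors)
recon = dequantize(codes)

print(codes.nbytes / vectors.nbytes)         # 0.25, i.e. 4x smaller
print(float(np.abs(vectors - recon).max()))  # worst-case error: half a step
```

The residual error introduced here is what error-correction techniques, such as re-ranking candidates with full-precision vectors, are designed to recover, trading a little extra compute for recall.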
As big‑data technologies evolve, large‑scale vector retrieval will increasingly appear in production‑grade, high‑availability data centers, supporting enterprise‑critical and web‑scale search applications where cost, preprocessing time, and power consumption become as crucial as recall and latency. The techniques demonstrated by Kuaishou and Tsinghua provide new directions for boosting recall on massive datasets and lay essential groundwork for future AI‑driven, multimodal interactive experiences.
Kuaishou Tech
Official Kuaishou tech account, providing real-time updates on the latest Kuaishou technology practices.