
Video Deduplication on Xianyu Using High‑Dimensional Vector Retrieval

The Xianyu platform combats video plagiarism by extracting key frames, converting them into 1024‑dimensional vectors, and using product quantization‑based high‑dimensional vector retrieval to achieve over 95% recall with ~100 ms latency and more than 1000 QPS, enabling scalable video, image, and product deduplication.

Xianyu Technology

Background: The Xianyu platform faces video plagiarism. The solution is to convert each video into vectors and use vector similarity to detect duplicates.

Challenges: the index must cover billions of video frames, each represented by a 1024‑dimensional vector, while meeting targets of >95% recall, ~100 ms latency, and >1000 QPS.
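To make the scale concrete, here is a rough back-of-envelope calculation. Only the 1024 dimensions come from the article. The frame count of one billion, float32 storage, and the PQ setting of 64 one-byte codes per vector are assumptions chosen for illustration.

```python
# Back-of-envelope index size. Every concrete number except DIM is an assumption.

def raw_index_bytes(num_vectors: int, dim: int, bytes_per_value: int = 4) -> int:
    """Size of storing every vector uncompressed (float32 by default)."""
    return num_vectors * dim * bytes_per_value


def pq_index_bytes(num_vectors: int, num_subvectors: int, bits_per_code: int = 8) -> int:
    """Size of storing one PQ code per sub-vector, excluding the codebooks."""
    return num_vectors * num_subvectors * bits_per_code // 8


NUM_FRAMES = 1_000_000_000  # assumption: "billions" rounded down to one billion
DIM = 1024                  # from the article
NUM_SUBVECTORS = 64         # assumption: hypothetical PQ configuration

raw = raw_index_bytes(NUM_FRAMES, DIM)
compressed = pq_index_bytes(NUM_FRAMES, NUM_SUBVECTORS)

print(f"raw float32: {raw / 1e12:.2f} TB")
print(f"PQ codes:    {compressed / 1e9:.0f} GB")
print(f"compression: {raw // compressed}x")
```

Under these assumptions the raw vectors need about 4.1 TB, while the PQ codes need 64 GB. That gap is one reason a compressed representation is attractive at this scale.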

Implementation includes:

Video vectorization: extract key frames, compute local and global features via custom operators on TensorFlow Lite.
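The article does not say how key frames are chosen. As a minimal sketch, one common heuristic keeps a frame only when it differs enough from the last kept frame. The function names, the flat-list frame representation, and the threshold below are all hypothetical and are not taken from Xianyu's implementation.

```python
# Hypothetical key-frame selection by frame difference (not the article's method).

def mean_abs_diff(frame_a, frame_b):
    """Average absolute pixel difference between two equally sized frames."""
    return sum(abs(a - b) for a, b in zip(frame_a, frame_b)) / len(frame_a)


def select_key_frames(frames, threshold=30.0):
    """Return indices of frames that differ enough from the last key frame."""
    if not frames:
        return []
    key_indices = [0]  # always keep the first frame
    for i in range(1, len(frames)):
        last_key = frames[key_indices[-1]]
        if mean_abs_diff(frames[i], last_key) > threshold:
            key_indices.append(i)
    return key_indices


# Toy "video": each frame is a flat list of grayscale pixel values.
frames = [[10] * 4, [12] * 4, [200] * 4, [205] * 4, [20] * 4]
key_frames = select_key_frames(frames)
print(key_frames)
```

Each selected frame would then go through the feature extractor to produce one 1024‑dimensional vector.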

Similarity metrics: Hamming distance, cosine similarity, Euclidean distance, inner product.
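The four metrics listed above can be written in a few lines each. This is a plain-Python sketch for clarity, with vectors as lists of numbers and binary codes as integers. A production system would use vectorized library routines instead.

```python
import math


def hamming_distance(a: int, b: int) -> int:
    """Number of differing bits between two binary codes stored as ints."""
    return bin(a ^ b).count("1")


def inner_product(a, b):
    """Sum of element-wise products."""
    return sum(x * y for x, y in zip(a, b))


def euclidean_distance(a, b):
    """Straight-line (L2) distance between two vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (assumes neither is all zeros)."""
    norm = math.sqrt(inner_product(a, a)) * math.sqrt(inner_product(b, b))
    return inner_product(a, b) / norm


print(hamming_distance(0b1011, 0b0010))
print(inner_product([1, 2, 3], [4, 5, 6]))
print(euclidean_distance([0, 0], [3, 4]))
print(cosine_similarity([1, 0], [0, 1]))
```

For vectors normalized to unit length, these metrics agree on ranking: the squared Euclidean distance equals 2 minus twice the cosine similarity, and cosine similarity equals the inner product.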

Vector retrieval methods: tree‑based (KD‑tree), hashing (LSH), and vector quantization (product quantization, or PQ, and hierarchical clustering). PQ was selected for its performance at large scale.
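To illustrate the idea behind PQ, the sketch below splits each vector into sub-vectors, learns a small k-means codebook per sub-space, stores each vector as a list of centroid indices, and answers queries with a per-query distance lookup table. It is a toy pure-Python version with hypothetical function names and tiny parameters (8 dimensions, 4 sub-vectors, 16 centroids). The article's system uses FAISS, not this code.

```python
import random


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iters=10, seed=0):
    """Plain Lloyd's k-means; returns k centroids."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        buckets = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: sq_dist(p, centroids[c]))
            buckets[nearest].append(p)
        for c, members in enumerate(buckets):
            if members:  # keep the old centroid if a cluster went empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids


def split(vec, m):
    """Cut a vector into m equal sub-vectors (len(vec) must be divisible by m)."""
    step = len(vec) // m
    return [vec[i * step:(i + 1) * step] for i in range(m)]


def train_pq(vectors, m, k):
    """Learn one codebook of k centroids for each of the m sub-spaces."""
    columns = zip(*(split(v, m) for v in vectors))
    return [kmeans(list(sub), k, seed=i) for i, sub in enumerate(columns)]


def encode(vec, codebooks):
    """Replace each sub-vector with the index of its nearest centroid."""
    return [min(range(len(cb)), key=lambda c: sq_dist(sub, cb[c]))
            for sub, cb in zip(split(vec, len(codebooks)), codebooks)]


def search(query, codes, codebooks, top_k=1):
    """Asymmetric distance: exact query sub-vectors against quantized database."""
    table = [[sq_dist(sub, cen) for cen in cb]
             for sub, cb in zip(split(query, len(codebooks)), codebooks)]
    scored = [(sum(table[i][c] for i, c in enumerate(code)), idx)
              for idx, code in enumerate(codes)]
    return sorted(scored)[:top_k]


rng = random.Random(42)
data = [[rng.random() for _ in range(8)] for _ in range(200)]
codebooks = train_pq(data, m=4, k=16)
codes = [encode(v, codebooks) for v in data]
top = search(data[7], codes, codebooks, top_k=5)
print(top)
```

The distances returned are approximations, since each database vector is represented only by its centroids. The payoff is that each stored vector shrinks to `m` small integers and each query costs `m` table lookups per candidate.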

System architecture: client performs on‑device feature extraction; backend provides a unified vector access layer, log synchronization, offline data center, and a vector search engine (Alibaba BE integrated with FAISS).

Results: after deployment, the system handles >1000 QPS, keeps latency at ~100 ms per frame, and achieves >95% recall.
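The article does not spell out how recall was measured. A common definition for approximate search, assumed here, is the fraction of true nearest neighbors (found by exact search) that the approximate index also returns. The function name and the toy ID lists are hypothetical.

```python
def recall_at_k(approx_results, exact_results):
    """Share of exact nearest-neighbor IDs that the approximate search also found.

    Both arguments are lists with one list of result IDs per query.
    """
    hits = 0
    total = 0
    for approx, exact in zip(approx_results, exact_results):
        hits += len(set(approx) & set(exact))
        total += len(exact)
    return hits / total if total else 0.0


# Toy example: two queries, three ground-truth neighbors each.
exact = [[1, 2, 3], [4, 5, 6]]
approx = [[1, 2, 9], [4, 5, 6]]
recall = recall_at_k(approx, exact)
print(f"recall: {recall:.3f}")
```

In this toy case the approximate search misses one of six true neighbors, so recall is 5/6.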

Conclusion: the approach demonstrates effective large‑scale video deduplication and can be extended to image and product deduplication.

Tags: FAISS, vector retrieval, product recommendation, high-dimensional vectors, PQ, video deduplication