
How UniDex and UniSearch Redefine Video Search with Semantic Indexing and Generative Models

This article explains how Kuaishou’s UniDex replaces traditional term‑based inverted indexes with model‑driven semantic posting lists and how the end‑to‑end UniSearch framework generates video IDs directly from queries, delivering higher relevance, lower latency, and significant online performance gains.

Kuaishou Tech

Nov 20, 2025


Background

Modern video search must understand ambiguous user intent and retrieve the correct video from billions of items within strict latency constraints. Traditional term‑based inverted indexes struggle with colloquial queries and real‑time updates.

UniDex: Model‑based Semantic Inverted Index

UniTouch – FSQ Quantization

UniTouch replaces the sparse posting list with a dense semantic representation. Query and document encoders produce high‑dimensional vectors; a learnable linear projection maps them to a low‑dimensional space, where Finite Scalar Quantization (FSQ) discretizes them into 2‑dimensional semantic IDs (SIDs). Because rounding is not differentiable, training uses EWGS (Element‑Wise Gradient Scaling), which scales each back‑propagated gradient element according to its quantization error.
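As a rough illustration of the quantization step, the sketch below applies FSQ to an already projected vector. It is not UniTouch's implementation: the number of levels per dimension, the function names `fsq_quantize` and `codes_to_id`, and the mixed‑radix packing are assumptions made for the example, and the learnable projection and gradient handling are omitted.

```python
import math

def fsq_quantize(z, levels):
    """Quantize each coordinate of z to one of levels[i] integer codes.

    Assumption for this sketch: every entry of `levels` is odd, so the
    grid is symmetric around zero.
    """
    codes = []
    for value, n_levels in zip(z, levels):
        if n_levels % 2 == 0:
            raise ValueError("this sketch only handles odd level counts")
        half = (n_levels - 1) // 2
        bounded = half * math.tanh(value)   # squash into (-half, half)
        codes.append(round(bounded) + half) # shift to the range 0..n_levels-1
    return codes

def codes_to_id(codes, levels):
    """Pack per-dimension codes into one integer (mixed-radix encoding)."""
    sid = 0
    for code, n_levels in zip(codes, levels):
        sid = sid * n_levels + code
    return sid

levels = [5, 5, 5]                          # hypothetical: 5 levels per dimension
codes = fsq_quantize([0.0, 10.0, -10.0], levels)
print(codes, codes_to_id(codes, levels))    # [2, 4, 0] 70
```

FSQ needs no learned codebook: the set of possible codes is fixed by the level counts, which is one reason it is attractive for building an index.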

UniRank – Dual‑tower Token‑level Ranking

UniRank adopts a dual‑tower architecture similar to UniTouch but focuses on ranking. Both query and video towers output multiple 128‑dimensional dense vectors. Token‑level interaction works by concatenating learnable <CLS> tokens to each tower's input, computing fine‑grained similarity scores between the resulting query and video vectors, and aggregating those scores into the final relevance score.
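The article does not say how the fine‑grained scores are aggregated. One common choice for multi‑vector rankers is late interaction: for each query vector, take its best‑matching video vector and sum the results. The sketch below assumes that aggregation; `token_level_score` is a hypothetical name, and the 2‑dimensional vectors stand in for the 128‑dimensional ones.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def token_level_score(query_vecs, video_vecs):
    """Sum over query vectors of the best similarity to any video vector.

    Assumed aggregation (max over video vectors, then sum); the source
    only states that fine-grained scores are aggregated.
    """
    return sum(max(dot(q, d) for d in video_vecs) for q in query_vecs)

query_vecs = [[1.0, 0.0], [0.0, 1.0]]
video_vecs = [[1.0, 0.0], [0.5, 0.5]]
print(token_level_score(query_vecs, video_vecs))  # 1.5
```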

Training Techniques

Contrastive Learning : ListWise InfoNCE loss with dynamic temperature and hard negative sampling.

Token‑matching Loss : Encourages matching of individual token embeddings between query and video.

Quantization Regularization : Binary‑Quant regularization term stabilizes FSQ training and mitigates float‑precision loss after TensorRT acceleration.

EWGS Gradient Optimization : Improves stability of quantization‑aware training.
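To make the contrastive objective concrete, here is a minimal InfoNCE sketch for one query with one positive and several negative videos. The temperature is passed in as a plain parameter; how UniDex adjusts it dynamically and how it mines hard negatives is not described in the article, so neither is modeled here.

```python
import math

def info_nce(pos_score, neg_scores, temperature=0.07):
    """InfoNCE loss for one query: -log softmax of the positive's logit."""
    logits = [pos_score / temperature] + [s / temperature for s in neg_scores]
    m = max(logits)  # subtract the max for numerical stability
    log_denominator = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_denominator - logits[0]

# The loss shrinks as the positive separates from the negatives.
print(info_nce(0.5, [0.1, 0.2]) > info_nce(0.9, [0.1, 0.2]))  # True
```

A lower temperature sharpens the softmax, so the hardest negatives dominate the gradient.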

Results

On the RS dataset, UniDex improves Recall@300 by 14.18% and MRR@10 by 10.02% over a sparse baseline, while reducing storage and latency by ~25%. Online A/B tests show lower resource consumption and higher user engagement.

UniSearch: End‑to‑End Generative Search for Live Streaming

Architecture

The system consists of:

Search Generator : An encoder‑decoder model that takes the user query, context features, and a <CLS> token, then autoregressively predicts a sequence of semantic IDs representing the target video.

Video Encoder : Encodes video features into dense embeddings and maps them to the same SID space using a VQ‑VAE codebook, enabling direct comparison with generated IDs.

VQ‑VAE Codebook : Learned jointly with the encoder; each embedding is quantized to the nearest codebook entry, providing a discrete representation for videos.
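The codebook lookup described above amounts to a nearest‑neighbor search. The sketch below shows only that step, with a hypothetical three‑entry codebook; the real system learns the codebook jointly with the encoder and emits a sequence of IDs per video.

```python
def squared_distance(u, v):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))

def nearest_code(embedding, codebook):
    """Return the index of the codebook entry closest to the embedding."""
    return min(range(len(codebook)),
               key=lambda i: squared_distance(embedding, codebook[i]))

codebook = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]  # hypothetical entries
print(nearest_code([0.9, 0.1], codebook))         # 1
```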

Joint Training

Training is performed jointly on three objectives:

Residual Progressive Learning : A residual‑based contrastive loss that mimics the cascade of recall → coarse‑rank → fine‑rank, aligning query and video semantics at multiple granularity levels.

Token‑matching Loss : Aligns each generated token with the corresponding video token to preserve consistency between generation and retrieval.

Codebook Regularization : Combines VQ reconstruction loss with a linear projection (SimVQ) to prevent codebook collapse and ensure the quantized vectors stay close to their continuous counterparts.

Online Reinforcement

After offline training, an online reward system fine‑tunes the generator using two signals:

Precision‑ranking reward : Rewards generated IDs that receive high relevance scores from the downstream ranker.

Search Preference Optimization (SPO) : A reinforcement‑learning component that incorporates real user behavior (clicks, dwell time) to further improve generation quality.

Dynamic Trie for Real‑time Updates

Live streams start and end rapidly, so the set of valid SID sequences changes constantly. A dynamic Trie is refreshed as the VQ‑VAE codebook is updated (e.g., every minute), and beam search is constrained to paths in the Trie, which guarantees that every generated ID sequence corresponds to a currently active live room.
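The idea of Trie‑constrained decoding can be sketched as follows. This is an illustration under stated assumptions, not UniSearch's code: `SidTrie`, `constrained_beam_search`, and the `log_prob` callback (a stand‑in for the generator's next‑token scores) are hypothetical, and all SID sequences are assumed to have the same length.

```python
class SidTrie:
    """Prefix tree over the SID sequences of currently active live rooms."""

    def __init__(self):
        self.root = {}

    def insert(self, sid_seq):
        node = self.root
        for token in sid_seq:
            node = node.setdefault(token, {})

    def remove(self, sid_seq):
        # Walk down while recording the path, then prune empty nodes bottom-up.
        path, node = [], self.root
        for token in sid_seq:
            if token not in node:
                return
            path.append((node, token))
            node = node[token]
        for parent, token in reversed(path):
            if parent[token]:        # still has other children: stop pruning
                break
            del parent[token]

    def allowed_next(self, prefix):
        node = self.root
        for token in prefix:
            node = node.get(token)
            if node is None:
                return []
        return list(node)

def constrained_beam_search(trie, log_prob, seq_len, beam_width):
    """Beam search that only expands tokens the Trie allows."""
    beams = [((), 0.0)]
    for _ in range(seq_len):
        candidates = [
            (prefix + (token,), score + log_prob(prefix, token))
            for prefix, score in beams
            for token in trie.allowed_next(prefix)
        ]
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = candidates[:beam_width]
    return beams

trie = SidTrie()
for seq in [(1, 2), (1, 3), (4, 5)]:
    trie.insert(seq)
toy_log_prob = lambda prefix, token: -0.1 * token   # toy scorer, not a real model
print([seq for seq, _ in constrained_beam_search(trie, toy_log_prob, 2, 2)])
# [(1, 2), (1, 3)]
trie.remove((1, 2))                                  # the room went offline
print([seq for seq, _ in constrained_beam_search(trie, toy_log_prob, 2, 2)])
# [(1, 3), (4, 5)]
```

Because candidates are drawn only from the Trie's children, a sequence removed from the Trie can never be generated, however highly the model scores it.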

Results

Offline evaluation shows that UniSearch‑6L surpasses all baselines on MRR and reaches Recall@300 comparable to a 12‑layer model. In production, UniSearch yields a 3.31% increase in live‑room entry count (the highest gain in two years) and a 0.382% reduction in query‑change rate, with strong new‑user growth.