How Non‑Autoregressive Generative Models Transform Recommendation Reranking
This article presents a solution accepted at KDD 2024 that replaces autoregressive generators with a non-autoregressive model for video recommendation reranking, detailing the challenges, the model architecture, a novel loss function, extensive offline and online experiments, and practical Q&A from the conference.
Background and Motivation
Reranking is the final stage of a recommendation pipeline, directly shaping the ordered list of videos shown to users. Unlike earlier ranking stages that score items independently, reranking must consider interactions among videos to maximize overall user utility.
Challenges of Reranking
Given n candidates from the coarse ranker, the reranker must select m videos and order them, leading to an astronomically large candidate sequence space. Exhaustive search is infeasible, and only one sequence is exposed to the user, causing severe sample sparsity. Moreover, online systems demand low latency under strict resource constraints.
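To make the scale concrete: with the production setting described later in this article (60 candidates, 6 display slots), the number of ordered sequences P(n, m) = n!/(n − m)! is already in the tens of billions. A quick check in Python:

```python
from math import perm

# Number of ways to choose and order m videos out of n candidates:
# P(n, m) = n! / (n - m)!
n, m = 60, 6  # the Kuaishou setting cited in the experiments below
print(perm(n, m))  # 36045979200, i.e. ~3.6e10 ordered sequences
```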
Existing Approaches
One-stage methods treat reranking as a retrieval task, selecting the top-k items by pointwise scores. This creates a logical inconsistency: each score is computed in the context of the pre-reranking order, but reranking then changes that order, so the context the scores assumed no longer matches the sequence actually shown, and the scores lose their meaning.
Two-stage methods adopt a generator-evaluator framework: the generator produces multiple plausible sequences, and the evaluator selects the best one. Generators are either heuristic rules or autoregressive generative models; the latter suffer from high inference latency and error accumulation, since each decoding step conditions on possibly erroneous earlier outputs.
Proposed Non‑Autoregressive Model (NAR4Rec)
We introduce a non‑autoregressive generative model that produces the entire recommendation sequence in a single inference pass, dramatically reducing latency.
Matching Model for Variable Vocabulary
To address the sparsity of recommendation sequences and the variable candidate set, we design a matching model consisting of a candidate encoder and a position encoder. Position embeddings are shared across samples, enabling efficient training on sparse data.
The inner product between candidate and position embeddings yields a probability matrix P(i, j), where P(i, j) denotes the probability of placing candidate i at position j.
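A minimal sketch of this matching step, assuming both encoders have already produced embeddings; all names are illustrative, and the softmax normalization over candidates is my assumption about how the scores become probabilities:

```python
import torch
import torch.nn.functional as F

n, m, d = 60, 6, 64                      # candidates, positions, embedding size
candidate_emb = torch.randn(n, d)        # output of the candidate encoder
position_emb = torch.randn(m, d)         # learned, shared across samples

# Inner products give an (n, m) score matrix; normalizing each column
# over the candidate axis turns column j into a distribution over
# which candidate fills position j, i.e. P(i, j).
scores = candidate_emb @ position_emb.T  # (n, m)
prob_matrix = F.softmax(scores, dim=0)   # each column sums to 1
```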
Decoding Strategies
Autoregressive decoding factorizes the sequence probability into a chain of conditionals, each position depending on the items already generated. Our non-autoregressive approach instead assumes conditional independence across positions, so every position is predicted in parallel from the candidate set alone. The price is the multi-modality problem: when several distinct sequences are plausible, each position independently favors its own marginal mode, and the assembled output can mix fragments of different sequences.
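Written out, the two factorizations differ only in what each position conditions on:

```latex
% Autoregressive: position t conditions on all earlier picks.
P(Y \mid X) = \prod_{t=1}^{m} P(y_t \mid y_1, \dots, y_{t-1}, X)

% Non-autoregressive: positions are conditionally independent given X,
% so all m distributions are computed in one parallel pass.
P(Y \mid X) = \prod_{t=1}^{m} P(y_t \mid X)
```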
Sequence Non‑Likelihood Loss
Standard maximum‑likelihood training aims to increase the probability of observed sequences, but in recommendation the exposed sequence may be sub‑optimal. We therefore propose a sequence non‑likelihood loss that simultaneously maximizes the likelihood of high‑utility sequences and minimizes the likelihood of low‑utility ones.
Given a candidate set X and a low-utility (negative) exposed sequence Y, the non-likelihood term is defined as L = -log(1 - P(Y|X)): the smaller P(Y|X) becomes, the smaller the loss, so training actively pushes probability mass away from poor sequences instead of merely withholding reward.
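A minimal sketch of the combined objective on top of the probability matrix above (function and variable names are mine, not from the paper's code):

```python
import torch

def sequence_loss(prob_matrix, pos_seq=None, neg_seq=None, eps=1e-8):
    """prob_matrix is the (n, m) matrix P(i, j); a sequence is a list of
    m candidate indices, one per position. Under the conditional-
    independence assumption, P(Y|X) is the product of the per-position
    entries."""
    loss = prob_matrix.new_zeros(())
    if pos_seq is not None:
        p_pos = torch.stack([prob_matrix[i, j]
                             for j, i in enumerate(pos_seq)]).prod()
        loss = loss - torch.log(p_pos + eps)        # raise P(Y+|X)
    if neg_seq is not None:
        p_neg = torch.stack([prob_matrix[i, j]
                             for j, i in enumerate(neg_seq)]).prod()
        loss = loss - torch.log(1.0 - p_neg + eps)  # suppress P(Y-|X)
    return loss
```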
Experiments
Offline evaluation on the Avito dataset (5 candidates, 5 generated items) uses AUC and NDCG, while on a proprietary Kuaishou dataset (60 candidates, 6 displayed items) we report Recall@6 and Recall@10. NAR4Rec achieves state‑of‑the‑art performance across all metrics and matches the inference speed of pointwise ranking models.
Online A/B testing confirms significant uplift in user engagement, with the non‑autoregressive model delivering comparable or better gains than autoregressive baselines while consuming far less inference time.
Conference Q&A Highlights
Handling variable sequence length: Position embeddings are fixed-length; to support variable lengths, one can predict the target length as an auxiliary task, similar to NLP approaches (see the sketch below).
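As an illustration of that NLP-style auxiliary task, a hypothetical length-prediction head (not part of NAR4Rec) might pool the candidate encodings and classify the target length:

```python
import torch
import torch.nn as nn

class LengthHead(nn.Module):
    """Hypothetical auxiliary head: mean-pool the candidate encodings
    and predict the target sequence length as a classification task."""
    def __init__(self, d, max_len):
        super().__init__()
        self.proj = nn.Linear(d, max_len)

    def forward(self, candidate_emb):       # (n, d)
        pooled = candidate_emb.mean(dim=0)  # (d,)
        return self.proj(pooled)            # logits over lengths 1..max_len
```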
Effect of swapping two items in the label sequence: Swapping changes the probability matrix and thus the loss, because each position's embedding interacts differently with the candidate set.
Contrastive decoding: The current implementation greedily selects the highest-scoring candidate per position, but it can be extended to beam search or top-k sampling to produce multiple candidate sequences (sketched below).
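A sketch of that greedy per-position selection; masking already-placed candidates so no video appears twice is my assumption about how duplicates are avoided:

```python
import torch

def greedy_decode(prob_matrix):
    """Fill positions left to right, taking the most probable remaining
    candidate at each position in a single non-autoregressive pass."""
    n, m = prob_matrix.shape
    probs = prob_matrix.clone()
    chosen = []
    for j in range(m):
        i = int(torch.argmax(probs[:, j]))
        chosen.append(i)
        probs[i, :] = float("-inf")  # each candidate is placed at most once
    return chosen  # m candidate indices, one per display slot
```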
Distinguishing positive and negative sequences: We use a post-hoc interaction label that aggregates pointwise item labels; sequences above a threshold are treated as positive, below as negative.