
Mixing Heterogeneous Queues in Vivo's Information Flow and App Store: Challenges, Practices, and RL/Deep Learning Solutions

Vivo tackles the complex problem of mixing heterogeneous content queues (ads, games, and organic items) in its information-flow and app-store scenarios, evolving from rule-based weighting to Q-learning and deep-learning position models that respect product constraints, preserve queue ordering, and balance short-term revenue against long-term user experience, with deeper personalization and on-device mixing planned next.

vivo Internet Technology

This article summarizes Shen Jiyi’s talk at the 2022 Vivo Developer Conference, focusing on the mixing (heterogeneous ranking) of multiple content queues such as ads, games, and organic items in Vivo’s information‑flow and app‑store scenarios.

Background

Mixing aims to combine heterogeneous items from different queues while preserving user experience and maximizing revenue. The problem is complex due to diverse modeling objectives (CTR vs. eCPM), product rule constraints (spacing, quota, first‑position), and the need for order‑preserving (stable) mixing.

Core Challenges

Different queues have incomparable modeling targets.

Heavy product‑rule constraints (spacing, quota, first‑position, etc.).

Upstream fine‑ranking algorithms produce ordered candidate lists that must remain ordered during mixing.
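The order-preserving requirement above can be met by construction with a greedy two-pointer merge: at each slot, compare only the current head of each queue, so items from the same queue can never swap relative order. A minimal sketch (the `ad_value` calibration function, which maps ad items onto a scale comparable with organic scores, is a hypothetical placeholder):

```python
def order_preserving_mix(organic, ads, ad_value):
    """Greedily merge two ranked queues into one mixed list.

    Each step compares only the current head of each queue, so the
    relative order within each queue is preserved by construction.
    `ad_value` is an assumed calibration function that maps an ad item
    to a score comparable with organic scores.
    """
    mixed, i, j = [], 0, 0
    while i < len(organic) and j < len(ads):
        if organic[i]["score"] >= ad_value(ads[j]):
            mixed.append(organic[i])
            i += 1
        else:
            mixed.append(ads[j])
            j += 1
    # One queue is exhausted; append the remainder of the other.
    mixed.extend(organic[i:])
    mixed.extend(ads[j:])
    return mixed
```

Product constraints such as spacing or first-position rules would be layered on top of this merge as additional per-slot checks.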

Information‑Flow Mixing Practice

The information‑flow scenario (browsers, i‑video, “negative‑one‑screen”, etc.) features many formats and strong personalization. Traditional fixed‑template mixing suffers from three obvious problems: user experience degradation, low targeting efficiency, and platform resource waste.

Industry solutions surveyed include:

Rule‑based value weighting (user‑experience vs. revenue) – simple but ignores item‑to‑item interactions and long‑term gain.

Reinforcement‑learning (RL) based sequence insertion – models the problem as an action selection per slot, balancing ad value and user experience, but it is often limited to single‑ad insertion and carries high engineering cost.
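The rule-based value-weighting approach reduces to a single tunable trade-off. A minimal sketch, assuming scores are pre-normalized to [0, 1] (the field names and the default `alpha` are illustrative, not vivo's production values):

```python
def mixed_value(item, alpha=0.7):
    """Linear trade-off between user experience and revenue.

    `alpha` weights the user-experience proxy (CTR) against the revenue
    proxy (normalized eCPM). Both scores are assumed pre-normalized to
    [0, 1]; the default alpha is purely illustrative.
    """
    return alpha * item["ctr"] + (1 - alpha) * item["ecpm_norm"]
```

As the surveyed critique notes, a per-item score like this ignores item-to-item interactions and any long-term gain: each candidate is valued in isolation.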

Vivo’s evolution for information‑flow mixing consists of three stages: fixed‑position mixing, Q‑learning mixing, and deep‑position mixing.

Q‑Learning Mixing

The Q‑learning approach treats mixing as a reinforcement‑learning problem. The agent observes state features (user, context, content, ad), selects an action (adjusted ad weight), receives a reward that combines short‑term and long‑term revenue with user‑experience metrics, and updates the Q‑table. This method enables page‑wide and long‑term optimization while remaining lightweight for online deployment.
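The tabular update loop described above can be sketched as follows. The state encoding, the action set (ad-weight multipliers), and the reward shape are all illustrative assumptions, not vivo's production design:

```python
import random
from collections import defaultdict

class QLearningMixer:
    """Tabular Q-learning sketch for choosing an ad-weight adjustment.

    States are assumed to be discretized user/context buckets; actions
    are multipliers applied to the ad queue's scores; the reward would
    combine short- and long-term revenue with user-experience metrics.
    """

    def __init__(self, actions=(0.8, 1.0, 1.2), alpha=0.1, gamma=0.9, eps=0.1):
        self.q = defaultdict(float)  # (state, action) -> estimated value
        self.actions = actions
        self.alpha, self.gamma, self.eps = alpha, gamma, eps

    def act(self, state):
        # Epsilon-greedy: explore occasionally, otherwise pick the best action.
        if random.random() < self.eps:
            return random.choice(self.actions)
        return max(self.actions, key=lambda a: self.q[(state, a)])

    def update(self, state, action, reward, next_state):
        # Standard one-step Q-learning temporal-difference update.
        best_next = max(self.q[(next_state, a)] for a in self.actions)
        td_target = reward + self.gamma * best_next
        self.q[(state, action)] += self.alpha * (td_target - self.q[(state, action)])
```

The Q-table's small footprint is what keeps this lightweight for online serving, and also what motivates the deep model in the next stage.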

Deep Position‑Based Mixing

To overcome Q‑learning limitations (small Q‑table capacity, limited feature usage, dependence on upstream scores), Vivo introduced a deep‑learning model that directly predicts item positions. The architecture resembles a dual‑tower DQN: the left tower encodes state information (user attributes, behavior, sequence features), the right tower encodes action (position) information. Sequence attention with a Transformer captures user‑history relevance, and match modules incorporate prior knowledge and cross‑features (CTR, TF‑IDF, etc.). This deep model decouples from upstream scores, increases capacity, and models inter‑item interactions.
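The dual-tower scoring idea can be sketched numerically: one tower embeds state features, the other embeds position (action) features, and their inner product is the predicted value of placing the item at that slot. All dimensions and weights below are random stand-ins; the production model additionally uses Transformer sequence attention and cross-feature match modules, which this sketch omits:

```python
import numpy as np

rng = np.random.default_rng(0)

def tower(x, w1, w2):
    """Two-layer MLP with ReLU: one tower of the dual-tower model."""
    return np.maximum(x @ w1, 0) @ w2

# Illustrative dimensions: 16-d state features, 8-d position features,
# both projected into a shared 4-d space; weights are random stand-ins.
w_s1, w_s2 = rng.normal(size=(16, 32)), rng.normal(size=(32, 4))
w_a1, w_a2 = rng.normal(size=(8, 32)), rng.normal(size=(32, 4))

def position_score(state_feats, position_feats):
    """Q(state, position) as the inner product of the two tower outputs."""
    return float(tower(state_feats, w_s1, w_s2) @ tower(position_feats, w_a1, w_a2))

# Score every candidate slot for one request and pick the best position.
state = rng.normal(size=16)
positions = rng.normal(size=(10, 8))  # features for 10 candidate slots
best = max(range(10), key=lambda p: position_score(state, positions[p]))
```

Because the score is computed from raw features rather than upstream ranking scores, this structure is what decouples the mixer from the fine-ranking stage.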

App‑Store Mixing

The app‑store scenario mixes ad queues with game queues under stricter quota requirements (保量, guaranteed delivery) and sparse user actions. Core challenges include multi‑party optimization (user, ad, game), quota‑revenue conflict, and uncertain game LTV estimation.

Vivo’s app‑store mixing pipeline includes:

Fixed‑position mixing.

PID‑based quota preservation.

Constraint‑aware re‑ranking.

Fine‑grained splitting.

PID control ensures the quota is met but is not directly tied to revenue. Constraint‑aware re‑ranking splits traffic, applying a re‑ranking model only to high‑quality flows to explore revenue while preserving quota for the rest.

Fine‑grained splitting further relaxes quota for selected branches, allowing the mixing model to pursue higher revenue on high‑quality traffic while PID handles low‑quality traffic.
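The PID quota-preservation step can be sketched as a standard proportional-integral-derivative controller that nudges a queue's weight toward its delivery target. The gains and the 1.0-centered output are illustrative defaults, not tuned production values:

```python
class PIDQuotaController:
    """PID controller sketch for pacing a queue toward its quota.

    The error is the gap between the target delivery share and the
    observed share; the output is a weight multiplier applied to the
    queue's scores at mixing time. Gains are illustrative, not tuned.
    """

    def __init__(self, kp=1.0, ki=0.1, kd=0.05):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.integral = 0.0
        self.prev_error = 0.0

    def weight(self, target_share, actual_share):
        error = target_share - actual_share
        self.integral += error
        derivative = error - self.prev_error
        self.prev_error = error
        # Under-delivery (positive error) pushes the weight above 1.0,
        # over-delivery pushes it below 1.0.
        return 1.0 + self.kp * error + self.ki * self.integral + self.kd * derivative
```

In the split pipeline described above, a controller like this would serve the low-quality traffic branch, while the re-ranking model pursues revenue on the high-quality branch.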

Recent improvements replace the numeric re‑ranking model with a generative, context‑aware model, eliminating dependence on upstream scores and achieving more stable, higher‑gain performance.

Future Outlook

Model optimization: deeper, more personalized models with real‑time feedback.

Cross‑scenario collaboration: unified mixing across information‑flow and app‑store.

Unified paradigm: a common sequence generation and evaluation framework.

On‑device mixing: real‑time user‑interest capture for better experience.

The article concludes that heterogeneous mixing in Vivo’s internet ecosystem faces many challenges but has yielded measurable gains, and invites further discussion.

Tags: advertising, deep learning, reinforcement learning, mixing, information flow, App Store
Written by

vivo Internet Technology

Sharing practical vivo Internet technology insights and salon events, plus the latest industry news and hot conferences.
