Ximalaya’s LLM‑Powered Interactive Recommendation System: Architecture and Results

The article details Ximalaya’s three‑layer interactive recommendation architecture—PBox for parameter control, an LLM‑driven Agent for intent understanding, and the iSUG interface—showing how natural‑language‑based parameter tuning shifts the paradigm from one‑way push to two‑way dialogue and significantly improves recommendation efficiency and user retention.

Ximalaya Technology Team

May 27, 2026

Introduction

The article presents Ximalaya’s interactive recommendation system, which targets three limitations of the existing setup: recommendation is one-way and implicit, users face a high barrier to expressing what they want, and they have no personalized control over the results.

Background and Challenges

Three content‑delivery paths—recommendation, search, and ranking lists—each suffer from structural drawbacks: recommendation relies on passive implicit feedback, search demands precise keyword input, and ranking lists cannot adapt to niche interests, all preventing real‑time user‑system dialogue.

AI‑Driven Paradigm Shift

Large language models (LLMs) break the long‑standing barrier by interpreting natural‑language intents and enabling two‑way conversations, a capability now adopted by major platforms such as YouTube, Meta, and Spotify.

Overall Architecture

The system adopts a three‑layer design:

PBox: a parameterized box serving as the control hub, storing per-user recommendation parameters.

LLM Interactive Agent: the intent engine that parses user utterances via function calling.

Interactive Suggestion (iSUG): the entry point that presents context-aware suggestions.

When a user triggers the interaction (e.g., after several non‑clicks), the LLM Agent extracts intent, generates structured commands, and updates the user’s PBox parameters, after which the recommendation pipeline re‑generates results.
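The trigger condition mentioned above can be illustrated with a small counter. This is only a sketch: the article says "several non-clicks" without giving a number, so the threshold, the class name `InteractionTrigger`, and the reset behavior are all assumptions.

```python
from dataclasses import dataclass


@dataclass
class InteractionTrigger:
    """Tracks consecutive non-clicked impressions for one user session."""

    threshold: int = 5  # hypothetical value; the article only says "several"
    consecutive_skips: int = 0

    def record_impression(self, clicked: bool) -> bool:
        """Record one impression; return True when the prompt should be shown."""
        if clicked:
            self.consecutive_skips = 0
            return False
        self.consecutive_skips += 1
        if self.consecutive_skips >= self.threshold:
            self.consecutive_skips = 0  # reset so the prompt is not shown on every skip
            return True
        return False


trigger = InteractionTrigger(threshold=3)
events = [False, False, True, False, False, False]  # True means the item was clicked
fired = [trigger.record_impression(clicked) for clicked in events]
print(fired)
```

In this run the click in the third position resets the counter, so the prompt fires only on the last impression.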

PBox: Parameterized Control Framework

PBox replaces the monolithic static configuration of traditional recommenders with a dynamic, multi‑dimensional parameter set that can be adjusted per user at each stage of the pipeline (recall, coarse‑ranking, fine‑ranking, mixing). Eight dimensions and sixteen factors are currently deployed, stored in a Redis‑backed KV store keyed by userId.
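To make the idea of per-user, per-stage parameters concrete, here is a minimal sketch of such a store. It assumes a layout the article does not specify: the factor names, the `pbox:<userId>` key format, and the defaults-plus-overrides merge are hypothetical, and a plain dict stands in for the Redis-backed KV store.

```python
import json
from typing import Dict

# Hypothetical defaults grouped by pipeline stage. The article names the
# stages but not the individual factors, so every key below is illustrative.
DEFAULT_PARAMS: Dict[str, Dict[str, float]] = {
    "recall": {"interest_boost": 0.0},
    "coarse_ranking": {"freshness_weight": 0.5},
    "fine_ranking": {"completed_weight": 0.5},
    "mixing": {"diversity": 0.5},
}


class PBoxStore:
    """Per-user parameter overrides in a KV store keyed by user id.

    A plain dict stands in for the Redis-backed store described above.
    """

    def __init__(self) -> None:
        self._kv: Dict[str, str] = {}

    def _key(self, user_id: str) -> str:
        return f"pbox:{user_id}"  # hypothetical key format

    def get(self, user_id: str) -> Dict[str, Dict[str, float]]:
        """Return the defaults overlaid with the user's stored overrides."""
        overrides = json.loads(self._kv.get(self._key(user_id), "{}"))
        return {
            stage: {**factors, **overrides.get(stage, {})}
            for stage, factors in DEFAULT_PARAMS.items()
        }

    def update(self, user_id: str, stage: str, factor: str, value: float) -> None:
        """Persist one override; unknown stages or factors are rejected."""
        if factor not in DEFAULT_PARAMS.get(stage, {}):
            raise KeyError(f"unknown parameter {stage}.{factor}")
        overrides = json.loads(self._kv.get(self._key(user_id), "{}"))
        overrides.setdefault(stage, {})[factor] = value
        self._kv[self._key(user_id)] = json.dumps(overrides)


store = PBoxStore()
store.update("u42", "fine_ranking", "completed_weight", 0.8)
tuned = store.get("u42")["fine_ranking"]      # reflects the override
untouched = store.get("u99")["fine_ranking"]  # falls back to the defaults
print(tuned, untouched)
```

Storing only the overrides keeps each user's record small and lets default values change without rewriting every key, which is one plausible reason to separate the two.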

Function Calling for Intent‑Driven Parameter Updates

The LLM Agent uses function calling to translate natural language into concrete parameter adjustments. Example calls:

update_interest_weight(user_id, category="有声书", weight_increase=0.3)  # "有声书": audiobooks
update_mood_preference(user_id, mood="完结", weight=0.8)  # "完结": completed (finished) series

These functions modify the corresponding fields in the user’s PBox.
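One way to connect model output to these functions is a whitelist dispatcher, sketched below. The two function names and their arguments come from the examples above; everything else is an assumption made for illustration, including the JSON shape of the tool call (`name` plus `arguments`), the in-memory `PBOX` layout, and the clamping of weights to [0, 1].

```python
import json
from typing import Any, Callable, Dict

# In-memory stand-in for the per-user PBox; the field layout is an assumption.
PBOX: Dict[str, Dict[str, Dict[str, float]]] = {}


def _user_box(user_id: str) -> Dict[str, Dict[str, float]]:
    return PBOX.setdefault(user_id, {"interest": {}, "mood": {}})


def update_interest_weight(user_id: str, category: str, weight_increase: float) -> None:
    """Raise the weight of one category, clamped to [0, 1] (clamping is assumed)."""
    weights = _user_box(user_id)["interest"]
    weights[category] = min(1.0, max(0.0, weights.get(category, 0.0) + weight_increase))


def update_mood_preference(user_id: str, mood: str, weight: float) -> None:
    """Set the weight of one mood preference, clamped to [0, 1] (clamping is assumed)."""
    _user_box(user_id)["mood"][mood] = min(1.0, max(0.0, weight))


# Only whitelisted functions can be invoked from model output.
TOOLS: Dict[str, Callable[..., None]] = {
    "update_interest_weight": update_interest_weight,
    "update_mood_preference": update_mood_preference,
}


def dispatch(user_id: str, tool_call_json: str) -> bool:
    """Apply one model-emitted call; return False if it is malformed or unknown."""
    try:
        call: Dict[str, Any] = json.loads(tool_call_json)
        func = TOOLS[call["name"]]
        func(user_id, **call["arguments"])
    except (json.JSONDecodeError, KeyError, TypeError):
        return False
    return True


ok = dispatch(
    "u42",
    '{"name": "update_interest_weight",'
    ' "arguments": {"category": "有声书", "weight_increase": 0.3}}',
)
bad = dispatch("u42", '{"name": "drop_table", "arguments": {}}')
print(ok, bad, PBOX["u42"])
```

The user id is supplied by the server rather than taken from the model's arguments, so a misparsed utterance cannot write to another user's parameters.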

Asynchronous Decoupling and Latency Optimization

LLM inference takes 200 ms to 1500 ms, which exceeds the recommendation system’s <150 ms response budget, so parameter updates are applied asynchronously and kept off the request path. Once the new PBox parameters are written, a front-end refresh signal triggers the next screen of results. This keeps the user’s mental model (“I said, show me next…”) intact, while the recommendation request itself, which does not wait on the LLM, stays under 100 ms.
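The decoupling described above can be sketched with `asyncio`: a fast path that only reads stored parameters, and a slow path that runs LLM parsing in the background and then signals a refresh. The function names, the `asyncio.Event` used as the refresh signal, and the simulated 0.3 s model delay are all assumptions for illustration, not the production design.

```python
import asyncio
from typing import Dict, List

PBOX: Dict[str, Dict[str, float]] = {}  # in-memory stand-in for the parameter store


async def slow_llm_parse(utterance: str) -> Dict[str, float]:
    """Placeholder for LLM inference; the sleep simulates model latency."""
    await asyncio.sleep(0.3)
    return {"audiobook": 0.3} if "audiobook" in utterance else {}


async def recommend(user_id: str) -> List[str]:
    """Fast path: reads whatever parameters are stored and never waits on the LLM."""
    weights = PBOX.get(user_id, {})
    return ["audiobook feed"] if weights.get("audiobook", 0.0) > 0 else ["default feed"]


async def handle_utterance(user_id: str, utterance: str, refresh: asyncio.Event) -> None:
    """Slow path: parse intent, write parameters, then signal the client to refresh."""
    params = await slow_llm_parse(utterance)
    PBOX.setdefault(user_id, {}).update(params)
    refresh.set()


async def main() -> List[List[str]]:
    refresh = asyncio.Event()
    task = asyncio.create_task(handle_utterance("u42", "more audiobook please", refresh))
    before = await recommend("u42")  # served immediately with the old parameters
    await refresh.wait()             # the front end would react to this signal
    after = await recommend("u42")   # the next screen reflects the new parameters
    await task
    return [before, after]


results = asyncio.run(main())
print(results)
```

The first call returns the default feed because the LLM has not finished yet; only the request issued after the refresh signal sees the updated parameters.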

Domain‑Specific Fine‑Tuning

To improve accuracy on Ximalaya’s proprietary categories (e.g., “有声书‑女频‑现代言情”, roughly “audiobooks, female-oriented, modern romance”) and expressions (“完结”, meaning completed; “付费”, meaning paid), the base Qwen‑3B model is LoRA‑fine‑tuned. After fine-tuning, intent classification reaches near‑100 % accuracy and inference latency drops below 1 s.
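For readers unfamiliar with LoRA, the core arithmetic is small: the pretrained weight W stays frozen and a trainable low-rank product is added to it, giving W + (alpha / r) * B A, where r is the adapter rank. The toy example below shows only that formula in plain Python. The matrices and `alpha` are illustrative numbers, not values from Ximalaya's training setup, which the article does not disclose.

```python
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain-Python matrix product, enough for a toy example."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def lora_effective_weight(w: Matrix, b: Matrix, a: Matrix, alpha: float) -> Matrix:
    """Return W + (alpha / r) * B @ A, where r is the adapter rank."""
    r = len(a)  # A has shape (r, k); B has shape (d, r)
    delta = matmul(b, a)
    scale = alpha / r
    return [
        [w_ij + scale * d_ij for w_ij, d_ij in zip(w_row, d_row)]
        for w_row, d_row in zip(w, delta)
    ]


# Toy 2x2 frozen weight with a rank-1 adapter (all numbers are illustrative).
W = [[1.0, 0.0], [0.0, 1.0]]
B = [[1.0], [2.0]]  # shape (2, 1), trainable
A = [[0.5, 0.0]]    # shape (1, 2), trainable

W_eff = lora_effective_weight(W, B, A, alpha=2.0)
print(W_eff)
```

At realistic sizes the adapter holds r * (d + k) values against d * k in the frozen matrix, which is why adapting a 3B-parameter model this way is far cheaper than full fine-tuning.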

Interactive Suggestion (iSUG) Design

iSUG provides millisecond‑level recall using pre‑computed caches and inverted indexes. Candidate sources include behavior‑based, category‑based, hotspot‑based, function‑driven, and LLM‑personalized suggestions. Ranking leverages a Deep Interest Network (DIN) that fuses static user features with dynamic behavior sequences to predict click‑through probability.
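The recall step can be pictured as lookups into structures built ahead of time, so serving involves no model call. The sketch below merges inverted-index hits with a pre-computed hotspot cache. The suggestion texts, tags, and the merge order are hypothetical, and the DIN ranking stage that would score these candidates is deliberately left out.

```python
from collections import defaultdict
from typing import Dict, List, Set

# Hypothetical suggestion corpus: suggestion text -> tags it is indexed under.
SUGGESTIONS: Dict[str, Set[str]] = {
    "More completed audiobooks": {"audiobook", "completed"},
    "Trending podcasts today": {"podcast", "hot"},
    "Free modern romance": {"audiobook", "romance", "free"},
}


def build_inverted_index(corpus: Dict[str, Set[str]]) -> Dict[str, List[str]]:
    """Map each tag to the suggestions that carry it (built offline)."""
    index: Dict[str, List[str]] = defaultdict(list)
    for text, tags in corpus.items():
        for tag in sorted(tags):
            index[tag].append(text)
    return dict(index)


INDEX = build_inverted_index(SUGGESTIONS)
HOT_CACHE: List[str] = ["Trending podcasts today"]  # pre-computed hotspot source


def recall(user_tags: List[str], limit: int = 5) -> List[str]:
    """Merge index hits for the user's tags with cached hot items, de-duplicated in order."""
    seen: Set[str] = set()
    merged: List[str] = []
    for source in [*(INDEX.get(tag, []) for tag in user_tags), HOT_CACHE]:
        for text in source:
            if text not in seen:
                seen.add(text)
                merged.append(text)
    return merged[:limit]


candidates = recall(["audiobook", "romance"])
print(candidates)
```

Each request costs a handful of dictionary lookups, which is consistent with the millisecond-level recall the article reports; a ranking model would then reorder the merged list.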

Experimental Results

Three staged experiments on the home page demonstrated measurable gains:

Phase 1: 7‑day retention increased by 0.88 % overall, with power‑users seeing a 2 %–5 % lift.

Phase 2: After increasing trigger frequency and improving intent accuracy, retention rose an additional 0.12 %.

Phase 3: Introducing LLM‑driven intent chains and visual feedback yielded >1 % retention uplift for users engaging with the interactive feature.

Future Outlook

The authors envision a fully intelligent recommendation assistant that continuously fuses each user’s long-term traits (the source’s “user genetics”) with real-time context, delivers recommendations through open-ended dialogue, and moves beyond the “algorithm black box” to become a real-time, empathetic partner.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.
