
How Kuaishou’s AgentX Enables Self‑Iterating Industrial Recommender Systems

AgentX introduces an agent‑driven closed loop that automates idea generation, code production, online A/B testing, and experience consolidation, boosting experiment concurrency eight‑fold, increasing per‑person business value 3.7×, and delivering over 0.5% app‑time growth and more than 1 billion RMB in annual revenue.

Machine Heart

Jul 1, 2026


Over the past decade, recommender system research has focused on stronger modeling: finer features, larger models, longer sequences, and generative recommendation. In industrial settings, however, the real bottleneck lies in the R&D production workflow. Traditional manual pipelines require engineers to handle data analysis, design, coding, experiment configuration, A/B observation, metric attribution, and post‑mortems one after another, so throughput is capped by human capacity.

AgentX, presented by the Kuaishou AgentX team, proposes an agent‑driven closed R&D loop in which agents become the primary executors of recommendation iteration. The system continuously generates hypotheses, writes production‑ready code, launches experiments, reads feedback, and distills each trajectory into fuel for the next round.

In real‑world deployment on the Kuaishou app, three AgentX workers processed 374 experiment ideas into 10 publishable results. Compared with manual iteration, a single worker achieved an 8× increase in concurrent experiments and a 3.7× rise in per‑person business value. The deployment contributed a cumulative +0.561% in app time, plus over 1 billion RMB in annual revenue for the life‑service business.

The core of AgentX consists of four agents:

Brainstorm Agent: Converts vague business goals (e.g., increase watch time) into prioritized, evidence‑backed candidate solutions, specifying target metrics, required signals, expected mechanisms, risks, and validation methods.

Developing Agent: Generates code constrained by repository knowledge, feature schemas, DSL checks, C++ syntax validation, and dry‑run verification, ensuring compatibility with production pipelines and platform rules.

Evaluation Agent: Handles safe deployment, traffic bucketing, parameter conflict checks, metric collection, and guardrail vetoes, turning online A/B outcomes into actionable playbooks or failure constraints.

Harness Evolution: Uses Semantic‑Gradient‑based Prompt Optimization (SGPO) to analyze execution traces, identify missing business constraints or recurring code errors, and update sub‑agents via paired replay evaluations.
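The division of labor above can be pictured as one pass through a loop. The code below is a minimal sketch under stated assumptions, not AgentX's implementation. `Idea`, `Outcome`, `develop`, `run_iteration`, the gate functions and the string-based `memory` are all hypothetical names. The gates stand in for the DSL, C++ syntax and dry-run checks, and `evaluate` stands in for an online A/B test. The sketch illustrates the structural point the summary makes: every rejection or failure is written back to a memory that later reviews can consult.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

# Hypothetical stand-ins for the four roles; none of these names come from AgentX.

@dataclass
class Idea:
    name: str
    target_metric: str

@dataclass
class Outcome:
    idea: Idea
    stage: str  # furthest stage reached: "rejected", "build_failed" or "evaluated"
    publishable: bool = False
    note: str = ""

# A validation gate returns None on success or an error message on failure.
Gate = Callable[[str], Optional[str]]

def develop(idea: Idea, gates: List[Gate]) -> Tuple[str, Optional[str]]:
    """Produce a placeholder patch and run it through every gate in order."""
    patch = f"// hypothetical patch for {idea.name}"
    for gate in gates:
        error = gate(patch)
        if error is not None:
            return patch, error  # stop at the first failing gate
    return patch, None

def run_iteration(
    ideas: List[Idea],
    review: Callable[[Idea, List[str]], bool],
    gates: List[Gate],
    evaluate: Callable[[Idea, str], bool],
    memory: List[str],
) -> List[Outcome]:
    """One pass of the loop; failures are appended to `memory` for later rounds."""
    outcomes: List[Outcome] = []
    for idea in ideas:
        if not review(idea, memory):
            outcomes.append(Outcome(idea, "rejected"))
            continue
        patch, error = develop(idea, gates)
        if error is not None:
            memory.append(f"{idea.name}: {error}")
            outcomes.append(Outcome(idea, "build_failed", note=error))
            continue
        won = evaluate(idea, patch)
        if not won:
            memory.append(f"{idea.name}: no significant online gain")
        outcomes.append(Outcome(idea, "evaluated", publishable=won))
    return outcomes

if __name__ == "__main__":
    memory: List[str] = []
    ideas = [Idea("boost_a", "watch_time"), Idea("broken_b", "exposure"), Idea("vague_c", "")]
    results = run_iteration(
        ideas,
        review=lambda idea, mem: bool(idea.target_metric),              # reject ideas with no target metric
        gates=[lambda p: "syntax error" if "broken" in p else None],    # toy stand-in for code checks
        evaluate=lambda idea, patch: idea.name == "boost_a",            # toy stand-in for an A/B test
        memory=memory,
    )
    for o in results:
        print(o.idea.name, o.stage, o.publishable)
    print(memory)
```

The gates are kept as an ordered list so that cheap checks such as syntax validation can run before expensive ones such as dry runs. The first failure short-circuits the remaining gates.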

AgentX’s self‑evolution distinguishes it from simple automation: each execution enriches system capability rather than merely replicating human steps.
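The summary says Harness Evolution updates sub‑agents "via paired replay evaluations" but does not give the acceptance rule. The code below is a minimal sketch of one simple rule, assuming each recorded task can be replayed under both the old and the candidate prompt and scored. The semantic‑gradient step itself, where a model proposes the prompt edit, is out of scope here. `paired_replay_accept`, the scorer, and `min_mean_gain` are hypothetical.

```python
from typing import Callable, Sequence

# Hypothetical scorer: how well a sub-agent prompt handles one recorded task (higher is better).
Scorer = Callable[[str, str], float]

def paired_replay_accept(old_prompt: str, new_prompt: str,
                         tasks: Sequence[str], score: Scorer,
                         min_mean_gain: float = 0.0) -> bool:
    """Accept new_prompt only if, replayed on the same tasks, it wins more pairs
    than it loses and its mean per-task gain exceeds min_mean_gain."""
    if not tasks:
        raise ValueError("need at least one replay task")
    diffs = [score(new_prompt, t) - score(old_prompt, t) for t in tasks]
    wins = sum(1 for d in diffs if d > 0)
    losses = sum(1 for d in diffs if d < 0)
    mean_gain = sum(diffs) / len(diffs)
    return wins > losses and mean_gain > min_mean_gain

if __name__ == "__main__":
    # Toy scorer: a task passes if the prompt mentions the constraint that task needs.
    def keyword_score(prompt: str, task: str) -> float:
        return 1.0 if task in prompt else 0.0

    tasks = ["dsl", "cpp", "dry-run"]
    old = "always run the dsl check"
    new = "always run the dsl check and the cpp syntax check"
    print(paired_replay_accept(old, new, tasks, keyword_score))  # True
    print(paired_replay_accept(new, old, tasks, keyword_score))  # False
```

Comparing the two prompts on the same tasks cancels out task difficulty, so a gain is credited to the prompt change rather than to an easier replay set.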

Experimental funnel: 374 ideas entered the system, 106 passed proposal review (28.34% pass rate), 100 were implemented and launched online (94.3% success), and 10 received positive evaluations and were judged publishable (9.9%). In the main‑feed scenario, 361 ideas yielded 8 publishable results; in the life‑service scenario, 13 ideas yielded 2.
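The stage‑to‑stage rates can be recomputed from the raw counts quoted above. This reproduces the 28.34% and 94.3% figures. The last step computes as 10/100 = 10.0%, slightly above the quoted 9.9%, and the summary does not say which denominator that figure uses. `stage_rates` is an illustrative helper, not part of AgentX.

```python
from typing import List, Tuple

# Stage counts as reported in the summary.
FUNNEL: List[Tuple[str, int]] = [
    ("ideas submitted", 374),
    ("passed proposal review", 106),
    ("implemented and launched", 100),
    ("positive online evaluation", 10),
]

def stage_rates(funnel: List[Tuple[str, int]]) -> List[Tuple[str, float]]:
    """Conversion rate (in percent) of each stage relative to the stage before it."""
    return [(name, 100.0 * count / prev)
            for (_, prev), (name, count) in zip(funnel, funnel[1:])]

if __name__ == "__main__":
    for name, pct in stage_rates(FUNNEL):
        print(f"{name}: {pct:.2f}%")   # 28.34%, 94.34%, 10.00%
    print(f"end to end: {100.0 * FUNNEL[-1][1] / FUNNEL[0][1]:.2f}%")  # 2.67%
    # Scenario split: main feed (361 ideas, 8 results) + life service (13 ideas, 2 results).
    assert 361 + 13 == FUNNEL[0][1] and 8 + 2 == FUNNEL[-1][1]
```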

Efficiency gains include an average of 12 concurrent experiments per worker versus 1.5 for a human engineer (8× increase), 1.1 publishable results per worker per week versus 0.08 manually (13.8×), and a 3.7× higher per‑person contribution to app‑time growth.
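The two throughput multipliers follow directly from the per‑worker and per‑engineer rates. The ratio 1.1 / 0.08 is 13.75, which rounds to the quoted 13.8×. The 3.7× per‑person figure is measured on app‑time contribution and cannot be recomputed from these counts. `multiplier` is an illustrative helper.

```python
def multiplier(agent_rate: float, human_rate: float) -> float:
    """How many times the agent rate exceeds the human baseline."""
    if human_rate <= 0:
        raise ValueError("human baseline must be positive")
    return agent_rate / human_rate

# Figures quoted in the summary.
concurrency = multiplier(12, 1.5)      # concurrent experiments: per worker vs per engineer
weekly_output = multiplier(1.1, 0.08)  # publishable results per week: worker vs engineer

print(f"concurrency: {concurrency:.1f}x")         # concurrency: 8.0x
print(f"publishable/week: {weekly_output:.2f}x")  # publishable/week: 13.75x
```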

AgentX also shows self‑acceleration: as failure patterns and dry‑run templates accumulated, weekly concurrent experiments rose from 15 to 60, the idea pass rate from 15% to 45%, and weekly publishable results from 2 to 5.
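The summary credits part of this acceleration to accumulated failure patterns, which can filter out known dead ends before they reach review. It does not describe how proposals are matched against past failures. The sketch below therefore assumes exact matching on a (mechanism, target metric) pair. `Proposal`, `FailureMemory`, and the mechanism labels are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Set, Tuple

@dataclass(frozen=True)
class Proposal:
    mechanism: str      # hypothetical label for what the change does
    target_metric: str  # metric the change is meant to move

@dataclass
class FailureMemory:
    """Hypothetical store of (mechanism, metric) pairs that already failed online."""
    failed: Set[Tuple[str, str]] = field(default_factory=set)

    def record_failure(self, p: Proposal) -> None:
        self.failed.add((p.mechanism, p.target_metric))

    def screen(self, batch: List[Proposal]) -> List[Proposal]:
        """Drop proposals that repeat a known failure before they consume experiment traffic."""
        return [p for p in batch if (p.mechanism, p.target_metric) not in self.failed]

if __name__ == "__main__":
    memory = FailureMemory()
    week1 = [Proposal("boost_a", "watch_time"), Proposal("boost_b", "watch_time")]
    memory.record_failure(week1[1])  # suppose boost_b showed no gain online
    week2 = [Proposal("boost_b", "watch_time"), Proposal("boost_c", "exposure")]
    print(memory.screen(week2))  # [Proposal(mechanism='boost_c', target_metric='exposure')]
```

A production system would more likely retrieve similar failures than require exact matches. Exact matching is used here only to keep the sketch deterministic.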

Beyond online strategy, AgentX extends to model research: it can ingest recent recommendation papers, reproduce methods on public datasets (KuaiRand, Taobao, Amazon, ML‑1M), and combine complementary modules into new architectures. In independent model experiments, the system achieved a +0.865% live‑stream duration lift.

Case study – PCV‑enhanced ranking: The first round introduced PCV boosting, which yielded modest gains that were not statistically significant and carried diversity risks. The second round added quality gating, adaptive weighting, and baseline adjustments, resulting in +0.071% watch time and +0.118% exposure while user‑experience safeguards remained stable.
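The summary does not state AgentX's exact decision rule for an A/B result. The sketch below uses a common convention as a stand‑in. A treatment is publishable only if the lower confidence bound of its primary‑metric lift is above zero, and it is vetoed if any guardrail metric drops by more than a tolerance. An "iterate" outcome corresponds to the situation in the first round above, where a non‑significant result feeds the next hypothesis. `PrimaryResult`, `decide`, the 0.1% tolerance, and all input numbers are hypothetical and are not taken from the report.

```python
from dataclasses import dataclass
from typing import Dict

@dataclass
class PrimaryResult:
    lift_pct: float    # relative lift of treatment over control, in percent
    ci_low_pct: float  # lower bound of the lift's confidence interval, in percent

def decide(primary: PrimaryResult, guardrail_lifts_pct: Dict[str, float],
           max_guardrail_drop_pct: float = 0.1) -> str:
    """Return "veto: <metric>" if a guardrail regressed beyond tolerance,
    "iterate" if the primary lift is not significant, else "publish"."""
    for name, lift in guardrail_lifts_pct.items():
        if lift < -max_guardrail_drop_pct:
            return f"veto: {name}"
    if primary.ci_low_pct <= 0:
        return "iterate"
    return "publish"

if __name__ == "__main__":
    # Illustrative numbers only.
    print(decide(PrimaryResult(0.03, -0.02), {"diversity": -0.05}))  # iterate
    print(decide(PrimaryResult(0.08, 0.02), {"diversity": -0.30}))   # veto: diversity
    print(decide(PrimaryResult(0.08, 0.02), {"diversity": 0.01}))    # publish
```

Guardrails are checked before significance so that a harmful treatment is always vetoed, even when its primary lift looks strong.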

These examples illustrate that AgentX’s strength lies in converting imperfect first‑round feedback into stronger subsequent hypotheses rather than delivering a perfect solution in a single pass.

In summary, AgentX answers three critical questions for automated recommendation R&D: (1) Agents can execute the full iteration loop when they operate within production constraints and undergo online A/B validation; (2) Experience generated by agents can be compounded via knowledge bases, failure assetization, and SGPO; (3) Agentic R&D already yields tangible business impact—8× concurrency, 3.7× per‑person value, +0.561% app‑time, and >1 billion RMB annual revenue.

Future work envisions a two‑layer engineering model: engineers collaborate with agents on business‑level goals while another layer focuses on evolving the agent framework, tools, and foundational models, turning each experiment into data that serves both short‑term optimization and long‑term intelligent growth.

AgentX demonstrates that when idea generation, code implementation, online evaluation, and experience consolidation are fully automated, recommender system iteration transcends linear human scaling and enters a phase of jointly compounding experience, compute, and intelligence.

AgentX technical report: https://arxiv.org/abs/2606.26859v2


Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.

