How Ant Group Scales Mobile AI with Thousand‑by‑Thousand Models

This article reviews Ant Group's "thousand‑by‑thousand" model strategy for mobile AI, detailing the challenges of device compute and user heterogeneity, the engineering upgrades for offline development and online release, and the measurable business improvements achieved.

Alipay Experience Technology

Background Introduction

Editor’s note: This is the third article in the Alipay Experience Technology Salon series, written by Ant Group R&D engineer Nan Feng. It introduces the "thousand‑by‑thousand" model technology, its application scenarios, the technical solutions across model development, publishing, and online monitoring, and its progress in real‑world business.

Traditional model application – thousand‑person thousand‑face

Before the thousand‑by‑thousand approach, predictions were made by extracting features (e.g., hometown, seasonal information) and training a single model to predict outcomes such as a user’s dinner choice. This "thousand‑person thousand‑face" method relies on feature differences to personalize predictions, but it encounters issues on the client side.
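The single-model baseline can be sketched as follows. This is a minimal illustration, not Ant Group's actual model: the feature names (`hometown`, `season`) echo the examples above, and the weights are made up.

```python
# Hypothetical sketch of "thousand-person thousand-face": one global
# model scores every user, and personalization comes only from
# differences in the extracted features.

def extract_features(user):
    # Flatten raw profile fields into numeric features (names illustrative).
    return {
        "is_southern": 1.0 if user.get("hometown") == "south" else 0.0,
        "is_winter": 1.0 if user.get("season") == "winter" else 0.0,
    }

# A single set of learned weights shared by ALL users.
GLOBAL_WEIGHTS = {"is_southern": 0.75, "is_winter": -0.25}

def predict_dinner_score(user):
    # Linear score standing in for a real trained model's output.
    feats = extract_features(user)
    return sum(GLOBAL_WEIGHTS[k] * v for k, v in feats.items())
```

Because every user shares `GLOBAL_WEIGHTS`, two users with identical features get identical predictions — which is exactly where the client-side problems below begin.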

Problem 1: Device compute differences

Unlike server environments where models run on uniform hardware, client devices vary widely in processing power. Statistics show significant variance in model execution time across phones, and unlike servers, client resources cannot be scaled up, forcing developers to simplify models for lower‑end devices at the cost of accuracy.

Problem 2: User feature differences

Training on all data pushes the model toward global optimum, favoring high‑frequency users and neglecting low‑frequency or long‑tail users, leading to a Matthew effect where personalization suffers for less common user groups.

Thousand‑by‑Thousand Models

Solution Approach

To address compute disparity, multiple models of varying complexity are prepared: complex models run on high‑end devices, while simpler models run on low‑end devices, matching model complexity to device capability.
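The device-matching idea can be sketched as a simple lookup. The tier thresholds and model file names here are assumptions for illustration; the article does not specify Ant Group's actual grading boundaries.

```python
# Illustrative sketch: pick a model variant by device compute tier.

MODEL_BY_TIER = {
    "high": "model_large.xnn",   # full-capacity network
    "mid": "model_medium.xnn",   # pruned / narrower network
    "low": "model_small.xnn",    # heavily simplified network
}

def device_tier(benchmark_ms):
    # Grade the device by a benchmark inference time (ms); thresholds assumed.
    if benchmark_ms < 50:
        return "high"
    if benchmark_ms < 200:
        return "mid"
    return "low"

def select_model(benchmark_ms):
    # Complex models go to fast devices, simple models to slow ones.
    return MODEL_BY_TIER[device_tier(benchmark_ms)]
```

The point is that accuracy is no longer capped by the weakest device: only low-end devices pay the simplification cost.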

To handle user heterogeneity, users are clustered into fine‑grained groups, and a dedicated model is trained for each group, improving description accuracy compared to a global model.
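A minimal sketch of per-group training, under assumed grouping rules (bucketing by 30‑day activity is my illustration, not the article's actual clustering method):

```python
# Sketch: bucket users into groups, then keep a separate "model"
# per group instead of one global model.

def user_group(user):
    # Hypothetical grouping rule: split by 30-day activity count.
    freq = user["actions_30d"]
    if freq >= 100:
        return "high_freq"
    if freq >= 10:
        return "mid_freq"
    return "long_tail"

def train_per_group(training_data):
    # Each group gets its own model -- here just the group's mean label,
    # standing in for a real per-group network trained on that group's data.
    sums, counts = {}, {}
    for user, label in training_data:
        g = user_group(user)
        sums[g] = sums.get(g, 0.0) + label
        counts[g] = counts.get(g, 0) + 1
    return {g: sums[g] / counts[g] for g in sums}
```

Because long-tail users now train their own model rather than being averaged into a global optimum, the Matthew effect described above is mitigated.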

Combining both ideas yields the "thousand‑by‑thousand" concept, extending beyond just device or user dimensions.

Thousand‑by‑Thousand Model Engineering Pipeline Upgrade

Scaling from a single model to thousands is not a simple copy‑paste task. Offline development originally required manual steps across multiple platforms (training, xNN conversion, git packaging, real‑device testing). When the model count grew, automation was introduced to link platforms, generate user‑group and device‑group data, and streamline the entire offline workflow.

On the server side, deploying thousands of models strains resource utilization and deployment speed; on the device side, however, each user still pulls only one model, so the real difficulty shifts to accurately publishing the right model to the right client.

Unified Release Pipeline Upgrade

Initial releases leveraged the client‑configuration center with strong monitoring, gray‑release, and rollback capabilities. However, this approach could not support per‑user model selection, incurred unnecessary resource usage, and faced size limits for large model payloads.

The upgraded two‑stage release first uses the existing configuration channel to deliver a lightweight config ID, then the client fetches the appropriate model based on that ID, solving the three identified issues.
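The two-stage flow can be sketched like this. `fetch_config` and `fetch_model` are hypothetical stand-ins for the configuration channel and the model store, and the IDs are invented:

```python
# Sketch of the two-stage release, under assumed naming.

CONFIG_CHANNEL = {"user_42": "cfg_high_freq_high_end_v3"}    # lightweight IDs
MODEL_STORE = {"cfg_high_freq_high_end_v3": b"<model bytes>"}  # large payloads

def fetch_config(user_id):
    # Stage 1: the existing config center delivers only a small config ID,
    # keeping its monitoring, gray-release, and rollback capabilities.
    return CONFIG_CHANNEL[user_id]

def fetch_model(config_id):
    # Stage 2: the client pulls exactly one model by that ID, so large
    # payloads never pass through the size-limited config channel.
    return MODEL_STORE[config_id]

def resolve_model(user_id):
    return fetch_model(fetch_config(user_id))
```

Splitting ID delivery from model delivery is what addresses all three problems at once: per-user selection happens in stage 1, while bandwidth and size constraints only apply to the single model fetched in stage 2.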

Challenges of Thousand‑by‑Thousand Models

More models do not always mean better performance; over‑fitting can occur when some user groups lack sufficient data.

Linking device compute to model selection requires offline device grading via real‑device tests and may evolve to dynamic grading using online runtime data.
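A hedged sketch of grading with a hook for the dynamic re-grading mentioned above (thresholds and the minimum-sample rule are assumptions):

```python
# Sketch: grade a device from inference-time samples, and only
# re-grade online once enough runtime samples have accumulated.

def grade(times_ms):
    # Use the median inference time so one slow outlier run
    # does not down-grade the device.
    ordered = sorted(times_ms)
    mid = ordered[len(ordered) // 2]
    if mid < 50:
        return "high"
    if mid < 200:
        return "mid"
    return "low"

def regrade_online(current_tier, recent_times_ms, min_samples=20):
    # Keep the offline grade until online evidence is sufficient.
    if len(recent_times_ms) < min_samples:
        return current_tier
    return grade(recent_times_ms)
```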

Detecting problematic models post‑deployment relies on online A/B experiments and monitoring to replace under‑performing models.
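The monitoring step can be sketched as a comparison against the control arm of an A/B experiment. The tolerance value and metric shape are illustrative assumptions:

```python
# Illustrative monitor: flag candidate models whose online metric
# trails the control arm by more than a tolerance, suggesting rollback.

def models_to_replace(control_metric, candidate_metrics, tolerance=0.01):
    # candidate_metrics maps a per-group model name to its online metric.
    return [
        name
        for name, metric in candidate_metrics.items()
        if metric < control_metric - tolerance
    ]
```

In practice this kind of check catches the over-fitting risk raised above: a group whose model was trained on too little data shows up as an under-performer online and falls back to a safer model.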

Business Results

Business 1, using thousand‑person models, improved a key metric from 92.5 to 94.6 after two years, and the thousand‑by‑thousand upgrade added another 1–1.5 percentage points.

Business 2, leveraging thousand‑device models, improved on its previous minimum model accuracy of 85.7% by assigning larger models to higher‑end devices, achieving noticeable performance gains.

Tags: personalization, edge computing, AI, model deployment, Ant Group
Written by Alipay Experience Technology

Exploring ultimate user experience and best engineering practices