
Data-Driven Simulation for User Activity Retention Prediction

By extracting hour-level activity logs and training supervised models (CART, GBDT, and neural networks) on data grouped by user tags, the team simulated short-term metrics for new reward campaigns. This makes it possible to predict next-day retention earlier and shortens experiment cycles that would otherwise wait on delayed T+1 data.

Xianyu Technology

Feb 27, 2020


Background : In the internet industry, user-operation activities such as red-packet campaigns are used to make products stickier. Users are segmented by activity level (low, medium, high), and A/B experiments on reward designs are run to improve next-day retention.

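Pain point : In a T+0 experiment (launched on day T), only immediate metrics such as exposure, clicks, and redemptions are available on the same day. Next-day retention (whether users come back on day T+1) can only be computed on day T+2, because day T+1's data lands one day later. This lengthens the experiment cycle.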

Simulation solution : Hour-level historical activity logs are extracted and grouped by user tags, and a supervised-learning model is trained on them. For a new activity, the model predicts its short-term, hour-level metrics, so that experiment arms can be compared without waiting for the delayed data.

Data organization : Historical activities that share common characteristics (growth, promotion, etc.) are selected. User tags (e.g., activity level, sensitivity, recent purchase count) are combined into hierarchical groups. Metrics are accumulated hour by hour, so the value at each hour covers everything from the start of the activity up to that hour; this reduces random error in the early hours, when participation is low.
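A minimal sketch of the hour-by-hour accumulation, assuming each log record has already been reduced to a (tag group, hour, exposures, clicks) tuple, where the tag group is a tuple of tag values. The record layout, the function name `cumulative_hourly`, and the choice of exposures and clicks as the tracked counts are illustrative assumptions, not the team's actual schema.

```python
from collections import defaultdict


def cumulative_hourly(records, hours=24):
    """Accumulate per-hour counts into running totals for each tag group.

    records: iterable of (tag_group, hour, exposures, clicks), where tag_group
             is a tuple of tag values, e.g. ("low", "sensitive") (hypothetical layout).
    Returns {tag_group: [(cum_exposures, cum_clicks) for hour 0 .. hours-1]}.
    """
    # Raw per-hour buckets: [exposures, clicks] for every hour of the activity
    per_hour = defaultdict(lambda: [[0, 0] for _ in range(hours)])
    for group, hour, exposures, clicks in records:
        per_hour[group][hour][0] += exposures
        per_hour[group][hour][1] += clicks

    result = {}
    for group, buckets in per_hour.items():
        cum_exp = cum_clk = 0
        series = []
        for exposures, clicks in buckets:
            # Value at hour h covers everything from hour 0 through hour h,
            # which smooths out sparse early hours
            cum_exp += exposures
            cum_clk += clicks
            series.append((cum_exp, cum_clk))
        result[group] = series
    return result
```

Because every hour carries all earlier data, a group with only a handful of clicks in hour 1 still has a usable denominator by hour 3 or 4, which is the noise reduction the paragraph describes.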

Model design : Metrics are classified into three types: observed, real-time, and delayed. Observed metrics are logged directly; real-time metrics are derived from observed ones; delayed metrics (e.g., next-day retention) only become available later and are the targets the models predict.
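One way to picture the three metric types, as a sketch only: the class `HourlySnapshot`, its fields, and the two derived rates below are hypothetical examples of observed, real-time, and delayed metrics, not the definitions used by the Xianyu team.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class HourlySnapshot:
    # Observed metrics: cumulative counts logged up to this hour (hypothetical fields)
    exposures: int
    clicks: int
    redemptions: int
    # Delayed metric: next-day retention, unknown (None) until the T+1 data lands
    next_day_retention: Optional[float] = None

    # Real-time metrics: derived from observed counts, so available immediately
    @property
    def click_rate(self) -> float:
        return self.clicks / self.exposures if self.exposures else 0.0

    @property
    def redemption_rate(self) -> float:
        # Assumed definition: redemptions per click
        return self.redemptions / self.clicks if self.clicks else 0.0

    def features(self) -> List[float]:
        """Observed + real-time metrics, i.e. the model inputs."""
        return [self.exposures, self.clicks, self.redemptions,
                self.click_rate, self.redemption_rate]
```

In this framing, historical snapshots with `next_day_retention` filled in become training rows, while snapshots from a live experiment supply only `features()`.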

Prediction models :

CART: a regression tree that recursively splits the feature space to minimize squared error; each resulting region (leaf) predicts the average target of its training samples.

GBDT: gradient-boosted decision trees, where each new CART model is fitted to the residuals of the ensemble built so far.

Neural Network (NN): a network that applies input preprocessing, then fully connected layers with ReLU activations and dropout, and ends in a sigmoid output, which keeps the predicted value in [0, 1].
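The CART and GBDT descriptions can be made concrete with a small pure-Python sketch. To stay short it uses depth-1 trees (stumps) instead of full-depth CART, and plain squared-loss boosting with a fixed learning rate; `fit_stump`, `fit_gbdt`, `n_trees`, and `learning_rate` are hypothetical names for illustration, not the configuration of the original experiments. The neural network is not sketched here.

```python
def fit_stump(X, y):
    """Depth-1 CART regression tree.

    Tries every (feature, threshold) split, keeps the one with the lowest total
    squared error, and predicts the mean target of each side (region-wise average).
    """
    n, d = len(X), len(X[0])
    best = None  # (sse, feature, threshold, left_mean, right_mean)
    for j in range(d):
        # Drop the largest value so both sides of every split are non-empty
        for t in sorted({row[j] for row in X})[:-1]:
            left = [y[i] for i in range(n) if X[i][j] <= t]
            right = [y[i] for i in range(n) if X[i][j] > t]
            lm, rm = sum(left) / len(left), sum(right) / len(right)
            sse = sum((v - lm) ** 2 for v in left) + sum((v - rm) ** 2 for v in right)
            if best is None or sse < best[0]:
                best = (sse, j, t, lm, rm)
    if best is None:  # no valid split (all rows identical): predict the overall mean
        mean = sum(y) / n
        return lambda row: mean
    _, j, t, lm, rm = best
    return lambda row: lm if row[j] <= t else rm


def fit_gbdt(X, y, n_trees=50, learning_rate=0.1):
    """Gradient boosting with squared loss: each stump fits the current residuals."""
    base = sum(y) / len(y)  # start from the global mean
    trees = []
    pred = [base] * len(y)
    for _ in range(n_trees):
        # For squared loss, the negative gradient is simply the residual
        residuals = [yi - pi for yi, pi in zip(y, pred)]
        tree = fit_stump(X, residuals)
        trees.append(tree)
        pred = [p + learning_rate * tree(row) for p, row in zip(pred, X)]
    return lambda row: base + learning_rate * sum(tree(row) for tree in trees)
```

A single stump already returns region averages; boosting adds many such corrections, each shrunk by `learning_rate`, which is why GBDT can fit finer structure than one CART model.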

Results : The simulation was applied to Xianyu’s “222” promotion across eight venues. Next-day retention models were built using all observed and real-time metrics as inputs. In these experiments, NN and GBDT generally achieved lower mean squared error than CART, although performance varied by venue and by tag granularity.
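The comparison above comes down to computing mean squared error, MSE = (1/n) Σ (predicted − actual)², separately for each venue and model. A minimal sketch, assuming predictions and observed next-day retention rates have been collected as (venue, model, predicted, actual) rows; `mse_by_venue` and `best_model_per_venue` are hypothetical helper names, and no results from the article are reproduced here.

```python
from collections import defaultdict


def mse_by_venue(rows):
    """Return {venue: {model_name: mean squared error}}.

    rows: iterable of (venue, model_name, predicted, actual) next-day retention values.
    """
    # Per (venue, model): [sum of squared errors, number of rows]
    acc = defaultdict(lambda: defaultdict(lambda: [0.0, 0]))
    for venue, model, predicted, actual in rows:
        cell = acc[venue][model]
        cell[0] += (predicted - actual) ** 2
        cell[1] += 1
    return {
        venue: {model: sse / n for model, (sse, n) in models.items()}
        for venue, models in acc.items()
    }


def best_model_per_venue(rows):
    """Name of the lowest-MSE model in each venue."""
    return {venue: min(scores, key=scores.get)
            for venue, scores in mse_by_venue(rows).items()}
```

Scoring per venue instead of pooling all rows keeps differences between venues visible, which matters when the best model is not the same everywhere.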

Outlook : Future work includes improving predictions for low-exposure groups, modeling the influence of adjacent hours, and generating actionable operation recommendations from the model outputs.


Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.


Written by

Xianyu Technology

Official account of the Xianyu technology team

