Data-Driven Simulation for User Activity Retention Prediction
By extracting hour‑level activity logs and training supervised models (CART, GBDT, and neural networks) on data grouped by user tags, the team simulated short‑term metrics for new reward campaigns. This allowed next‑day retention to be predicted earlier and shortened experiment cycles, even though T+1 data arrive with a delay.
Background : In the internet industry, user‑operation activities such as red‑packet campaigns are used to increase product stickiness. Users are segmented by activity level (low, medium, high) and AB experiments are run on reward designs to improve next‑day retention.
Pain point : For an experiment launched on day T (T+0), only immediate metrics (exposure, click, redemption) are available on that day. The next‑day (T+1) retention metric can only be observed at T+2, which lengthens the experiment cycle.
Simulation solution : Hour‑level historical activity logs are extracted, grouped by user tags, and a supervised‑learning model is trained. The model predicts short‑term hour‑level metrics for new activities and compares the results.
Data organization : Historical activities with common characteristics (growth, promotion, etc.) are selected. User tags (e.g., activity level, sensitivity, recent purchase count) are used to create hierarchical groups. Hourly data are accumulated hour by hour, so each hour's value is the running total up to that hour, which reduces random error in the early hours when participation is low.
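The hour-by-hour accumulation described above can be sketched as follows. This is a minimal illustration rather than the team's pipeline: the row layout, the group names, and the `cumulative_by_group` helper are all assumptions made for the example.

```python
from collections import defaultdict
from itertools import accumulate

# Hypothetical hourly log rows: (user_group, hour, exposures, clicks).
hourly_rows = [
    ("low", 0, 40, 2),
    ("low", 1, 60, 5),
    ("low", 2, 100, 9),
    ("high", 0, 200, 30),
    ("high", 1, 250, 41),
    ("high", 2, 300, 52),
]

def cumulative_by_group(rows):
    """Return {group: [(hour, cum_exposures, cum_clicks), ...]} ordered by hour."""
    per_group = defaultdict(list)
    for group, hour, exposures, clicks in rows:
        per_group[group].append((hour, exposures, clicks))

    result = {}
    for group, records in per_group.items():
        records.sort(key=lambda r: r[0])  # order by hour before accumulating
        hours = [r[0] for r in records]
        cum_exposures = list(accumulate(r[1] for r in records))
        cum_clicks = list(accumulate(r[2] for r in records))
        result[group] = list(zip(hours, cum_exposures, cum_clicks))
    return result

cumulative = cumulative_by_group(hourly_rows)
for group, series in cumulative.items():
    print(group, series)
```

Because each value is a running total, an early hour with only a handful of events is never used on its own; it is folded into a growing sample as the day progresses.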
Model design : Metrics are classified into three types: observed, real‑time, and delayed. Real‑time metrics are derived from observed ones, while delayed metrics (e.g., next‑day retention) only become available later.
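As a rough illustration of deriving real-time metrics from observed ones, the sketch below computes two ratios from raw counts. The article does not list which derived metrics were used, so `click_rate` and `redemption_rate`, as well as the function name, are assumptions chosen for the example.

```python
def derive_realtime_metrics(exposures, clicks, redemptions):
    """Derive ratio metrics from observed counts (hypothetical metric names)."""
    def safe_ratio(numerator, denominator):
        # Early hours may have zero exposures or clicks; avoid division by zero.
        return numerator / denominator if denominator else 0.0

    return {
        "click_rate": safe_ratio(clicks, exposures),
        "redemption_rate": safe_ratio(redemptions, clicks),
    }

metrics = derive_realtime_metrics(exposures=200, clicks=16, redemptions=4)
print(metrics)
```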
Prediction models :
CART: regression tree that splits features to minimize squared error, producing region‑wise averages.
GBDT: gradient‑boosted decision trees that iteratively fit residuals of previous CART models.
Neural Network (NN): data‑driven model with preprocessing, fully‑connected layers, ReLU, dropout, and sigmoid output.
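To make the NN structure concrete, here is a minimal forward pass with one fully connected hidden layer, ReLU, dropout, and a sigmoid output. It is a sketch under stated assumptions: the layer sizes, the random weights, the dropout rate, and the three input features are illustrative and not taken from the article, and training is omitted.

```python
import math
import random

def dense(inputs, weights, biases):
    """Fully connected layer: one output per weight row."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]

def relu(values):
    return [max(0.0, v) for v in values]

def dropout(values, rate, rng, training):
    """Inverted dropout: zero units at random while training, rescale the rest."""
    if not training or rate == 0.0:
        return list(values)
    keep = 1.0 - rate
    return [v / keep if rng.random() < keep else 0.0 for v in values]

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def forward(features, params, rng, training=False, rate=0.2):
    hidden = relu(dense(features, params["w1"], params["b1"]))
    hidden = dropout(hidden, rate, rng, training)
    logit = dense(hidden, params["w2"], params["b2"])[0]
    return sigmoid(logit)  # squashes the output into (0, 1), like a rate

rng = random.Random(0)
# Assumed shapes: 3 input features, 4 hidden units, 1 output.
params = {
    "w1": [[rng.uniform(-0.5, 0.5) for _ in range(3)] for _ in range(4)],
    "b1": [0.0] * 4,
    "w2": [[rng.uniform(-0.5, 0.5) for _ in range(4)]],
    "b2": [0.0],
}
features = [0.08, 0.25, 0.5]  # hypothetical preprocessed (scaled) inputs
predicted_retention = forward(features, params, rng)
print(predicted_retention)
```

A sigmoid output is a natural fit when the target is a rate such as next-day retention, since the prediction is constrained to lie between 0 and 1.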
Results : The simulation was applied to Xianyu’s “222” promotion across eight venues. Next‑day retention models were built using all observed and real‑time metrics as inputs. Experiments showed that NN and GBDT generally achieved lower mean squared error than CART, though performance varied by venue and tag granularity.
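The kind of comparison reported here can be illustrated with a toy example: a single squared-error regression split (a depth-1 CART) versus gradient boosting that repeatedly fits such splits to the residuals. The data below are synthetic and the helper names are hypothetical, so this only demonstrates the mechanics and the mean squared error metric. It does not reproduce the article's experiments.

```python
import random

def mse(y_true, y_pred):
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def fit_stump(xs, ys):
    """Depth-1 regression tree: choose the split that minimizes squared error."""
    mean_y = sum(ys) / len(ys)
    best = None
    for threshold in sorted(set(xs))[:-1]:
        left = [y for x, y in zip(xs, ys) if x <= threshold]
        right = [y for x, y in zip(xs, ys) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        err = (sum((y - left_mean) ** 2 for y in left)
               + sum((y - right_mean) ** 2 for y in right))
        if best is None or err < best[0]:
            best = (err, threshold, left_mean, right_mean)
    if best is None:  # all x values identical: fall back to the mean
        return lambda x: mean_y
    _, threshold, left_mean, right_mean = best
    # Each region predicts the average of its training targets.
    return lambda x: left_mean if x <= threshold else right_mean

def fit_boosted(xs, ys, rounds=20, learning_rate=0.3):
    """Gradient boosting for squared loss: each stump fits current residuals."""
    base = sum(ys) / len(ys)
    stumps = []
    preds = [base] * len(ys)
    for _ in range(rounds):
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        preds = [p + learning_rate * stump(x) for p, x in zip(preds, xs)]
    return lambda x: base + learning_rate * sum(s(x) for s in stumps)

# Synthetic data: one feature in [0, 1] and a smooth, retention-like target.
rng = random.Random(0)
train_x = [i / 23 for i in range(24)]
train_y = [0.2 + 0.3 * x * x + rng.gauss(0, 0.01) for x in train_x]
test_x = [(i + 0.5) / 23 for i in range(23)]
test_y = [0.2 + 0.3 * x * x for x in test_x]

stump_model = fit_stump(train_x, train_y)
boosted_model = fit_boosted(train_x, train_y)
stump_mse = mse(test_y, [stump_model(x) for x in test_x])
boosted_mse = mse(test_y, [boosted_model(x) for x in test_x])
print("single split MSE:", stump_mse)
print("boosted MSE:", boosted_mse)
```

On a smooth target like this one, a single split can only produce two flat regions, while the boosted ensemble adds many small corrections, which is why its held-out error is lower in this toy setting.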
Outlook : Future work includes improving predictions for low‑exposure groups, incorporating adjacent hourly influences, and generating actionable operation recommendations based on model outputs.
Xianyu Technology
Official account of the Xianyu technology team