Winning O2O Coupon Redemption with XGBoost, GBDT, and Feature Engineering
This article describes a data-driven solution to the 2016 O2O coupon redemption competition. It covers dataset partitioning, feature engineering across the user, merchant, and coupon dimensions, the handling of leakage features, and model fusion with XGBoost, GBDT, and RandomForest, with a weighted ensemble achieving top AUC scores.
Team Introduction
The team is named “Poets Hide Underwater”. Its members are wepon and charles from Peking University and Yunfan Tianyin from the University of Science and Technology of China.
Problem Description
The task provides real online and offline consumption behavior from Jan 1 to Jun 30, 2016. The goal is to predict whether a user who receives a coupon in July 2016 will redeem it within 15 days of receipt. Submissions are scored by computing AUC separately for each coupon and averaging the results.
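The per-coupon averaging can be sketched as follows. This is a minimal illustration, not the organizers' scoring code: the function names are hypothetical, and skipping coupons whose labels are all one class is an assumption, since the source does not say how such coupons are handled. The pairwise AUC is quadratic in group size and is meant for clarity only.

```python
from collections import defaultdict


def auc(labels, scores):
    """AUC as the chance a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        return None  # undefined when only one class is present
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def mean_coupon_auc(coupon_ids, labels, scores):
    """Average the AUC over coupons, skipping coupons with a single class (assumption)."""
    groups = defaultdict(lambda: ([], []))
    for c, y, s in zip(coupon_ids, labels, scores):
        groups[c][0].append(y)
        groups[c][1].append(s)
    aucs = [auc(ys, ss) for ys, ss in groups.values()]
    aucs = [a for a in aucs if a is not None]
    return sum(aucs) / len(aucs) if aucs else None
```

Because each coupon contributes equally to the average, a model has to rank users well within a coupon, not merely separate popular coupons from unpopular ones.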
Solution Overview
Using the offline consumption and coupon receipt tables together with the online click/consume table, we split the data, extract user‑related, merchant‑related, coupon‑related, and user‑merchant interaction features, and also leverage leakage features that are unavailable in real production. Models (XGBoost, GBDT, RandomForest) are trained and fused.
Dataset Split
A sliding‑window method creates multiple training sets, increasing sample size and enabling cross‑validation for hyper‑parameter tuning.
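One way to picture the sliding window is a feature period followed immediately by a label period, both shifted forward by a fixed step. The sketch below is an assumption about the general shape only: the source does not give the team's actual window lengths or dates, so `feature_days`, `label_days`, and `step_days` are hypothetical parameters.

```python
from datetime import date, timedelta


def sliding_windows(start, end, feature_days, label_days, step_days):
    """Return a list of (feature_start, feature_end, label_start, label_end), all inclusive."""
    windows = []
    feature_start = start
    while True:
        feature_end = feature_start + timedelta(days=feature_days - 1)
        label_start = feature_end + timedelta(days=1)
        label_end = label_start + timedelta(days=label_days - 1)
        if label_end > end:
            break  # the label period would run past the available data
        windows.append((feature_start, feature_end, label_start, label_end))
        feature_start += timedelta(days=step_days)
    return windows


# Hypothetical settings: 90 days of history, 30 days of labels, slide by 30 days
windows = sliding_windows(date(2016, 1, 1), date(2016, 6, 30), 90, 30, 30)
```

Features for each training set are computed only from its feature period and labels only from its label period, so no label information flows back into the historical features.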
Feature Engineering
Features are derived from the two provided datasets (online and offline). The offline data yields richer attributes, while the online data provides click and purchase signals.
User offline features
Number of coupons received
Number of coupons received but not used
Number of coupons used
Redemption rate after receipt
Redemption rates for different discount thresholds
Average/minimum/maximum consumption discount when redeeming
Count of distinct merchants where coupons were redeemed
Count of distinct coupons redeemed
Average coupons used per merchant
Average/minimum/maximum user‑merchant distance for redeemed coupons
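A few of the user offline features above can be computed with grouped aggregations. This is a minimal sketch, assuming pandas and hypothetical column names (`user_id`, `merchant_id`, `coupon_id`, `date`); it also assumes a record counts as a redemption when it has both a coupon and a consumption date.

```python
import pandas as pd


def user_offline_features(df):
    """Aggregate a few per-user coupon statistics from offline records."""
    received = df[df["coupon_id"].notna()]          # rows where a coupon was received
    redeemed = received[received["date"].notna()]   # ...and later used in a purchase

    feats = received.groupby("user_id").size().rename("n_received").to_frame()
    feats["n_used"] = redeemed.groupby("user_id").size()
    feats["n_used"] = feats["n_used"].fillna(0).astype(int)
    feats["n_unused"] = feats["n_received"] - feats["n_used"]
    feats["redeem_rate"] = feats["n_used"] / feats["n_received"]
    feats["n_merchants_used"] = redeemed.groupby("user_id")["merchant_id"].nunique()
    feats["n_merchants_used"] = feats["n_merchants_used"].fillna(0).astype(int)
    return feats.reset_index()


# Toy records, invented for illustration
offline = pd.DataFrame({
    "user_id": [1, 1, 1, 2, 2],
    "merchant_id": [10, 10, 11, 10, 12],
    "coupon_id": ["c1", "c2", None, "c1", "c3"],
    "date": ["20160103", None, "20160107", None, None],
})
user_feats = user_offline_features(offline)
```

The merchant and user-merchant features follow the same pattern with a different grouping key.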
User online features
Number of online actions
Click rate
Purchase rate
Coupon receipt rate
Number of non‑purchase actions
Online coupon redemption count and rate
Proportion of offline non‑purchase actions to total non‑purchase actions
Proportion of offline redemption to total redemption
Proportion of offline records to total records
Merchant features
Number of the merchant's coupons received by users
Number of those coupons not redeemed
Number of those coupons redeemed
Redemption rate of those coupons
Average/minimum/maximum consumption discount for redeemed coupons
Number of distinct users who redeemed coupons
Average coupons redeemed per user
Number of distinct coupons redeemed by the merchant
Proportion of distinct redeemed coupons to all received coupons
Average coupons redeemed per coupon type
Average time to redemption
Average/minimum/maximum user‑merchant distance for redeemed coupons
User‑merchant interaction features
Number of coupons a user received from a merchant
Number of those coupons not redeemed
Number of those coupons redeemed
Redemption rate for coupons from that merchant
Proportion of a user’s non‑redeemed actions for a merchant to total non‑redeemed actions
Proportion of a user’s redeemed coupons for a merchant to total redeemed coupons
Proportion of a merchant’s non‑redeemed actions for a user to total non‑redeemed actions
Proportion of a merchant’s redeemed coupons for a user to total redeemed coupons
Coupon features
Coupon type (direct discount = 0, threshold discount = 1)
Discount rate
Minimum spend for threshold discounts
Historical occurrence count
Historical redemption count and rate
Historical redemption time rate
Day of week and day of month when coupon was received
User’s historical receipt and consumption counts for the coupon
User’s historical redemption rate for the coupon
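The basic coupon attributes (type, discount rate, minimum spend, and receipt date parts) can be derived as in the sketch below. The raw formats are assumptions not stated in the article: a discount field that is either a rate such as `"0.9"` or a threshold rule such as `"150:20"` (spend 150, save 20), and receipt dates written as `YYYYMMDD`. The function names are hypothetical.

```python
from datetime import datetime


def parse_discount(raw):
    """Return (coupon_type, discount_rate, min_spend) from a raw discount string."""
    if ":" in raw:
        # Threshold discount (type 1): "threshold:reduction"
        threshold, reduction = (float(x) for x in raw.split(":"))
        return 1, 1.0 - reduction / threshold, threshold
    # Direct discount (type 0): the field is already a rate
    return 0, float(raw), 0.0


def receipt_date_features(date_received):
    """Return (day_of_week, day_of_month); Monday is 1 and Sunday is 7."""
    d = datetime.strptime(date_received, "%Y%m%d")
    return d.isoweekday(), d.day
```

Converting threshold coupons to an equivalent rate puts both coupon types on one scale, while the minimum spend keeps the information that the rate alone loses.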
Additional leakage features
Total number of coupons a user has received
Number of specific coupons a user has received
Number of coupons received before/after the current one
Time interval to previous/next coupon receipt
Number of coupons received from a specific merchant
Number of distinct merchants a user has received coupons from
Number of coupons received on the same day
Number of distinct coupon types a user has received
Number of coupons a merchant has issued and the variety of coupon types
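These features are computed on the prediction period itself, which is why they count as leakage: they look at coupons a user receives after the one being scored. A minimal sketch follows, assuming pandas and hypothetical column names (`user_id`, `coupon_id`, `date_received` as `YYYYMMDD` strings). For coupons received on the same day, the before/after counts depend on row order, which is an arbitrary choice here.

```python
import pandas as pd


def leakage_features(df):
    """Add per-row counts and time gaps computed over the same period being predicted."""
    out = df.copy()
    out["date_received"] = pd.to_datetime(out["date_received"], format="%Y%m%d")
    out = out.sort_values(["user_id", "date_received"])

    out["user_total_received"] = out.groupby("user_id")["coupon_id"].transform("size")
    out["user_same_coupon_received"] = (
        out.groupby(["user_id", "coupon_id"])["coupon_id"].transform("size")
    )
    out["user_same_day_received"] = (
        out.groupby(["user_id", "date_received"])["coupon_id"].transform("size")
    )

    dates = out.groupby("user_id")["date_received"]
    # Gap in days to the previous and next receipt; NaN for a user's first and last coupon
    out["days_since_prev"] = dates.diff().dt.days
    out["days_until_next"] = -dates.diff(-1).dt.days

    out["received_before"] = out.groupby("user_id").cumcount()
    out["received_after"] = out["user_total_received"] - out["received_before"] - 1
    return out.sort_index()


# Toy prediction-period records, invented for illustration
test_set = pd.DataFrame({
    "user_id": [1, 1, 1, 2],
    "coupon_id": ["a", "b", "a", "a"],
    "date_received": ["20160701", "20160703", "20160703", "20160710"],
})
leak = leakage_features(test_set)
```

Such features help in a competition because the full July receipt log is available at once, but they could not be computed at the moment a coupon is issued in production.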
Model Design
Based on the engineered features, models are trained and fused. The first season used only XGBoost and topped the leaderboard. In the second season, XGBoost, GBDT, and RandomForest were trained; GBDT performed best, followed by XGBoost, with RandomForest lagging. A weighted ensemble further improved performance:
0.65 * GBDT + 0.35 * XGBoost
Source: Big Data
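The weighted blend can be written in a few lines. This is a minimal sketch using the weights quoted above; the function name is hypothetical, and it assumes both models output scores on a comparable scale (for example, predicted probabilities), which the source does not specify.

```python
def blend(gbdt_scores, xgb_scores, w_gbdt=0.65, w_xgb=0.35):
    """Weighted average of two models' scores for the same samples, in the same order."""
    if len(gbdt_scores) != len(xgb_scores):
        raise ValueError("both models must score the same samples")
    return [w_gbdt * g + w_xgb * x for g, x in zip(gbdt_scores, xgb_scores)]
```

Since AUC depends only on ranking, blending changes the result only where the two models order a pair of samples differently.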
21CTO
