
Winning O2O Coupon Redemption with XGBoost, GBDT, and Feature Engineering

This article walks through a data-driven solution for the 2016 O2O coupon redemption competition. It covers dataset partitioning, feature engineering across the user, merchant, and coupon dimensions, the handling of leakage features, and model fusion. XGBoost, GBDT, and RandomForest models were trained, and a weighted ensemble of GBDT and XGBoost reached top AUC scores.

21CTO

Team Introduction

The team, “Poets Hide Underwater”, consisted of wepon and charles from Peking University and Yunfan Tianyin from the University of Science and Technology of China.

Problem Description

The task provides real online and offline consumption behavior from Jan 1 to Jun 30, 2016. The goal is to predict, for coupons received in July 2016, whether the user will redeem the coupon within 15 days of receiving it. Submissions are scored by AUC: an AUC is computed separately for each coupon, and the final score is the average of those per-coupon values.
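To make the metric concrete, here is a minimal sketch of an averaged per-coupon AUC in plain Python. The record layout `(coupon_id, label, score)` is hypothetical, and skipping coupons whose labels are all one class is an assumption; the article does not say how the organizers treat those coupons.

```python
from collections import defaultdict


def auc(labels, scores):
    """AUC as the chance that a random positive outranks a random negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        return None  # AUC is undefined when only one class is present
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def mean_coupon_auc(records):
    """records: iterable of (coupon_id, label, score) tuples (hypothetical layout)."""
    by_coupon = defaultdict(lambda: ([], []))
    for coupon_id, label, score in records:
        by_coupon[coupon_id][0].append(label)
        by_coupon[coupon_id][1].append(score)
    aucs = [auc(ys, ss) for ys, ss in by_coupon.values()]
    aucs = [a for a in aucs if a is not None]  # assumption: skip single-class coupons
    return sum(aucs) / len(aucs)


records = [
    ("c1", 1, 0.9), ("c1", 0, 0.2), ("c1", 0, 0.4),  # positive ranked first
    ("c2", 1, 0.3), ("c2", 0, 0.6),                  # positive ranked last
    ("c3", 0, 0.5),                                  # single class, skipped
]
print(mean_coupon_auc(records))
```

Because the score is averaged over coupons, a model has to rank users well within each coupon; ranking well only across coupons is not rewarded.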

Solution Overview

Using the offline consumption and coupon receipt tables together with the online click/consume table, we split the data, extract user‑related, merchant‑related, coupon‑related, and user‑merchant interaction features, and also leverage leakage features that are unavailable in real production. Models (XGBoost, GBDT, RandomForest) are trained and fused.

Dataset Split

A sliding‑window method creates multiple training sets, increasing sample size and enabling cross‑validation for hyper‑parameter tuning.
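One way to picture a sliding-window split is as a generator of (feature interval, label interval) pairs. The sketch below uses hypothetical window sizes (90 feature days, 30 label days, a 30-day step); the article gives only the overall Jan 1 to Jun 30, 2016 range, so these numbers are assumptions and not the team's actual boundaries.

```python
from datetime import date, timedelta


def sliding_windows(start, end, feature_days, label_days, step_days):
    """Yield (feature_start, feature_end, label_start, label_end); all dates inclusive."""
    cursor = start
    while True:
        feature_end = cursor + timedelta(days=feature_days - 1)
        label_start = feature_end + timedelta(days=1)
        label_end = label_start + timedelta(days=label_days - 1)
        if label_end > end:
            break  # the label interval would run past the available data
        yield cursor, feature_end, label_start, label_end
        cursor += timedelta(days=step_days)


# Hypothetical window sizes, chosen only for illustration.
windows = list(sliding_windows(date(2016, 1, 1), date(2016, 6, 30),
                               feature_days=90, label_days=30, step_days=30))
for feature_start, feature_end, label_start, label_end in windows:
    print(feature_start, feature_end, label_start, label_end)
```

Historical features would be computed from each feature interval and the redemption labels from the label interval that follows it, so every window contributes one training set.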

Dataset split illustration

Feature Engineering

Features are derived from the two provided datasets (online and offline). The offline data yields richer attributes, while the online data provides click and purchase signals.

User offline features

Number of coupons received

Number of coupons received but not used

Number of coupons used

Redemption rate after receipt

Redemption rates for different discount thresholds

Average/minimum/maximum consumption discount when redeeming

Count of distinct merchants where coupons were redeemed

Count of distinct coupons redeemed

Average coupons used per merchant

Average/minimum/maximum user‑merchant distance for redeemed coupons
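A few of the user offline features listed above can be sketched with pandas group-by aggregations. The column names (`user_id`, `merchant_id`, `distance`, `used`) and the toy rows are assumptions for illustration, not the competition schema.

```python
import pandas as pd

# Toy offline coupon receipts; column names are assumptions.
offline = pd.DataFrame({
    "user_id":     [1, 1, 1, 2, 2],
    "merchant_id": [10, 10, 11, 10, 12],
    "distance":    [1.0, 1.0, 3.0, 0.0, 5.0],
    "used":        [1, 0, 1, 0, 0],  # 1 if the received coupon was redeemed
})

received = offline.groupby("user_id").size().rename("coupons_received")
used = offline[offline["used"] == 1].groupby("user_id").agg(
    coupons_used=("used", "count"),
    merchants_redeemed=("merchant_id", "nunique"),
    mean_distance_used=("distance", "mean"),
)

user_features = received.to_frame().join(used)
# Users with no redemptions get counts of 0; their distance stays NaN.
count_cols = ["coupons_used", "merchants_redeemed"]
user_features[count_cols] = user_features[count_cols].fillna(0).astype(int)
user_features["coupons_unused"] = (
    user_features["coupons_received"] - user_features["coupons_used"]
)
user_features["redemption_rate"] = (
    user_features["coupons_used"] / user_features["coupons_received"]
)
print(user_features)
```

Leaving the distance as NaN for users who never redeemed keeps "no history" distinct from "distance zero", which tree models such as XGBoost can handle natively.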

User online features

Number of online actions

Click rate

Purchase rate

Coupon receipt rate

Number of non‑purchase actions

Online coupon redemption count and rate

Proportion of offline non‑purchase actions to total non‑purchase actions

Proportion of offline redemption to total redemption

Proportion of offline records to total records
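The online rates and the offline-to-total proportion can be sketched as follows. The action coding (0 = click, 1 = purchase, 2 = coupon receipt), the column names, and the `offline_records` counts are all assumptions made for this example.

```python
import pandas as pd

# Toy online log; the action coding is an assumption.
online = pd.DataFrame({
    "user_id": [1, 1, 1, 1, 2, 2],
    "action":  [0, 0, 1, 2, 0, 1],
})

# One row per user, one column per action type.
counts = pd.crosstab(online["user_id"], online["action"])
counts = counts.reindex(columns=[0, 1, 2], fill_value=0)
counts.columns = ["clicks", "purchases", "receipts"]

online_features = counts.copy()
online_features["online_actions"] = counts.sum(axis=1)
rates = {"clicks": "click_rate", "purchases": "purchase_rate", "receipts": "receipt_rate"}
for count_col, rate_col in rates.items():
    online_features[rate_col] = (
        online_features[count_col] / online_features["online_actions"]
    )

# Hypothetical per-user offline record counts, used for the offline share.
offline_records = pd.Series({1: 4, 2: 6})
online_features["offline_share"] = offline_records / (
    offline_records + online_features["online_actions"]
)
print(online_features)
```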

Merchant features

Number of the merchant's coupons received by users

Number of received coupons not redeemed

Number of received coupons redeemed

Redemption rate of received coupons

Average/minimum/maximum consumption discount for redeemed coupons

Number of distinct users who redeemed coupons

Average coupons redeemed per user

Number of distinct coupons redeemed by the merchant

Proportion of distinct redeemed coupons to all received coupons

Average coupons redeemed per coupon type

Average time to redemption

Average/minimum/maximum user‑merchant distance for redeemed coupons
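Merchant features such as distinct redeeming users and average time to redemption follow the same group-by pattern, with a date difference added. The sketch below assumes hypothetical columns `date_received` and `date_used` on rows that were already filtered to redeemed coupons.

```python
import pandas as pd

# Toy rows for redeemed coupons only; column names are assumptions.
redeemed = pd.DataFrame({
    "merchant_id":   [10, 10, 10, 11],
    "user_id":       [1, 1, 2, 3],
    "coupon_id":     [100, 101, 100, 200],
    "date_received": pd.to_datetime(["2016-03-01", "2016-03-05", "2016-03-10", "2016-04-01"]),
    "date_used":     pd.to_datetime(["2016-03-04", "2016-03-06", "2016-03-20", "2016-04-08"]),
})
redeemed["days_to_use"] = (redeemed["date_used"] - redeemed["date_received"]).dt.days

merchant_features = redeemed.groupby("merchant_id").agg(
    coupons_redeemed=("coupon_id", "count"),
    distinct_users=("user_id", "nunique"),
    distinct_coupons=("coupon_id", "nunique"),
    mean_days_to_use=("days_to_use", "mean"),
)
merchant_features["redeemed_per_user"] = (
    merchant_features["coupons_redeemed"] / merchant_features["distinct_users"]
)
print(merchant_features)
```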

User‑merchant interaction features

Number of coupons a user received from a merchant

Number of those coupons not redeemed

Number of those coupons redeemed

Redemption rate for coupons from that merchant

Proportion of a user’s non‑redeemed actions for a merchant to total non‑redeemed actions

Proportion of a user’s redeemed coupons for a merchant to total redeemed coupons

Proportion of a merchant’s non‑redeemed actions for a user to total non‑redeemed actions

Proportion of a merchant’s redeemed coupons for a user to total redeemed coupons
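The pair-level counts and the two redemption proportions can be sketched by aggregating on `(user_id, merchant_id)` and then dividing by per-user and per-merchant totals. Column names and data are assumptions for illustration.

```python
import pandas as pd

# Toy coupon receipts; column names are assumptions.
pairs = pd.DataFrame({
    "user_id":     [1, 1, 1, 2, 2],
    "merchant_id": [10, 10, 11, 10, 10],
    "used":        [1, 0, 1, 1, 1],  # 1 if the coupon was redeemed
})

um = pairs.groupby(["user_id", "merchant_id"]).agg(
    received=("used", "count"),
    redeemed=("used", "sum"),
).reset_index()
um["not_redeemed"] = um["received"] - um["redeemed"]
um["pair_redemption_rate"] = um["redeemed"] / um["received"]

# Share of the user's redemptions that went to this merchant, and the
# share of the merchant's redemptions that came from this user.
um["share_of_user_redeemed"] = (
    um["redeemed"] / um.groupby("user_id")["redeemed"].transform("sum")
)
um["share_of_merchant_redeemed"] = (
    um["redeemed"] / um.groupby("merchant_id")["redeemed"].transform("sum")
)
print(um)
```

A user or merchant with no redemptions at all produces a 0/0 division here, which pandas reports as NaN; whether to keep that as a missing value or fill it is a modelling choice the article does not address.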

Coupon features

Coupon type (direct discount = 0, threshold discount = 1)

Discount rate

Minimum spend for threshold discounts

Historical occurrence count

Historical redemption count and rate

Historical redemption time rate

Day of week and day of month when coupon was received

User’s historical receipt and consumption counts for the coupon

User’s historical redemption rate for the coupon
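The first three coupon features and the calendar features can be derived from a raw discount string and a receipt date. The sketch below assumes two raw formats, `"0.9"` for a direct discount and `"150:20"` for "spend 150, save 20"; the article does not state the raw format, and setting the minimum spend to 0 for direct discounts is likewise an assumption.

```python
from datetime import date


def parse_discount(raw):
    """Return (coupon_type, discount_rate, min_spend) from a raw discount string.

    Assumed formats: "0.9" (direct discount) or "150:20" (spend 150, save 20).
    """
    if ":" in raw:
        min_spend, reduction = (float(x) for x in raw.split(":"))
        return 1, 1.0 - reduction / min_spend, min_spend
    return 0, float(raw), 0.0  # assumption: no minimum spend for direct discounts


def receipt_calendar(received):
    """Day of week (1 = Monday ... 7 = Sunday) and day of month for a receipt date."""
    return received.isoweekday(), received.day


print(parse_discount("150:20"))
print(parse_discount("0.9"))
print(receipt_calendar(date(2016, 7, 1)))
```

Converting threshold coupons to an equivalent rate puts both coupon types on one scale, while the type flag and minimum spend preserve the information that the conversion discards.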

Additional leakage features

Total number of coupons a user has received

Number of specific coupons a user has received

Number of coupons received before/after the current one

Time interval to previous/next coupon receipt

Number of coupons received from a specific merchant

Number of distinct merchants a user has received coupons from

Number of coupons received on the same day

Number of distinct coupon types a user has received

Number of coupons a merchant has issued and the variety of coupon types
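Several of the leakage features above are counts and gaps over a user's sequence of coupon receipts. The sketch below computes them with pandas on a toy table. The column names are hypothetical, and treating the table as the receipts of the period being predicted is an assumption about where the leakage comes from.

```python
import pandas as pd

# Toy coupon receipts; column names are assumptions.
receipts = pd.DataFrame({
    "user_id":       [1, 1, 1, 2],
    "coupon_id":     [100, 100, 200, 100],
    "date_received": pd.to_datetime(["2016-07-01", "2016-07-04", "2016-07-04", "2016-07-10"]),
}).sort_values(["user_id", "date_received"]).reset_index(drop=True)

g = receipts.groupby("user_id")["date_received"]
receipts["user_total_received"] = g.transform("count")
receipts["user_same_coupon_received"] = (
    receipts.groupby(["user_id", "coupon_id"])["date_received"].transform("count")
)
receipts["user_same_day_received"] = (
    receipts.groupby(["user_id", "date_received"])["coupon_id"].transform("count")
)
# Gaps to the neighbouring receipts; NaN marks a user's first or last receipt.
receipts["days_since_prev"] = g.diff().dt.days
receipts["days_to_next"] = -g.diff(-1).dt.days
# Position in the user's receipt sequence (same-day receipts keep their input order).
receipts["received_before"] = g.cumcount()
receipts["received_after"] = (
    receipts["user_total_received"] - receipts["received_before"] - 1
)
print(receipts)
```

Features such as `days_to_next` and `received_after` look at receipts that happen after the one being scored, which is why they would not be available in a real production system.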

Model Design

Based on the engineered features, models are trained and fused. The first season used only XGBoost and topped the leaderboard. In the second season, XGBoost, GBDT, and RandomForest were trained; GBDT performed best, followed by XGBoost, with RandomForest lagging. A weighted ensemble further improved performance:

0.65 * GBDT + 0.35 * XGBoost
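As a minimal sketch, the weighted ensemble amounts to a per-row weighted average of the two models' scores. The scores below are made up, and blending raw predicted probabilities (as opposed to ranks or some other transform) is an assumption; the article states only the weights.

```python
def blend(gbdt_scores, xgb_scores, w_gbdt=0.65, w_xgb=0.35):
    """Weighted average of two models' scores for the same rows."""
    if len(gbdt_scores) != len(xgb_scores):
        raise ValueError("both models must score the same rows")
    return [w_gbdt * g + w_xgb * x for g, x in zip(gbdt_scores, xgb_scores)]


# Hypothetical scores for three coupon receipts.
gbdt_scores = [0.80, 0.10, 0.40]
xgb_scores = [0.60, 0.20, 0.40]
blended = blend(gbdt_scores, xgb_scores)
print(blended)
```

Since AUC depends only on the ordering of scores, the blend helps to the extent that it changes the ranking within each coupon.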
Source: Big Data