Detecting Low‑Quality New Users in Food Delivery with a GBDT + LR Model

This article describes a data‑driven approach to identifying low‑value new users on a food‑delivery platform. New users are labeled by their 7‑day repeat‑purchase behavior; order, behavior, merchant, and user features are extracted; and a combined Gradient Boosted Decision Tree and Logistic Regression model is trained to support fraud detection and merchant penalty decisions.

Baidu Waimai Technology Team

Jun 27, 2017

Background In food delivery, acquiring new users is costly. Some merchants use very low prices or outright fraud to inflate their acquisition KPIs and collect subsidies, so the platform needs to identify low‑value new users and, where warranted, penalize the merchants that bring them in.

Key Terms "Acquisition" (拉新) means bringing in new users by any means. "Repeat purchase" (复购) means a user orders again on a different day from their first order; several orders placed on the same day count as a single purchase.

Quality Judgment Standard The metric is the 7‑day repeat‑purchase rate of new users, calculated as the number of new users who purchase at least twice within seven days (excluding multiple orders on the same day) divided by the total number of new users. Merchants whose rate falls below a threshold are considered inefficient at acquisition.
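
The metric above can be sketched in a few lines of Python. This is a minimal illustration, not the team's implementation: the function names, the input layout, the choice to attribute each new user to one merchant, and the reading of "within seven days" as days 1 through 7 after the first order are all assumptions.

```python
from collections import defaultdict
from datetime import date, timedelta


def is_repeat_purchaser(order_dates, window_days=7):
    """Return True if the user ordered again on a later day within the window.

    order_dates: iterable of datetime.date, one entry per order. The earliest
    date is treated as the first order (assumption). Same-day orders are
    collapsed, so they never count as a repeat.
    """
    days = sorted(set(order_dates))
    if len(days) < 2:
        return False
    first = days[0]
    deadline = first + timedelta(days=window_days)
    return any(first < d <= deadline for d in days[1:])


def repeat_rate_by_merchant(new_users):
    """new_users: iterable of (merchant_id, order_dates), one pair per new user."""
    totals = defaultdict(int)
    repeats = defaultdict(int)
    for merchant_id, order_dates in new_users:
        totals[merchant_id] += 1
        repeats[merchant_id] += is_repeat_purchaser(order_dates)
    return {m: repeats[m] / totals[m] for m in totals}


# Toy data, invented for illustration only.
toy_users = [
    ("m1", [date(2017, 6, 1), date(2017, 6, 1)]),   # two orders on one day: no repeat
    ("m1", [date(2017, 6, 1), date(2017, 6, 4)]),   # second order three days later
    ("m2", [date(2017, 6, 1), date(2017, 6, 20)]),  # second order outside the window
]
rates = repeat_rate_by_merchant(toy_users)
print(rates)
```

A merchant would then be compared against the platform's threshold, whose value the article does not state.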

Overall Process The workflow includes data collection, labeling, feature extraction, model training, and deployment, as illustrated in the accompanying diagram.

Data Collection and Labeling Training samples are drawn from the past three months: a user is labeled 1 if they make a repeat purchase within seven days, and 0 otherwise. Test samples are drawn from the previous week and labeled the same way.
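
One way to read this split is as an out‑of‑time split on each user's first‑order date, sketched below. The function name, the tuple layout, the 90‑day approximation of "three months", and the decision to place the training window immediately before the test week are assumptions made for illustration.

```python
from datetime import date, timedelta


def split_by_first_order_date(samples, test_start, train_days=90, test_days=7):
    """Split labeled samples into an older training set and a newer test set.

    samples: iterable of (first_order_date, label) tuples (hypothetical layout).
    Training covers [test_start - train_days, test_start);
    testing covers [test_start, test_start + test_days).
    """
    train_start = test_start - timedelta(days=train_days)
    test_end = test_start + timedelta(days=test_days)
    train, test = [], []
    for sample in samples:
        first_order = sample[0]
        if train_start <= first_order < test_start:
            train.append(sample)
        elif test_start <= first_order < test_end:
            test.append(sample)
    return train, test


# Toy samples, invented for illustration only.
samples = [
    (date(2017, 3, 1), 1),   # older than the training window: dropped
    (date(2017, 5, 1), 0),   # training window
    (date(2017, 6, 19), 1),  # last day of the training window
    (date(2017, 6, 20), 0),  # first day of the test window
    (date(2017, 6, 27), 1),  # after the test window: dropped
]
train, test = split_by_first_order_date(samples, test_start=date(2017, 6, 20))
```

Keeping the test week strictly later than the training window means the model is evaluated on users it could not have seen during training.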

Feature Extraction Features are derived from orders, user behavior, merchant attributes, and user profiles.

Model Selection A Gradient Boosted Decision Tree (GBDT) is first trained to generate leaf‑node identifiers, which serve as high‑dimensional features for a Logistic Regression (LR) model. This GBDT + LR combination leverages GBDT’s ability to capture non‑linear interactions and automatically produce useful feature combinations for LR.

Training Steps

1. Train a GBDT model using the labeled training data.

2. Pass each sample through the trained GBDT; the leaf node the sample reaches in each tree becomes one dimension of the LR feature vector (in practice, a one‑hot encoding of the leaf index).

3. Train an LR model on these derived features.
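
The three steps above can be sketched with scikit‑learn. This is an illustrative sketch under stated assumptions, not the team's code: the article does not name a library, and the synthetic data from `make_classification`, the hyperparameters, and all variable names are placeholders.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import OneHotEncoder

# Synthetic stand-in for the order, behavior, merchant, and user features.
X, y = make_classification(
    n_samples=2000, n_features=20, n_informative=8, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Step 1: train the GBDT on the labeled training data.
gbdt = GradientBoostingClassifier(n_estimators=50, max_depth=3, random_state=0)
gbdt.fit(X_train, y_train)

# Step 2: look up the leaf each sample reaches in every tree.
# For binary classification apply() returns shape (n_samples, n_estimators, 1).
train_leaves = gbdt.apply(X_train)[:, :, 0]
test_leaves = gbdt.apply(X_test)[:, :, 0]

# One-hot encode the leaf indices: one column per (tree, leaf) pair.
encoder = OneHotEncoder(handle_unknown="ignore")
train_onehot = encoder.fit_transform(train_leaves)
test_onehot = encoder.transform(test_leaves)

# Step 3: train the LR on the leaf-derived features.
lr = LogisticRegression(max_iter=1000)
lr.fit(train_onehot, y_train)

# Predicted probability of the positive class (label 1, a repeat purchase).
repeat_prob = lr.predict_proba(test_onehot)[:, 1]
```

Each encoded row has exactly one active column per tree, so the LR learns one weight per leaf. A common variant fits the LR on a held‑out slice of the training data rather than on the rows used to grow the trees, which can reduce overfitting; the article does not say which approach the team used.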

Advantages of GBDT + LR The GBDT component discovers discriminative feature interactions and combinations that linear models alone cannot capture, reducing reliance on manual feature engineering while maintaining interpretability for the LR layer.

Deployment and Outcome The final model predicts a repeat‑purchase probability for each merchant's new users. Users whose predicted probability falls below a set threshold are flagged as unlikely to repurchase, which lets the platform target penalties at merchants that bring in low‑quality new users.
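
The flagging step might look like the sketch below. The article specifies only a user‑level probability threshold; the merchant‑level rule used here (flag a merchant when the share of its flagged new users exceeds a cutoff), both threshold values, and the function name are hypothetical.

```python
from collections import defaultdict


def flag_merchants(predictions, user_threshold=0.2, merchant_threshold=0.5):
    """Return the set of merchants whose new users mostly look unlikely to repurchase.

    predictions: iterable of (merchant_id, repeat_probability), one per new user.
    user_threshold and merchant_threshold are illustrative values, not from the article.
    """
    totals = defaultdict(int)
    low_quality = defaultdict(int)
    for merchant_id, prob in predictions:
        totals[merchant_id] += 1
        if prob < user_threshold:  # user flagged as unlikely to repurchase
            low_quality[merchant_id] += 1
    return {
        m for m in totals
        if low_quality[m] / totals[m] > merchant_threshold
    }


# Toy predictions, invented for illustration only.
predictions = [
    ("m1", 0.05), ("m1", 0.10), ("m1", 0.60),  # 2 of 3 users flagged
    ("m2", 0.70), ("m2", 0.15), ("m2", 0.90),  # 1 of 3 users flagged
]
flagged = flag_merchants(predictions)
print(flagged)
```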

Summary and Outlook The current model achieves 97% precision and 43% recall in identifying low‑repeat‑purchase merchants. Although precision is high, recall is modest, prompting future work to incorporate additional dimensions and improve detection coverage.
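
For readers less familiar with the two metrics, the sketch below shows how they are computed over sets of merchants. Precision is the share of flagged merchants that really are low‑repeat‑purchase; recall is the share of all low‑repeat‑purchase merchants that were flagged. The function name and the toy merchant IDs are invented for illustration and are not the article's data.

```python
def precision_recall(flagged, truly_low):
    """Compute precision and recall for a set of flagged merchants.

    flagged: merchant IDs the model flagged.
    truly_low: merchant IDs that actually have a low repeat-purchase rate.
    """
    flagged, truly_low = set(flagged), set(truly_low)
    true_positives = len(flagged & truly_low)
    precision = true_positives / len(flagged) if flagged else 0.0
    recall = true_positives / len(truly_low) if truly_low else 0.0
    return precision, recall


# Toy example: every flagged merchant is correct, but most are missed.
precision, recall = precision_recall(
    flagged={"m1", "m2"},
    truly_low={"m1", "m2", "m3", "m4", "m5"},
)
print(precision, recall)
```

High precision with modest recall, as reported above, means that flagged merchants are almost always genuine cases while many low‑repeat‑purchase merchants still go undetected.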

Author Introduction The author is a core member of the Baidu Waimai Risk Control Team and has been responsible for risk‑control algorithms and strategies since 2015, focusing on merchant and BD risk mitigation.
