Ctrip's Automated Iterative Anti‑Fraud Modeling Framework for Payment Risk

The article describes Ctrip's payment fraud risk characteristics, a comprehensive automated iterative anti‑fraud model framework—including variable system, GAN‑augmented sample generation, RNN behavior encoding, and tree‑based classifiers—and demonstrates how this approach restores recall performance compared with traditional static models.

DataFunTalk

Payment fraud risk, caused by leaked card or account information, threatens both users and Ctrip's platform; the financial risk control team must accurately identify and block such transactions without hindering legitimate travel.

The fraud scenario exhibits three key traits: high adversarial nature, complex user‑behavior mimicry, and a scarcity of labeled bad samples.

To combat these challenges, Ctrip built an automated iterative anti‑fraud model system that speeds up model updates, reduces manual engineering effort, and employs Generative Adversarial Networks (GANs) to synthesize additional fraud samples, enabling a "see‑and‑counter" capability.
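
The adversarial training behind the sample synthesis can be illustrated with a minimal pure-Python GAN on a single numeric feature. This is a toy sketch, not Ctrip's implementation: the one-dimensional Gaussian "fraud feature", the linear generator, the logistic discriminator, and all learning rates are illustrative assumptions.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def train_toy_gan(real_mean=3.0, steps=3000, batch=32, lr=0.1, seed=0):
    """Adversarially fit a 1-D generator g(z) = a*z + b to samples from
    N(real_mean, 1), mimicking GAN-based synthesis of a single fraud
    feature. Discriminator: D(x) = sigmoid(w*x + c)."""
    rng = random.Random(seed)
    a, b = 1.0, 0.0    # generator parameters
    w, c = 0.0, 0.0    # discriminator parameters
    for _ in range(steps):
        real = [rng.gauss(real_mean, 1.0) for _ in range(batch)]
        z = [rng.gauss(0.0, 1.0) for _ in range(batch)]
        fake = [a * zi + b for zi in z]
        # Discriminator: gradient ascent on log D(real) + log(1 - D(fake)).
        gw = sum((1 - sigmoid(w * x + c)) * x for x in real) / batch \
           - sum(sigmoid(w * x + c) * x for x in fake) / batch
        gc = sum(1 - sigmoid(w * x + c) for x in real) / batch \
           - sum(sigmoid(w * x + c) for x in fake) / batch
        w += lr * gw
        c += lr * gc
        # Generator: gradient ascent on log D(fake) (non-saturating loss).
        ga = sum((1 - sigmoid(w * (a * zi + b) + c)) * w * zi for zi in z) / batch
        gb = sum((1 - sigmoid(w * (a * zi + b) + c)) * w for zi in z) / batch
        a += lr * ga
        b += lr * gb
    return a, b

# After training, the generator's output mean (b, since E[z] = 0) should
# have moved from 0 toward the real feature mean.
a, b = train_toy_gan()
```

In production such a generator would be multi-dimensional and trained on real confirmed-fraud transactions; the synthetic rows then augment the scarce bad-label pool before the main classifier is fit.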

The risk‑variable system draws from account, payment, travel, finance, and IP‑location data, combining real‑time computed variables with offline T+1 cleaned variables to form a rich feature pool.

The iterative framework consists of nine stages:

1. Trigger conditions: time-based or performance-driven.
2. A variable library assembled from recent samples.
3. Variable processing: PSI stability checks, missing-value and abnormal-value filling, and one-hot encoding of categorical features.
4. Algorithm-derived variables produced by deep learning.
5. GAN-generated synthetic fraud cases.
6. A main model, typically a tree-based classifier such as Random Forest, XGBoost, or LightGBM.
7. Deployment, outputting PMML models, feature-engineering code, and derived-variable methods.
8. Threshold setting based on short-term production performance.
9. A monitoring suite covering variable drift, model PSI, and business-effect metrics.
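
The PSI stability check in the variable-processing stage can be sketched as follows. This is a generic, self-contained illustration, not Ctrip's code: the quantile binning scheme and the 0.1/0.25 thresholds are common industry conventions, not details from the article.

```python
import math

def psi(expected, actual, bins=10):
    """Population Stability Index between a baseline (training-time) sample
    and a recent sample of one variable. Bin edges come from baseline
    quantiles, so each baseline bin holds ~1/bins of the data."""
    exp_sorted = sorted(expected)
    edges = [exp_sorted[int(len(exp_sorted) * i / bins)] for i in range(1, bins)]

    def bin_fractions(values):
        counts = [0] * bins
        for v in values:
            idx = sum(1 for e in edges if v > e)  # which bin v falls in
            counts[idx] += 1
        # Small floor avoids log(0) when a bin is empty.
        return [max(c / len(values), 1e-4) for c in counts]

    e_frac = bin_fractions(expected)
    a_frac = bin_fractions(actual)
    return sum((a - e) * math.log(a / e) for a, e in zip(a_frac, e_frac))

# Rule of thumb: PSI < 0.1 stable, 0.1-0.25 watch, > 0.25 drifted.
stable = psi(list(range(1000)), list(range(1000)))
shifted = psi(list(range(1000)), list(range(500, 1500)))
```

A variable whose PSI exceeds the drift threshold would be dropped or re-derived before the next model iteration, which is what makes the automated refresh safe to run without manual feature review.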

For sequential user behavior, an RNN is trained on UBT action and pageview data; its hidden‑layer outputs become additional features for the main model, capturing order‑sensitive patterns that traditional aggregated features miss.
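
The data flow of the sequence encoder can be sketched with a minimal vanilla RNN in pure Python. This is an assumption-laden illustration: the event vocabulary size, hidden size, and weights are placeholders (in the real pipeline the weights come from a model trained on UBT logs), and only the final hidden state is kept as the feature vector.

```python
import math
import random

def rnn_encode(event_seq, hidden_size=4, seed=7):
    """Encode a variable-length sequence of user-behavior event IDs into a
    fixed-size vector: the final hidden state of a vanilla RNN. Weights are
    random placeholders here; in practice they are trained offline."""
    rng = random.Random(seed)
    vocab = 16  # hypothetical number of distinct UBT event types
    emb = [[rng.uniform(-0.5, 0.5) for _ in range(hidden_size)]
           for _ in range(vocab)]
    w_h = [[rng.uniform(-0.5, 0.5) for _ in range(hidden_size)]
           for _ in range(hidden_size)]
    h = [0.0] * hidden_size
    for ev in event_seq:
        x = emb[ev % vocab]
        h = [math.tanh(x[i] + sum(w_h[i][j] * h[j] for j in range(hidden_size)))
             for i in range(hidden_size)]
    return h  # appended to the main model's feature vector

# Order matters: the same events in a different order yield a different code,
# which is exactly what aggregated count features cannot capture.
a = rnn_encode([1, 2, 3, 4])
b = rnn_encode([4, 3, 2, 1])
```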

Empirical results show that the traditional "Tianyan‑I" model's recall fell from over 12% to roughly 7.9% across out‑of‑time (OOT) datasets, while the automated iterative "Tianyan‑II" framework restored recall to roughly 11.5% at 80% precision, confirming the benefit of rapid model refresh and synthetic sample augmentation.
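
"Recall at 80% precision" can be read as the largest fraction of fraud cases caught at any score threshold where at least 80% of flagged transactions are truly fraudulent. A minimal sketch of that metric (generic, not the team's evaluation code):

```python
def recall_at_precision(scores, labels, min_precision=0.8):
    """Highest recall achievable at >= min_precision, scanning score
    thresholds from the highest score downward."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    total_pos = sum(labels)
    tp = fp = 0
    best = 0.0
    for i in order:
        if labels[i]:
            tp += 1
        else:
            fp += 1
        if tp / (tp + fp) >= min_precision:
            best = max(best, tp / total_pos)
    return best

# Toy example: two of the three frauds are caught before precision
# drops below 80%.
r = recall_at_precision([0.9, 0.8, 0.7, 0.6, 0.5], [1, 1, 0, 1, 0])
```

Fixing precision and comparing recall is the natural choice in payment risk control, since the cost of blocking legitimate travel bookings caps the tolerable false-positive rate.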

The authors conclude that, despite automation, human oversight remains essential for new variable configuration and detailed fraud case analysis to maintain model controllability while achieving high effectiveness.

machine learning, GAN, anti-fraud, RNN, payment fraud, risk modeling
Written by

DataFunTalk

Dedicated to sharing and discussing big data and AI technology applications, aiming to empower a million data scientists. Regularly hosts live tech talks and curates articles on big data, recommendation/search algorithms, advertising algorithms, NLP, intelligent risk control, autonomous driving, and machine learning/deep learning.
