Ctrip's Automated Iterative Anti‑Fraud Modeling Framework for Payment Risk

The article describes Ctrip's payment fraud risk characteristics, a comprehensive automated iterative anti‑fraud model framework—including variable system, GAN‑augmented sample generation, RNN behavior encoding, and tree‑based classifiers—and demonstrates how this approach restores recall performance compared with traditional static models.

DataFunTalk

Payment fraud risk, caused by leaked card or account information, threatens both users and Ctrip's platform; the financial risk control team must accurately identify and block such transactions without hindering legitimate travel.

The fraud scenario exhibits three key traits: high adversarial nature, complex user‑behavior mimicry, and a scarcity of labeled bad samples.

To combat these challenges, Ctrip built an automated iterative anti‑fraud model system that speeds up model updates, reduces manual engineering effort, and employs Generative Adversarial Networks (GANs) to synthesize additional fraud samples, enabling a "see‑and‑counter" capability.
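
The adversarial training behind the sample synthesis can be illustrated with a minimal pure-Python GAN on a single numeric feature. This is a toy sketch, not Ctrip's implementation: the one-dimensional Gaussian "fraud feature", the linear generator, the logistic discriminator, and all learning rates are illustrative assumptions.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def train_toy_gan(real_mean=3.0, steps=3000, batch=32, lr=0.1, seed=0):
    """Adversarially fit a 1-D generator g(z) = a*z + b to samples from
    N(real_mean, 1), mimicking GAN-based synthesis of a single fraud
    feature. Discriminator: D(x) = sigmoid(w*x + c)."""
    rng = random.Random(seed)
    a, b = 1.0, 0.0    # generator parameters
    w, c = 0.0, 0.0    # discriminator parameters
    for _ in range(steps):
        real = [rng.gauss(real_mean, 1.0) for _ in range(batch)]
        z = [rng.gauss(0.0, 1.0) for _ in range(batch)]
        fake = [a * zi + b for zi in z]
        # Discriminator: gradient ascent on log D(real) + log(1 - D(fake)).
        gw = sum((1 - sigmoid(w * x + c)) * x for x in real) / batch \
           - sum(sigmoid(w * x + c) * x for x in fake) / batch
        gc = sum(1 - sigmoid(w * x + c) for x in real) / batch \
           - sum(sigmoid(w * x + c) for x in fake) / batch
        w += lr * gw
        c += lr * gc
        # Generator: gradient ascent on log D(fake) (non-saturating loss).
        ga = sum((1 - sigmoid(w * (a * zi + b) + c)) * w * zi for zi in z) / batch
        gb = sum((1 - sigmoid(w * (a * zi + b) + c)) * w for zi in z) / batch
        a += lr * ga
        b += lr * gb
    return a, b

# After training, the generator's output mean (b, since E[z] = 0) should
# have moved from 0 toward the real feature mean.
a, b = train_toy_gan()
```

In production such a generator would be multi-dimensional and trained on real confirmed-fraud transactions; the synthetic rows then augment the scarce bad-label pool before the main classifier is fit.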

The risk‑variable system draws from account, payment, travel, finance, and IP‑location data, combining real‑time computed variables with offline T+1 cleaned variables to form a rich feature pool.

The iterative framework consists of nine stages:

1. Trigger conditions: time-based or performance-driven.
2. A variable library assembled from recent samples.
3. Variable processing: PSI stability checks, missing-value and abnormal-value filling, and one-hot encoding of categorical features.
4. Algorithm-derived variables produced by deep learning.
5. GAN-generated synthetic fraud cases.
6. A main model, typically a tree-based classifier such as Random Forest, XGBoost, or LightGBM.
7. Deployment, outputting PMML models, feature-engineering code, and derived-variable methods.
8. Threshold setting based on short-term production performance.
9. A monitoring suite covering variable drift, model PSI, and business-effect metrics.
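
The PSI stability check in the variable-processing stage can be sketched as follows. This is a generic, self-contained illustration, not Ctrip's code: the quantile binning scheme and the 0.1/0.25 thresholds are common industry conventions, not details from the article.

```python
import math

def psi(expected, actual, bins=10):
    """Population Stability Index between a baseline (training-time) sample
    and a recent sample of one variable. Bin edges come from baseline
    quantiles, so each baseline bin holds ~1/bins of the data."""
    exp_sorted = sorted(expected)
    edges = [exp_sorted[int(len(exp_sorted) * i / bins)] for i in range(1, bins)]

    def bin_fractions(values):
        counts = [0] * bins
        for v in values:
            idx = sum(1 for e in edges if v > e)  # which bin v falls in
            counts[idx] += 1
        # Small floor avoids log(0) when a bin is empty.
        return [max(c / len(values), 1e-4) for c in counts]

    e_frac = bin_fractions(expected)
    a_frac = bin_fractions(actual)
    return sum((a - e) * math.log(a / e) for a, e in zip(a_frac, e_frac))

# Rule of thumb: PSI < 0.1 stable, 0.1-0.25 watch, > 0.25 drifted.
stable = psi(list(range(1000)), list(range(1000)))
shifted = psi(list(range(1000)), list(range(500, 1500)))
```

A variable whose PSI exceeds the drift threshold would be dropped or re-derived before the next model iteration, which is what makes the automated refresh safe to run without manual feature review.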

For sequential user behavior, an RNN is trained on UBT action and pageview data; its hidden‑layer outputs become additional features for the main model, capturing order‑sensitive patterns that traditional aggregated features miss.
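
The data flow of the sequence encoder can be sketched with a minimal vanilla RNN in pure Python. This is an assumption-laden illustration: the event vocabulary size, hidden size, and weights are placeholders (in the real pipeline the weights come from a model trained on UBT logs), and only the final hidden state is kept as the feature vector.

```python
import math
import random

def rnn_encode(event_seq, hidden_size=4, seed=7):
    """Encode a variable-length sequence of user-behavior event IDs into a
    fixed-size vector: the final hidden state of a vanilla RNN. Weights are
    random placeholders here; in practice they are trained offline."""
    rng = random.Random(seed)
    vocab = 16  # hypothetical number of distinct UBT event types
    emb = [[rng.uniform(-0.5, 0.5) for _ in range(hidden_size)]
           for _ in range(vocab)]
    w_h = [[rng.uniform(-0.5, 0.5) for _ in range(hidden_size)]
           for _ in range(hidden_size)]
    h = [0.0] * hidden_size
    for ev in event_seq:
        x = emb[ev % vocab]
        h = [math.tanh(x[i] + sum(w_h[i][j] * h[j] for j in range(hidden_size)))
             for i in range(hidden_size)]
    return h  # appended to the main model's feature vector

# Order matters: the same events in a different order yield a different code,
# which is exactly what aggregated count features cannot capture.
a = rnn_encode([1, 2, 3, 4])
b = rnn_encode([4, 3, 2, 1])
```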

Empirical results show that the traditional "Tianyan‑I" model's recall fell from over 12% to roughly 7.9% across out‑of‑time (OOT) datasets, while the automated iterative "Tianyan‑II" framework restored recall to roughly 11.5% at 80% precision, confirming the benefit of rapid model refresh and synthetic sample augmentation.
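
"Recall at 80% precision" can be read as the largest fraction of fraud cases caught at any score threshold where at least 80% of flagged transactions are truly fraudulent. A minimal sketch of that metric (generic, not the team's evaluation code):

```python
def recall_at_precision(scores, labels, min_precision=0.8):
    """Highest recall achievable at >= min_precision, scanning score
    thresholds from the highest score downward."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    total_pos = sum(labels)
    tp = fp = 0
    best = 0.0
    for i in order:
        if labels[i]:
            tp += 1
        else:
            fp += 1
        if tp / (tp + fp) >= min_precision:
            best = max(best, tp / total_pos)
    return best

# Toy example: two of the three frauds are caught before precision
# drops below 80%.
r = recall_at_precision([0.9, 0.8, 0.7, 0.6, 0.5], [1, 1, 0, 1, 0])
```

Fixing precision and comparing recall is the natural choice in payment risk control, since the cost of blocking legitimate travel bookings caps the tolerable false-positive rate.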

The authors conclude that, despite automation, human oversight remains essential for new variable configuration and detailed fraud case analysis to maintain model controllability while achieving high effectiveness.

machine learning, GAN, anti-fraud, RNN, payment fraud, risk modeling
Written by

DataFunTalk

Dedicated to sharing and discussing big data and AI technology applications, aiming to empower a million data scientists. Regularly hosts live tech talks and curates articles on big data, recommendation/search algorithms, advertising algorithms, NLP, intelligent risk control, autonomous driving, and machine learning/deep learning.
