How Tencent Combats Fraudsters with Big Data and AI‑Powered Risk Engines

This article explains how Tencent uses big‑data collection, user profiling, and AI‑driven risk learning engines to detect and block malicious accounts, proxy IPs, and fraudulent activities across e‑commerce and other platforms, detailing the architecture, algorithms, and practical defenses employed.

21CTO

Dec 7, 2015

Background

Over the past one to two years, e‑commerce startups have multiplied and offered heavy subsidies to attract users. Those subsidies have also fueled the rise of “wool‑party” fraudsters, who register fake accounts in bulk to exploit promotions.

Current Black‑Market Landscape

Around 200,000 people nationwide work as fraudsters of this kind. They operate in organized groups that divide the work among software development teams, SMS relay platforms, account resale groups, and order‑brushing gangs.

Characteristics of Fraudsters

Professional: dedicated teams and machines.

Gang‑oriented: clear division of labor across the fraud chain.

Regional: concentrated in economically developed coastal cities.

Anti‑Brushing Strategy

Registration stage: detect and block fake registrations.

Activity stage: the main battlefield. Challenge suspicious accounts with SMS or voice captchas and sharply reduce the benefits they can claim.
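The activity-stage response can be pictured as a mapping from a risk score to an action. The sketch below is only an illustration: the score range, the two thresholds, and the ordering of captcha before benefit reduction are assumptions, not details given in the article.

```python
from enum import Enum


class Action(Enum):
    ALLOW = "allow"
    CAPTCHA = "captcha"                # SMS or voice challenge
    REDUCE_BENEFIT = "reduce_benefit"  # shrink the promotion payout


# Hypothetical thresholds; a real deployment would tune them per campaign.
CAPTCHA_THRESHOLD = 0.5
REDUCE_THRESHOLD = 0.8


def choose_action(risk_score: float) -> Action:
    """Map a risk score in [0, 1] to an activity-stage response."""
    if not 0.0 <= risk_score <= 1.0:
        raise ValueError("risk_score must be in [0, 1]")
    if risk_score >= REDUCE_THRESHOLD:
        return Action.REDUCE_BENEFIT
    if risk_score >= CAPTCHA_THRESHOLD:
        return Action.CAPTCHA
    return Action.ALLOW
```

Keeping the response graduated means a borderline account sees a challenge rather than an outright refusal.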

Tencent’s Internal Anti‑Fraud Architecture

Risk Learning Engine

The engine uses a C++ implementation of DBSCAN for fast clustering on massive data, and a black/white dual‑classifier mechanism to minimize false positives on normal users.
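For readers unfamiliar with DBSCAN, here is a minimal pure-Python sketch of the algorithm. It is not Tencent's C++ implementation: the brute-force neighbor search, the toy 2-D points, and the parameter values are all assumptions chosen for readability.

```python
from math import dist

NOISE = -1
UNVISITED = None


def dbscan(points, eps, min_pts):
    """Return one cluster label per point; -1 marks noise."""
    labels = [UNVISITED] * len(points)

    def neighbors(i):
        # Brute-force range query; the result includes point i itself.
        return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not UNVISITED:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # may later be claimed as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] is not UNVISITED:
                continue
            labels[j] = cluster
            j_neighbors = neighbors(j)
            if len(j_neighbors) >= min_pts:
                queue.extend(j_neighbors)  # j is a core point: keep expanding
    return labels


# Toy feature vectors: two dense groups and one outlier.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (50, 50)]
labels = dbscan(points, eps=1.5, min_pts=3)
print(labels)  # [0, 0, 0, 1, 1, 1, -1]
```

In a fraud setting, each point would be a feature vector for an account or request, and a dense cluster of near-identical vectors hints at scripted behavior.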

The black classifier estimates the probability that a request is abnormal, based on features, machine‑learning models, and rule‑based heuristics; the white classifier estimates the probability that a request is normal.
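One way the two scores might be combined is sketched below. The article does not give the combination rule, so the function name `decide`, the thresholds, and the three outcomes are assumptions made for illustration.

```python
def decide(p_black: float, p_white: float,
           black_threshold: float = 0.9,
           white_threshold: float = 0.5) -> str:
    """Combine black and white classifier outputs into one decision.

    The white classifier acts as a veto: a request it vouches for is
    passed even if the black classifier is suspicious, which is how the
    dual mechanism keeps false positives on normal users low.
    """
    if p_white >= white_threshold:
        return "pass"
    if p_black >= black_threshold:
        return "block"
    return "verify"  # uncertain: ask for additional verification
```

For example, `decide(0.95, 0.8)` returns `"pass"` because the white classifier vouches for the request, while `decide(0.95, 0.1)` returns `"block"`.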

Matrix‑Style Logic Framework

The system arranges weak classifiers in a matrix, combining them with AdaBoost along the horizontal axis and Bagging along the vertical axis. This isolates risks for specific account types, simplifies implementation, eases model training, and improves robustness.
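The combination step of such a matrix can be sketched as follows. This shows inference only: AdaBoost's sample reweighting and Bagging's bootstrap resampling during training are omitted, and the weak classifiers, feature names, and alpha weights are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple

WeakClassifier = Callable[[dict], int]  # returns +1 (risky) or -1 (normal)
Row = Tuple[Sequence[WeakClassifier], Sequence[float]]


def boosted_row(request: dict, classifiers, alphas) -> int:
    """Horizontal step, AdaBoost style: sign of the alpha-weighted vote."""
    score = sum(a * clf(request) for clf, a in zip(classifiers, alphas))
    return 1 if score > 0 else -1


def matrix_predict(request: dict, rows: List[Row]) -> int:
    """Vertical step, Bagging style: majority vote over the boosted rows."""
    votes = sum(boosted_row(request, clfs, alphas) for clfs, alphas in rows)
    return 1 if votes > 0 else -1


# Hypothetical weak classifiers over made-up features.
def many_regs(r): return 1 if r["regs_per_ip"] > 20 else -1
def uses_proxy(r): return 1 if r["is_proxy"] else -1
def new_account(r): return 1 if r["account_age_days"] < 7 else -1


# Alpha weights would come from AdaBoost training; these are placeholders.
rows = [
    ([many_regs, uses_proxy], [0.8, 0.4]),
    ([new_account, uses_proxy], [0.6, 0.5]),
    ([many_regs, new_account], [0.7, 0.3]),
]

risky = {"regs_per_ip": 50, "is_proxy": False, "account_age_days": 400}
normal = {"regs_per_ip": 1, "is_proxy": False, "account_age_days": 400}
print(matrix_predict(risky, rows), matrix_predict(normal, rows))  # 1 -1
```

Because each row votes independently, one badly trained row can be outvoted, which is one plausible reading of the robustness claim.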

Big‑Data Collection Dimensions

Data breadth: collect diverse data from social, gaming, payment, and media domains.

Data depth: capture registration, login, and usage data to enable deep‑defense analysis.

Big‑Data Processing Platform – “Magic Cube”

The platform integrates MySQL, MongoDB, Spark, and Hadoop, allowing analysts to write simple SQL or configure jobs for routine analysis, storing security‑relevant data for offline model training and real‑time services.
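As an illustration of the kind of simple SQL an analyst might write, the sketch below counts registrations per IP for one day. It uses Python's built-in sqlite3 as a stand-in for the platform's own stores; the `registrations` table, its columns, the sample rows, and the cutoff of three are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE registrations (account TEXT, ip TEXT, day TEXT)")
conn.executemany(
    "INSERT INTO registrations VALUES (?, ?, ?)",
    [
        ("a1", "203.0.113.7", "2015-12-01"),
        ("a2", "203.0.113.7", "2015-12-01"),
        ("a3", "203.0.113.7", "2015-12-01"),
        ("b1", "198.51.100.4", "2015-12-01"),
    ],
)

# IPs with an unusually high number of registrations on the given day.
query = """
    SELECT ip, COUNT(*) AS n
    FROM registrations
    WHERE day = ?
    GROUP BY ip
    HAVING COUNT(*) >= ?
    ORDER BY n DESC
"""
suspicious = conn.execute(query, ("2015-12-01", 3)).fetchall()
print(suspicious)  # [('203.0.113.7', 3)]
```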

User Profiling

Profiles tag accounts and identifiers (e.g., IP addresses, QQ numbers) with attributes such as proxy usage or suspicious behavior, supporting fine‑grained risk policies.
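A profile store of this kind can be pictured as a map from an identifier to a set of tags, with a policy that reads those tags. The class name `ProfileStore`, the tag names, and the policy rules below are assumptions made for illustration.

```python
from collections import defaultdict


class ProfileStore:
    """Maps (kind, value) identifiers, e.g. ("ip", "203.0.113.7"), to tags."""

    def __init__(self):
        self._tags = defaultdict(set)

    def tag(self, kind: str, value: str, label: str) -> None:
        self._tags[(kind, value)].add(label)

    def tags(self, kind: str, value: str) -> set:
        # .get avoids creating empty entries for unknown identifiers.
        return set(self._tags.get((kind, value), ()))


def policy(store: ProfileStore, ip: str, qq: str) -> str:
    """Hypothetical fine-grained policy driven by profile tags."""
    labels = store.tags("ip", ip) | store.tags("qq", qq)
    if {"proxy", "suspicious_behavior"} <= labels:
        return "block"
    if labels:
        return "captcha"
    return "allow"


store = ProfileStore()
store.tag("ip", "203.0.113.7", "proxy")
store.tag("qq", "qq-demo-1", "suspicious_behavior")
```

With this data, a request pairing the tagged IP with the tagged QQ number is blocked, a request carrying only one of the tags gets a captcha, and an unknown pair is allowed.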

Proxy IP Detection

Reverse probing of common proxy ports (80, 8080, etc.).

Inspect the X‑Forwarded‑For HTTP header.

Check for Keep‑Alive messages that carry a Proxy‑Connection header.

Flag IPs that have open ports numbered above 10000.

To avoid exhaustive scanning, Tencent first flags suspicious IPs via business modeling, then probes them, discovering millions of malicious (often proxy) IPs daily.
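The checks above, together with the probe-only-if-suspicious step, can be sketched as follows. This is an assumption-laden illustration rather than Tencent's detector: the function names are hypothetical, the port list contains only the two ports the article names, and each signal is a hint, not proof (legitimate load balancers also add X-Forwarded-For, for instance).

```python
import socket

COMMON_PROXY_PORTS = (80, 8080)  # the article lists these and "etc."
HIGH_PORT_FLOOR = 10000


def header_signals(headers: dict) -> list:
    """Return proxy hints found in HTTP request header names."""
    names = {name.lower() for name in headers}  # header names are case-insensitive
    return [h for h in ("x-forwarded-for", "proxy-connection") if h in names]


def port_signals(open_ports) -> list:
    """Return proxy hints derived from a list of ports known to be open."""
    signals = []
    if any(p in COMMON_PROXY_PORTS for p in open_ports):
        signals.append("common-proxy-port")
    if any(p > HIGH_PORT_FLOOR for p in open_ports):
        signals.append("high-port")
    return signals


def is_port_open(ip: str, port: int, timeout: float = 1.0) -> bool:
    """Reverse probe: attempt a TCP connection back to the client IP."""
    try:
        with socket.create_connection((ip, port), timeout=timeout):
            return True
    except OSError:
        return False


def probe_if_suspicious(ip: str, suspicious_ips: set,
                        ports=COMMON_PROXY_PORTS) -> list:
    """Probe only IPs already flagged by business modeling."""
    if ip not in suspicious_ips:
        return []  # skip the network scan entirely
    return [p for p in ports if is_port_open(ip, p)]
```

Gating the probe on a pre-computed suspicious set is what keeps the scan volume manageable: most client IPs never trigger a connection attempt.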

Defense Logic and System Integration

The real‑time system, written in C/C++, shares memory across machines, tolerating some data inconsistency because risk assessment is probabilistic. Emergency switches and rapid response channels (WeChat, SMS) are in place to prevent cascading failures.
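An emergency switch can be as simple as a flag that bypasses the risk check. The sketch below assumes a fail-open design, where a disabled or crashing engine lets traffic through; the article does not say which way the switch fails, and the class name `RiskGate` and the threshold are hypothetical.

```python
class RiskGate:
    """Wraps a risk-scoring function behind an operator-controlled switch."""

    def __init__(self, score_fn, threshold: float = 0.8):
        self.score_fn = score_fn
        self.threshold = threshold
        self.enabled = True  # operators flip this off in an emergency

    def allow(self, request) -> bool:
        if not self.enabled:
            return True  # switch is off: bypass risk checks entirely
        try:
            return self.score_fn(request) < self.threshold
        except Exception:
            # Assumed fail-open: a broken engine must not block the business.
            return True


gate = RiskGate(lambda request: request["score"])
```

Failing open trades some missed fraud for availability, which fits the article's point that risk assessment is probabilistic and can tolerate imperfect data.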

Integration Scenarios

E‑commerce O2O order‑brushing, coupon abuse, red‑packet fraud.

Preventing fake account registration.

Blocking credential stuffing and password‑hash cracking.

Mitigating malicious logins.

Q&A Highlights

Q: Is the risk learning engine self‑developed or based on open‑source? A: Online components are built in C/C++; offline training leverages Python open‑source libraries.

Q: Has MongoDB been modified for the Magic Cube platform? A: Yes, the storage engine has been customized.

Q: Difference between black and white classifiers? A: Black classifiers detect abnormal requests; white classifiers identify normal ones.

Q: How are risk weights determined? A: Through training on positive/negative samples, parameter significance checks, and manual verification.

Q: What happens if a normal user is mistakenly blocked? A: The user experiences additional verification steps (captcha, manual review) but is not permanently denied.
