How Tencent Combats Fraudsters with Big Data and AI‑Powered Risk Engines
This article explains how Tencent uses big‑data collection, user profiling, and AI‑driven risk learning engines to detect and block malicious accounts, proxy IPs, and fraudulent activities across e‑commerce and other platforms, detailing the architecture, algorithms, and practical defenses employed.
Background
Over the past one to two years, e‑commerce startups have proliferated, offering heavy subsidies to attract users. Those subsidies have also fueled the rise of “wool‑party” fraudsters, who create fake accounts in bulk to exploit promotions.
Current Black‑Market Landscape
These fraudsters, an estimated 200,000 people nationwide, operate in organized groups that divide the work among software‑development teams, SMS relay platforms, account‑resale groups, and order‑brushing gangs.
Characteristics of Fraudsters
Professional: dedicated teams and machines.
Gang‑oriented: clear division of labor across the fraud chain.
Regional: concentrated in economically developed coastal cities.
Anti‑Brushing Strategy
Registration stage: detect and block fake registrations.
Login stage: raise the barrier for malicious logins using captchas and SMS verification.
Activity stage: the main battlefield, using verification codes (SMS/voice) and drastically reducing benefits for suspicious accounts.
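The three stages above can be read as a mapping from stage and risk score to a response. The following is a minimal sketch under stated assumptions: the `Stage` enum, the `choose_action` function, the action names, and the 0.3 and 0.8 thresholds are all hypothetical and are not taken from Tencent's system.

```python
from enum import Enum


class Stage(Enum):
    REGISTRATION = "registration"
    LOGIN = "login"
    ACTIVITY = "activity"


def choose_action(stage: Stage, risk: float) -> str:
    """Map a stage and a risk score in [0, 1] to a response.

    The thresholds (0.3, 0.8) are illustrative assumptions.
    """
    if risk < 0.3:
        return "allow"
    if stage is Stage.REGISTRATION:
        # Registration: block clear fakes, challenge borderline cases.
        return "block" if risk >= 0.8 else "challenge"
    if stage is Stage.LOGIN:
        # Login: raise the barrier with a captcha or SMS verification.
        return "challenge"
    # Activity: verify first, cut benefits for the most suspicious accounts.
    return "reduce_benefit" if risk >= 0.8 else "challenge"


print(choose_action(Stage.ACTIVITY, 0.9))  # reduce_benefit
```

Keeping the response stage‑specific means a high‑risk account at the activity stage loses the subsidy rather than being locked out entirely.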
Tencent’s Internal Anti‑Fraud Architecture
Risk Learning Engine
The engine uses a C++ implementation of DBSCAN to cluster massive datasets quickly, and employs a black/white dual‑classifier mechanism to minimize false positives on normal users.
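For readers unfamiliar with DBSCAN, here is a small pure‑Python sketch of the algorithm. It is not Tencent's C++ implementation; the sample points, `eps`, and `min_pts` values are assumptions chosen for illustration, and the brute‑force neighbor search would not scale to production data.

```python
from math import dist


def dbscan(points, eps, min_pts):
    """Return one cluster label per point; -1 marks noise."""
    UNVISITED, NOISE = None, -1
    labels = [UNVISITED] * len(points)

    def neighbors(i):
        # Brute-force region query, including the point itself.
        return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not UNVISITED:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # may later be claimed as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins, does not expand
            if labels[j] is not UNVISITED:
                continue
            labels[j] = cluster
            reachable = neighbors(j)
            if len(reachable) >= min_pts:
                queue.extend(reachable)  # core point: keep expanding
    return labels


# Hypothetical 2-D feature vectors for seven accounts.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (50, 50)]
labels = dbscan(points, eps=1.5, min_pts=3)
print(labels)  # [0, 0, 0, 1, 1, 1, -1]
```

Accounts that behave alike fall into the same dense cluster, while the isolated point is labeled noise. Density‑based clustering also needs no preset cluster count, which suits a setting where the number of fraud gangs is unknown.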
The black classifier evaluates the probability that a request is abnormal, based on features, machine‑learning models, and rule‑based heuristics; the white classifier estimates the probability that a request is normal.
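One way to picture how the two scores might be combined is shown below. The source does not describe the actual combination rule, so the `decide` function, its thresholds, and the idea of letting the white score veto the black score are assumptions made for this sketch.

```python
def decide(p_black: float, p_white: float,
           black_threshold: float = 0.9,
           white_threshold: float = 0.7) -> str:
    """Combine black and white classifier outputs into one decision.

    p_black: estimated probability that the request is abnormal.
    p_white: estimated probability that the request is normal.
    Both thresholds are illustrative assumptions.
    """
    if p_black < black_threshold:
        return "allow"
    if p_white >= white_threshold:
        # The white classifier vetoes the black one, which protects
        # normal users from false positives.
        return "allow"
    return "challenge"


print(decide(p_black=0.95, p_white=0.10))  # challenge
print(decide(p_black=0.95, p_white=0.80))  # allow
```

Because the two classifiers are scored independently, a request is only challenged when the black score is high and the white score fails to vouch for it.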
Matrix‑Style Logic Framework
A matrix of weak classifiers, combined with AdaBoost horizontally and Bagging vertically, isolates risk for specific account types. This simplifies implementation, eases model training, and improves robustness.
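The matrix idea can be sketched at prediction time as follows: each row is a boosted ensemble (a weighted vote of weak classifiers, as in AdaBoost), and the rows are then combined by majority vote (as in Bagging). The weak rules, feature names, and weights below are hypothetical; in a real system the weights would come from training rather than being set by hand.

```python
from typing import Callable, Dict, List, Tuple

Request = Dict[str, float]
Weak = Callable[[Request], int]  # returns +1 (risky) or -1 (normal)
Row = List[Tuple[float, Weak]]   # (weight, weak classifier) pairs


def boosted_row(row: Row, x: Request) -> int:
    """Horizontal step: AdaBoost-style weighted vote within one row."""
    score = sum(alpha * h(x) for alpha, h in row)
    return 1 if score > 0 else -1


def bagged_matrix(matrix: List[Row], x: Request) -> int:
    """Vertical step: Bagging-style majority vote across rows."""
    votes = sum(boosted_row(row, x) for row in matrix)
    return 1 if votes > 0 else -1


# Hypothetical weak rules over hypothetical features.
def many_regs(x: Request) -> int:
    return 1 if x["regs_per_ip"] > 20 else -1


def proxy_flag(x: Request) -> int:
    return 1 if x["is_proxy"] > 0 else -1


def night_burst(x: Request) -> int:
    return 1 if x["night_ratio"] > 0.8 else -1


matrix = [
    [(0.8, many_regs), (0.4, proxy_flag)],
    [(0.6, night_burst), (0.5, many_regs)],
    [(0.7, proxy_flag), (0.3, night_burst)],
]

risky = {"regs_per_ip": 50, "is_proxy": 1, "night_ratio": 0.9}
normal = {"regs_per_ip": 1, "is_proxy": 0, "night_ratio": 0.1}
print(bagged_matrix(matrix, risky), bagged_matrix(matrix, normal))  # 1 -1
```

Since each row can be trained and replaced independently, a faulty row changes only one vote, which is one plausible reading of the robustness claim above.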
Big‑Data Collection Dimensions
Data breadth: collect diverse data from social, gaming, payment, and media domains.
Data depth: capture registration, login, and usage data to enable deep‑defense analysis.
Big‑Data Processing Platform – “Magic Cube”
The platform integrates MySQL, MongoDB, Spark, and Hadoop, allowing analysts to write simple SQL or configure jobs for routine analysis, storing security‑relevant data for offline model training and real‑time services.
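As an illustration of the kind of "simple SQL" an analyst might write for routine analysis, the sketch below counts distinct accounts registered per IP. It uses Python's built‑in `sqlite3` as a stand‑in for the platform's storage; the `registrations` table, its columns, the sample rows, and the threshold of 3 are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE registrations (ip TEXT, account TEXT)")
conn.executemany(
    "INSERT INTO registrations VALUES (?, ?)",
    [
        ("203.0.113.7", "acct_a"),
        ("203.0.113.7", "acct_b"),
        ("203.0.113.7", "acct_c"),
        ("198.51.100.4", "acct_d"),
    ],
)

# IPs that registered at least 3 distinct accounts.
query = """
SELECT ip, COUNT(DISTINCT account) AS accounts
FROM registrations
GROUP BY ip
HAVING COUNT(DISTINCT account) >= 3
ORDER BY accounts DESC
"""
suspicious = conn.execute(query).fetchall()
print(suspicious)  # [('203.0.113.7', 3)]
```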
User Profiling
Profiles tag accounts and devices (e.g., IP, QQ) with attributes such as proxy usage or suspicious behavior, supporting fine‑grained risk policies.
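A minimal sketch of such a tag store follows, assuming an in‑memory dictionary keyed by entity kind and identifier. The `ProfileStore` class, the tag names, and the `policy_for` rules are hypothetical and only illustrate how tags could feed a fine‑grained policy.

```python
from collections import defaultdict
from typing import Dict, Set, Tuple


class ProfileStore:
    """Holds risk tags per entity, e.g. ("ip", "203.0.113.7")."""

    def __init__(self) -> None:
        self._tags: Dict[Tuple[str, str], Set[str]] = defaultdict(set)

    def tag(self, kind: str, key: str, label: str) -> None:
        self._tags[(kind, key)].add(label)

    def tags(self, kind: str, key: str) -> Set[str]:
        # .get avoids creating empty entries for unknown entities.
        return set(self._tags.get((kind, key), ()))


def policy_for(store: ProfileStore, ip: str, account: str) -> str:
    """Pick a response from the combined tags of the IP and the account."""
    labels = store.tags("ip", ip) | store.tags("account", account)
    if {"proxy", "suspicious_behavior"} <= labels:
        return "reduce_benefit"
    if labels:
        return "challenge"
    return "allow"


store = ProfileStore()
store.tag("ip", "203.0.113.7", "proxy")
store.tag("account", "10001", "suspicious_behavior")
print(policy_for(store, "203.0.113.7", "10001"))  # reduce_benefit
```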
Proxy IP Detection
Reverse probing of common proxy ports (80, 8080, etc.).
Inspect X‑Forwarded‑For HTTP header.
Check for a Proxy‑Connection header in keep‑alive requests.
Identify open ports in unusually high ranges (above 10000).
To avoid exhaustive scanning, Tencent first flags suspicious IPs via business modeling, then probes them, discovering millions of malicious (often proxy) IPs daily.
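The checks above, together with the "flag first, probe second" approach, can be sketched as follows. This is an illustrative outline, not Tencent's implementation: the function names, the `flagged` set (standing in for the output of business modeling), and the timeout are assumptions, and the port list holds only the two ports the article names.

```python
import socket
from typing import Callable, Dict, Iterable, List, Set

COMMON_PROXY_PORTS = (80, 8080)  # the article lists "80, 8080, etc."


def header_signals(headers: Dict[str, str]) -> List[str]:
    """Return proxy-related header names present in an HTTP request."""
    names = {name.lower() for name in headers}
    return [h for h in ("x-forwarded-for", "proxy-connection") if h in names]


def unusual_high_ports(open_port_list: Iterable[int]) -> List[int]:
    """Keep only open ports above 10000."""
    return [p for p in open_port_list if p > 10000]


def open_ports(ip: str, ports: Iterable[int], timeout: float = 0.5) -> List[int]:
    """Reverse-probe an IP by attempting TCP connections to each port."""
    found = []
    for port in ports:
        try:
            with socket.create_connection((ip, port), timeout=timeout):
                found.append(port)
        except OSError:
            pass  # closed, filtered, or unreachable
    return found


def probe_flagged(ips: Iterable[str], flagged: Set[str],
                  probe: Callable[[str, Iterable[int]], List[int]] = open_ports
                  ) -> Dict[str, List[int]]:
    """Probe only IPs already flagged as suspicious, not every IP seen."""
    return {ip: probe(ip, COMMON_PROXY_PORTS) for ip in ips if ip in flagged}


print(header_signals({"X-Forwarded-For": "198.51.100.4",
                      "Proxy-Connection": "keep-alive"}))
# ['x-forwarded-for', 'proxy-connection']
```

Each signal is weak on its own (an open port 80 is common on ordinary web servers), so in practice these results would feed the profile tags and classifiers rather than trigger a block directly.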
Defense Logic and System Integration
The real‑time system, written in C/C++, shares memory across machines, tolerating some data inconsistency because risk assessment is probabilistic. Emergency switches and rapid response channels (WeChat, SMS) are in place to prevent cascading failures.
Integration Scenarios
E‑commerce O2O order‑brushing, coupon abuse, red‑packet fraud.
Preventing fake account registration.
Blocking credential stuffing and password‑hash cracking.
Mitigating malicious logins.
Q&A Highlights
Q: Is the risk learning engine self‑developed or based on open‑source? A: Online components are built in C/C++; offline training leverages Python open‑source libraries.
Q: Has MongoDB been modified for the Magic Cube platform? A: Yes, the storage engine has been customized.
Q: Difference between black and white classifiers? A: Black classifiers detect abnormal requests; white classifiers identify normal ones.
Q: How are risk weights determined? A: Through training on positive/negative samples, parameter significance checks, and manual verification.
Q: What happens if a normal user is mistakenly blocked? A: The user experiences additional verification steps (captcha, manual review) but is not permanently denied.
21CTO
21CTO (21CTO.com) offers developers community, training, and services, making it your go‑to learning and service platform.
