Intelligent Risk Control in Live Streaming: Architecture, Challenges, and Model Evolution at Douyu

This article presents Douyu's intelligent risk‑control system for live streaming, detailing the operational, activity, traffic, account, transaction and content safety challenges, the multi‑layer algorithm architecture, and the evolution of models for spam detection, risk scoring, gang identification, behavior sequencing, device fingerprinting, and interpretability.

DataFunTalk

Aug 16, 2021

Sharing Guest: Gong Can, Algorithm Lead at Douyu

Editor: Wang Yanlei, LingShu Technology

Platform: DataFunTalk

Introduction: Live streaming faces security risks spanning operations, activities, traffic, accounts, transactions, and content. Intelligent risk control must also contend with technical challenges, including high-frequency adversarial attacks, diverse scenarios, and weak model interpretability. This article describes how Douyu's algorithm team built a comprehensive risk-control framework to address these issues.

01. Intelligent Risk‑Control Background

Risk-control problems in live streaming fall into seven major categories: operational safety, activity safety, traffic safety, account safety, transaction safety, content safety, and joint risk control. The main technical challenges are a strongly adversarial environment, a large number of scenario-specific models, and the trade-off between model robustness and interpretability.

To tackle these problems, we first root the solution in business needs and establish a generic algorithm architecture that integrates risk‑control strategies and operations, automating repetitive workflows and enabling intelligent handling of otherwise manual tasks.

02. Algorithm Architecture

1. Overview

We abstract the seven risk scenarios into four risk types:

Content risk (e.g., advertising, pornographic content in images, text, video).

User‑behavior risk (abnormal UID‑level actions such as excessive logins).

Gang risk (coordinated accounts that distribute malicious behavior across multiple IDs).

Device risk (risk identified from device‑level fingerprints).

These categories form the basis of a full‑scene intelligent risk‑control solution.

2. Core Algorithm Layer

Risk scoring – evolved from tree models to DeepFM, emphasizing ordered scores.

Gang identification – custom graph‑based algorithm replacing traditional graph methods.

Spam text detection – progressed from handcrafted features to TextCNN, then to a Wide&Deep model that fuses text and user behavior.

Device risk – Isolation Forest combined with a proprietary device‑fingerprint algorithm.

Abnormal sequence – sequence‑based detection to complement other risk signals.

3. Business Integration Layer

Risk scoring – base model provides daily scores; historical scores are weighted, and gang scores supplement missing data.

Gang identification – storage and relationship management for large‑scale gang data, with real‑time query support.

Spam text – model plus online anti‑adversarial strategies.

Device risk – custom device-fingerprint generation algorithm that tags devices with similarity, anomaly, and risk-score labels.
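The talk does not say how historical risk scores are weighted or how gang scores fill in for missing data. As a minimal sketch, assuming exponentially decaying weights (the `decay` parameter and the function name `blended_risk_score` are hypothetical), the risk-scoring integration described above might look like this:

```python
from typing import Optional, Sequence


def blended_risk_score(
    daily_scores: Sequence[float],
    gang_score: Optional[float] = None,
    decay: float = 0.8,
) -> Optional[float]:
    """Blend daily scores (oldest first) using exponentially decaying weights.

    Falls back to the gang-level score when the user has no daily scores.
    The decay scheme is an assumption, not the method used at Douyu.
    """
    if not daily_scores:
        return gang_score  # may still be None if no gang information exists
    n = len(daily_scores)
    # The most recent day gets weight 1, the day before gets `decay`, and so on.
    weights = [decay ** (n - 1 - i) for i in range(n)]
    total = sum(w * s for w, s in zip(weights, daily_scores))
    return total / sum(weights)
```

A user with no score history but a known gang membership would receive the gang score, so downstream policies always see a value when any signal exists.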

4. Risk‑Control System Layer

Includes unified interception services, real‑time gang services, device anomaly risk, scoring management, gang management, analysis platform, and sequence query.

5. Application Layer

Implements the seven risk‑control scenarios described earlier.

03. Model Practice

1. Practical Challenges

Combating highly variant spam text that evades models and policies.

Building a universal scoring system that quickly detects new risk behaviors.

Identifying weak risks at the UID level that are dispersed across many nodes.

Detecting abnormal behavior sequences.

Recognizing device‑level risks.

Providing interpretable results.

2. Algorithm Evolution

① Spam Text

Initial handcrafted features + shallow classifier.

Switch to TextCNN – reduced feature engineering, higher recall, but struggled with homophones and similar‑looking characters.

PyCNN with pinyin conversion and stroke‑based embeddings to handle homophones and similar characters.

Integration of user features using a Wide&Deep model inspired by recommendation systems.
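To illustrate why pinyin conversion helps with homophone substitution, here is a small sketch. The lookup table `TOY_PINYIN` is a hand-written, hypothetical stand-in for a full pinyin dictionary, and the function is not the PyCNN preprocessing itself, whose details the talk does not give.

```python
from typing import List

# Hypothetical toy table for illustration only; a real system would rely on
# a complete character-to-pinyin dictionary.
TOY_PINYIN = {
    "微": "wei", "薇": "wei", "威": "wei",
    "信": "xin", "芯": "xin",
    "加": "jia", "佳": "jia",
}


def to_pinyin_tokens(text: str) -> List[str]:
    """Map each character to toneless pinyin; unknown characters pass through.

    Whitespace is dropped so that spacing tricks do not change the tokens.
    """
    return [TOY_PINYIN.get(ch, ch) for ch in text if not ch.isspace()]


original = to_pinyin_tokens("加微信")
variant = to_pinyin_tokens("佳 薇 芯")  # homophone substitution plus spacing
```

Both strings map to the same token sequence, so a classifier fed pinyin tokens sees the variant as the text it already learned to flag.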

② Risk Scoring

Goal: assign an ordered risk score to each user for immediate decision-making. Early models (tree-based binary classifiers, logistic regression) either did not produce ordered scores or required heavy feature engineering. Iterations:

GBDT+LR – GBDT for automatic feature extraction, LR for ordered scores.

DNN replaces GBDT for higher‑order feature learning.

Wide side upgraded to FM, improving generalization.

Later stages added sequence and graph embeddings, achieving significant ROI gains.
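For reference, the standard factorization machine (FM) scores an input as a bias, a linear term, and pairwise interactions weighted by inner products of latent vectors. The sketch below implements that textbook formula in plain Python; the parameter names (`w0`, `w`, `v`) follow common notation and are not taken from the talk, and the production model is not shown here.

```python
from typing import Sequence


def fm_score(
    x: Sequence[float],
    w0: float,
    w: Sequence[float],
    v: Sequence[Sequence[float]],
) -> float:
    """FM output: w0 + sum_i w_i*x_i + sum_{i<j} <v_i, v_j>*x_i*x_j.

    The pairwise term uses the linear-time identity
    0.5 * sum_f [(sum_i v_if*x_i)^2 - sum_i (v_if*x_i)^2].
    """
    linear = sum(wi * xi for wi, xi in zip(w, x))
    pairwise = 0.0
    for f in range(len(v[0])):
        s = sum(v[i][f] * x[i] for i in range(len(x)))
        s_sq = sum((v[i][f] * x[i]) ** 2 for i in range(len(x)))
        pairwise += 0.5 * (s * s - s_sq)
    return w0 + linear + pairwise


def fm_score_naive(x, w0, w, v) -> float:
    """Same formula with an explicit double loop, used as a cross-check."""
    total = w0 + sum(wi * xi for wi, xi in zip(w, x))
    for i in range(len(x)):
        for j in range(i + 1, len(x)):
            dot = sum(a * b for a, b in zip(v[i], v[j]))
            total += dot * x[i] * x[j]
    return total
```

Because every feature pair shares latent vectors, the model can score interactions between features that rarely co-occur in training data, which is one common explanation for the generalization benefit of FM over a purely linear wide side.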

③ Gang Detection

Gangs are large-scale and organized, rely on scripts and cloud-controlled devices, and exhibit synchronized behavior. Traditional graph algorithms fall short here because they require scene-specific modeling, make little use of side information, and are hard to interpret. To address this, the team developed a full-scene gang (FSG) mining algorithm.
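The talk does not describe the internals of FSG. For orientation only, the sketch below shows a generic baseline that is not FSG: accounts are linked when they share a resource such as a device or IP, and connected components above a size threshold are reported as candidate groups. The edge format and the `min_size` threshold are assumptions.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def connected_groups(
    edges: Iterable[Tuple[str, str]], min_size: int = 3
) -> List[Set[str]]:
    """Group UIDs linked through shared resources using union-find.

    `edges` holds (uid, resource_id) pairs, e.g. a UID and a device id.
    """
    parent: Dict[str, str] = {}

    def find(node: str) -> str:
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]  # path halving
            node = parent[node]
        return node

    for uid, resource in edges:
        # Prefix nodes so a UID can never collide with a resource id.
        root_u, root_r = find("uid:" + uid), find("res:" + resource)
        if root_u != root_r:
            parent[root_u] = root_r

    groups: Dict[str, Set[str]] = defaultdict(set)
    for node in list(parent):
        if node.startswith("uid:"):
            groups[find(node)].add(node[4:])
    return [members for members in groups.values() if len(members) >= min_size]
```

A baseline like this captures shared infrastructure but not synchronized behavior or side information, which are among the gaps the section attributes to traditional graph methods.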

④ Behavior Sequence

The initial C-LSTM captured some anomalies but missed fine-grained timing patterns. The team added a timestamp embedding, then replaced the CNN feature extractor with a Transformer encoder, and filtered out short sequences to reduce false positives.
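How the timestamp embedding is built is not specified in the talk. One common approach, sketched here as an assumption, is to bucket the gap between consecutive events on a log scale so that each bucket id can index an embedding table. The `min_len` threshold and the bucket count are hypothetical values.

```python
import math
from typing import List, Optional, Sequence, Tuple


def gap_buckets(timestamps: Sequence[float], num_buckets: int = 16) -> List[int]:
    """Convert sorted event timestamps (seconds) to log-scale gap bucket ids.

    Bucket 0 is reserved for the first event, which has no predecessor.
    """
    buckets = [0]
    for prev, cur in zip(timestamps, timestamps[1:]):
        gap = max(cur - prev, 0.0)
        buckets.append(min(int(math.log2(gap + 1.0)) + 1, num_buckets - 1))
    return buckets


def prepare_sequence(
    events: Sequence[Tuple[str, float]], min_len: int = 5
) -> Optional[Tuple[List[str], List[int]]]:
    """Return (actions, gap buckets), or None when the sequence is too short."""
    if len(events) < min_len:
        return None  # short sequences are filtered out before modeling
    ordered = sorted(events, key=lambda e: e[1])
    actions = [action for action, _ in ordered]
    return actions, gap_buckets([ts for _, ts in ordered])
```

Log-scale buckets keep sub-second, script-like gaps distinguishable from each other while still grouping long idle periods together.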

⑤ Device Fingerprint

Challenges: defining similarity operators for diverse feature modules and fusing them into a comprehensive similarity score. The current architecture can detect device‑level anomalies.
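The two challenges named above, per-module similarity operators and their fusion into one score, can be illustrated with a small sketch. The feature modules, operators, and weights below are hypothetical choices for illustration and do not describe Douyu's device-fingerprint algorithm.

```python
from typing import Any, Callable, Dict, Set, Tuple


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Overlap of two sets, e.g. installed fonts."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def exact(a: Any, b: Any) -> float:
    """1.0 when two categorical values match, else 0.0."""
    return 1.0 if a == b else 0.0


def closeness(a: float, b: float) -> float:
    """Relative closeness of two numbers, clipped to [0, 1]."""
    denom = max(abs(a), abs(b))
    if denom == 0:
        return 1.0
    return max(0.0, 1.0 - abs(a - b) / denom)


# Hypothetical modules: name -> (similarity operator, fusion weight).
MODULES: Dict[str, Tuple[Callable[[Any, Any], float], float]] = {
    "fonts": (jaccard, 0.5),
    "os_version": (exact, 0.3),
    "screen_width": (closeness, 0.2),
}


def device_similarity(d1: Dict[str, Any], d2: Dict[str, Any]) -> float:
    """Weighted average of per-module similarities over shared modules."""
    total, weight_sum = 0.0, 0.0
    for name, (op, weight) in MODULES.items():
        if name in d1 and name in d2:
            total += weight * op(d1[name], d2[name])
            weight_sum += weight
    return total / weight_sum if weight_sum else 0.0
```

Normalizing by the weights of the modules actually present keeps the score comparable when a device is missing some features.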

⑥ Model Interpretability

Examples:

Risk scoring – GBDT+LR (refer to Alibaba's "Unpack Local Model Interpretation for GBDT").

DeepFM – mask‑based control variables to assess individual feature impact.
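The mask-based idea can be sketched as follows, assuming that "masking" means replacing one feature at a time with a baseline value while holding the others fixed, then measuring how the score changes. The function name, the baseline choice, and the toy scorer in the usage lines are hypothetical.

```python
from typing import Callable, Dict


def masked_feature_impact(
    score_fn: Callable[[Dict[str, float]], float],
    features: Dict[str, float],
    baseline: Dict[str, float],
) -> Dict[str, float]:
    """Score drop observed when each feature is replaced by its baseline."""
    full_score = score_fn(features)
    impact: Dict[str, float] = {}
    for name in features:
        masked = dict(features)  # copy so the other features stay unchanged
        masked[name] = baseline[name]
        impact[name] = full_score - score_fn(masked)
    return impact


# Toy linear scorer standing in for a trained model.
def toy_score(f: Dict[str, float]) -> float:
    return 0.3 * f["logins"] + 0.1 * f["gifts"]


impact = masked_feature_impact(
    toy_score, {"logins": 10.0, "gifts": 2.0}, {"logins": 0.0, "gifts": 0.0}
)
```

For a model with strong feature interactions, single-feature masking understates joint effects, so the resulting numbers are best read as a rough ranking rather than an exact attribution.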

Thank you for listening.
