
Financial Big Data Risk Control Models: Techniques, Applications, and COVID‑19 Challenges

This article gives an overview of financial big‑data risk control models at Du Xiaoman. It covers traditional scorecards, AI‑driven time‑series and text processing, graph‑based networks, model interpretability, probability calibration, stability analysis, and the specific challenges introduced by the COVID‑19 pandemic.

DataFunTalk

Guest: Yan Cheng, Head of Risk Modeling at Du Xiaoman Financial

Editor: Huang Leping

Introduction

Financial AI is a key avenue for transforming traditional industries. This session covers the technical methods and practical issues of big‑data risk control models at Du Xiaoman, and discusses how the models have evolved during COVID‑19.

1. FinTech in Risk Management

FinTech in risk management consists of two parts:

Traditional financial scorecards: application scorecard (A‑card), behavior scorecard (B‑card), and collection scorecard (C‑card).

Information technology capabilities: AI (algorithmic power), big data (digital behavior storage), and cloud (resource sharing).

These technologies enhance the effectiveness of traditional scorecard models.

2. Du Xiaoman Credit Risk

Du Xiaoman has accumulated extensive data and modeling experience. Core risk identification relies on three layers:

Base user profile (age, gender, education, income, assets, credit history).

Behavioral demand patterns (recent financial actions correlated with past behavior).

Social activity networks (detecting fraud rings and peer influence).

Combining these layers builds a discriminative risk model.

3. Time‑Series Processing: Pre‑Loan

When a user applies for credit, their credit report is retrieved. Analyzing the temporal sequence of actions in it (e.g., loan queries, disbursements) reveals the user's cash‑flow needs. An LSTM network ingests the sequence, in which each item consists of a timestamp, an action type, and associated features; it learns richer representations and improves KS by ~2 points.
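As a rough illustration of how such items might be prepared before they reach a sequence model, the sketch below turns raw events into fixed‑length numeric vectors. The action vocabulary, the `amount` feature, and the function name `encode_events` are assumptions made for this example; the talk does not describe Du Xiaoman's actual encoding.

```python
from datetime import datetime

# Hypothetical action vocabulary; the talk only mentions queries and disbursements.
ACTIONS = ["loan_query", "disbursement", "repayment"]

def encode_events(events, max_len=4):
    """Turn (timestamp, action, amount) events into fixed-length numeric items.

    Each item is [days_since_previous_event, one-hot action..., amount].
    Short sequences are left-padded with zero vectors so the most recent
    event is always last.
    """
    events = sorted(events, key=lambda e: e[0])
    items = []
    prev_ts = None
    for ts, action, amount in events:
        gap_days = 0.0 if prev_ts is None else (ts - prev_ts).total_seconds() / 86400.0
        one_hot = [1.0 if action == a else 0.0 for a in ACTIONS]
        items.append([gap_days] + one_hot + [float(amount)])
        prev_ts = ts
    items = items[-max_len:]  # keep only the most recent events
    width = 1 + len(ACTIONS) + 1
    padding = [[0.0] * width for _ in range(max_len - len(items))]
    return padding + items

events = [
    (datetime(2020, 1, 10), "loan_query", 0),
    (datetime(2020, 1, 1), "loan_query", 0),
    (datetime(2020, 1, 15), "disbursement", 5000),
]
sequence = encode_events(events)
```

Using the gap between events rather than the absolute timestamp is one common choice: it keeps the input scale bounded and lets the model see how tightly the actions cluster in time.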

4. Time‑Series Processing: In‑Loan

In‑loan behavior feeds B‑card modeling. For each transaction slice, features such as total limit, remaining principal, action type, amount, and days to next repayment are generated and fed into an RNN, which significantly boosts B‑card performance.

5. Text Data Processing

Unstructured text from internet behavior is handled via an attention‑based framework that scores each information unit independently of order, allowing flexible integration of new data and improving model robustness.
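The order‑independence described here can be illustrated with a minimal attention‑pooling sketch. It assumes each information unit has already been embedded as a vector and that a query vector has been learned elsewhere; `attention_pool` and the numbers below are hypothetical and not taken from the talk.

```python
import math

def attention_pool(units, query):
    """Score each unit against the query, softmax the scores, and return
    the attention-weighted average of the units together with the weights."""
    scores = [sum(u_i * q_i for u_i, q_i in zip(u, query)) for u in units]
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(units[0])
    pooled = [sum(w * u[d] for w, u in zip(weights, units)) for d in range(dim)]
    return pooled, weights

units = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # hypothetical unit embeddings
query = [0.5, -0.5]                           # hypothetical learned query

pooled, weights = attention_pool(units, query)
pooled_reversed, _ = attention_pool(list(reversed(units)), query)
```

Because no positional information enters the scores, `pooled` and `pooled_reversed` agree up to floating‑point rounding, and a unit from a new data source can be appended to `units` without changing the pooling code.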

6. Graph Networks

Graph‑based methods construct each user's dense neighbor network (1‑, 2‑, and 3‑hop), apply graph convolution over the user's own features and those of the neighbors, and then train with supervised learning. Combined with the other models, this improves risk identification.
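The two ingredients named here, k‑hop neighborhoods and feature aggregation over neighbors, can be sketched in a few lines. This is a deliberately simplified stand‑in: the aggregation is a plain mean with no learned weights, and the toy graph and function names are assumptions for illustration only.

```python
def k_hop_neighbors(adj, start, k):
    """Return {hop: set_of_nodes} for hops 1..k using breadth-first search."""
    seen = {start}
    frontier = {start}
    hops = {}
    for hop in range(1, k + 1):
        nxt = set()
        for node in frontier:
            nxt.update(adj.get(node, ()))
        nxt -= seen  # keep only nodes first reached at this hop
        hops[hop] = nxt
        seen |= nxt
        frontier = nxt
    return hops

def mean_aggregate(adj, features):
    """One simplified graph-convolution step: average each node's feature
    vector with those of its direct neighbors (no learned weights)."""
    out = {}
    for node, feat in features.items():
        group = [feat] + [features[n] for n in adj.get(node, ())]
        out[node] = [sum(col) / len(group) for col in zip(*group)]
    return out

# Toy chain graph a - b - c - d with a single feature per user.
adj = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
features = {"a": [1.0], "b": [0.0], "c": [0.0], "d": [1.0]}

hops = k_hop_neighbors(adj, "a", 3)
smoothed = mean_aggregate(adj, features)
```

Applying `mean_aggregate` repeatedly lets information from 2‑ and 3‑hop neighbors reach a node; in a trained graph network each step would also include a learned weight matrix and a nonlinearity.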

7. Application‑Layer Issues

Model Interpretability

An interpretable model typically has three properties:

Simple functional form (e.g., logistic regression).

Strong correlation between input features X and prediction Y.

Limited number of variables (≤20).

Complex models (e.g., XGBoost) are decomposed into sub‑models; each sub‑model’s score is combined via logistic regression or a simple decision tree, preserving interpretability and easing monitoring.
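A minimal sketch of the top layer, assuming three sub‑models whose scores are combined by logistic regression. The sub‑model names, coefficients, and intercept are made up for illustration; in practice they would be fitted on labeled data.

```python
import math

# Hypothetical sub-model names and coefficients, for illustration only.
COEFFICIENTS = {"credit_report": 1.2, "behavior_sequence": 0.8, "graph": 0.5}
INTERCEPT = -3.0

def combine(sub_scores):
    """Combine sub-model scores with a logistic regression.

    Returns the predicted bad probability and each sub-model's additive
    contribution to the log-odds.
    """
    contributions = {
        name: COEFFICIENTS[name] * sub_scores[name] for name in COEFFICIENTS
    }
    log_odds = INTERCEPT + sum(contributions.values())
    probability = 1.0 / (1.0 + math.exp(-log_odds))
    return probability, contributions

probability, contributions = combine(
    {"credit_report": 1.0, "behavior_sequence": 0.5, "graph": 2.0}
)
```

Because the log‑odds is a plain sum, a shift in the final score can be traced to the sub‑model whose contribution moved, which is what makes this layout convenient to monitor.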

Probability Calibration

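Calibration proceeds in five steps: segment the predictions into bins; compute the logit of the observed delinquency rate in each bin; compute the average logit of the predictions in each bin; fit a curve (linear or quadratic) from predicted logit to observed logit; and transform the calibrated value into a credit score (e.g., FICO‑style). After calibration, the meaning of a score no longer depends on the bad rate of the training sample.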
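The calibration procedure can be sketched as follows, using the linear variant of the curve fit. The equal‑size binning, the clipping constant, and the score scaling (a base score of 600 at good:bad odds of 50:1, with 20 points to double the odds) are hypothetical choices for this example, not values given in the talk.

```python
import math

def logit(p):
    return math.log(p / (1.0 - p))

def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def calibrate(preds, labels, n_bins):
    """Bin by predicted probability, then regress the logit of the observed
    bad rate on the average predicted logit of each bin."""
    pairs = sorted(zip(preds, labels))
    size = len(pairs) // n_bins
    xs, ys = [], []
    for b in range(n_bins):
        chunk = pairs[b * size:(b + 1) * size] if b < n_bins - 1 else pairs[b * size:]
        bad_rate = sum(label for _, label in chunk) / len(chunk)
        # Clip so bins with no bads (or only bads) still have a finite logit.
        bad_rate = min(max(bad_rate, 1e-6), 1 - 1e-6)
        xs.append(sum(logit(p) for p, _ in chunk) / len(chunk))
        ys.append(logit(bad_rate))
    return fit_line(xs, ys)

def to_score(pred, slope, intercept, base=600.0, base_odds=50.0, pdo=20.0):
    """Map a raw prediction to a score: `base` corresponds to good:bad odds
    of `base_odds`, and every `pdo` points doubles those odds."""
    calibrated_logit = slope * logit(pred) + intercept  # log(bad / good)
    factor = pdo / math.log(2)
    return base + factor * (-calibrated_logit - math.log(base_odds))

# Toy data: the model predicts 0.2 and 0.6, but observed bad rates are 0.1 and 0.4.
preds = [0.2] * 10 + [0.6] * 10
labels = [1] + [0] * 9 + [1] * 4 + [0] * 6
slope, intercept = calibrate(preds, labels, n_bins=2)
```

With the fitted `slope` and `intercept`, a raw prediction of 0.2 is mapped back to a calibrated bad rate of about 0.1 before being converted to a score, so two models trained on samples with different bad rates end up on the same score scale.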

Score Stability

Stability includes distribution stability (monthly score distribution), performance stability (monthly bad‑rate per score), and individual score volatility (sensitive to recent borrowing/repayment behavior).
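Distribution stability is often tracked with the Population Stability Index (PSI). The talk does not name a specific metric, so treat the sketch below as one common option rather than the speaker's method; the bin counts are invented.

```python
import math

def psi(expected_counts, actual_counts, eps=1e-6):
    """Population Stability Index between two binned score distributions."""
    e_total = sum(expected_counts)
    a_total = sum(actual_counts)
    value = 0.0
    for e, a in zip(expected_counts, actual_counts):
        # Floor the shares so an empty bin does not produce log(0).
        e_share = max(e / e_total, eps)
        a_share = max(a / a_total, eps)
        value += (a_share - e_share) * math.log(a_share / e_share)
    return value

baseline = [100, 300, 400, 200]       # users per score band in the reference month
same_shape = [50, 150, 200, 100]      # half the volume, identical distribution
shifted = [200, 300, 300, 200]        # mass moved from the third band to the first

psi_same = psi(baseline, same_shape)
psi_shifted = psi(baseline, shifted)
```

`psi_same` is zero because only the volume changed, while `psi_shifted` is positive; computing the index monthly against a fixed baseline gives a single number to watch for drift in the score distribution.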

8. COVID‑19 Impact on Models

COVID‑19 acts as a stress test: a user's features X remain unchanged, but the risk Y associated with them rises, especially for multi‑loan variables. Challenges include capturing macro‑economic signals, deciding whether to include pandemic‑era samples in training, and adjusting strategy depending on whether the pandemic's effects prove short‑ or long‑term.

Q&A

Q1: Which features reflect macro‑economic conditions under the pandemic? A: Re‑employment indices derived from location migration correlate strongly with user income.

Q2: How does feeding all high‑dimensional features into one model compare with scoring separate sub‑models? A: The KS difference is ~0.5%; high‑dimensional models have more parameters, are harder to monitor, and are less interpretable.
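For readers less familiar with the metric, the sketch below assumes KS refers to the Kolmogorov‑Smirnov statistic as it is usually used in credit scoring: the largest gap between the cumulative distributions of bad and good accounts across score thresholds. The scores and labels are toy data.

```python
def ks_statistic(scores, labels):
    """Largest gap between the cumulative bad and cumulative good
    distributions as the score threshold sweeps upward (label 1 = bad)."""
    n_bad = sum(labels)
    n_good = len(labels) - n_bad
    pairs = sorted(zip(scores, labels))
    cum_bad = cum_good = 0
    best = 0.0
    i = 0
    while i < len(pairs):
        j = i
        # Consume all samples that share the same score before comparing.
        while j < len(pairs) and pairs[j][0] == pairs[i][0]:
            if pairs[j][1] == 1:
                cum_bad += 1
            else:
                cum_good += 1
            j += 1
        best = max(best, abs(cum_bad / n_bad - cum_good / n_good))
        i = j
    return best

scores = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]  # predicted bad probability
labels = [0, 0, 0, 1, 0, 1, 1, 1]
ks = ks_statistic(scores, labels)
ks_perfect = ks_statistic([0.1, 0.2, 0.8, 0.9], [0, 0, 1, 1])
```

A model that separates good and bad accounts perfectly reaches a KS of 1, and one that cannot separate them at all stays near 0, so small differences between two modeling approaches show up as small differences in this value.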

Thank you for attending.
