Scientific Data Definition, Application, Evaluation, and Explanation in Financial Risk Modeling
This presentation explores how to scientifically define, apply, evaluate, and interpret data in financial risk management, covering data alignment with business goals, feature selection, model metrics like KS and PSI, handling pandemic impacts, and methods for model explanation and improvement.
Data is often called the new energy and new productivity of the information age. Leveraging massive, complex data, especially in finance, requires scientific approaches to align data with business objectives, select appropriate methods, evaluate model performance, and interpret results.
01 Scientific Definition of Data
1. Financial risk management: The credit business transforms savings into investment; like e‑commerce recommendation or ad targeting, it aims to match fund providers and borrowers precisely by risk.
2. Scientific definition of data: Annualized risk is defined as annualized bad amount divided by annualized balance; predicting annualized risk directly is difficult, so predicting the distribution of overdue users at MOB12 (twelve months on book) is more practical.
3. Relating model predictions to annualized risk: The ratio of annualized risk to overdue rate (MOB12) near 1 indicates balanced credit limits; deviations suggest over‑ or under‑allocation.
4. Defining overdue and good users: Overdue status varies over time; a 30‑day overdue threshold (N=30) is commonly used to label bad users, while longer observation windows affect sample size and relevance.
5. Determining observation windows: Use vintage curves to find the point where the slope approaches zero; MOB = 12 is typically chosen for medium‑term risk observation.
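The two ideas above can be sketched in a few lines of Python. All numbers here are invented for illustration: the vintage curve, the slope threshold, and the bad amount/balance figures are assumptions, not figures from the talk.

```python
def annualized_risk(bad_amount: float, balance: float) -> float:
    """Annualized bad amount divided by annualized balance."""
    return bad_amount / balance

def pick_observation_window(vintage_curve, slope_threshold=0.0008):
    """Return the first MOB at which the vintage curve's incremental
    bad rate (its month-over-month slope) falls below the threshold,
    i.e., the point where the curve flattens out."""
    for mob in range(1, len(vintage_curve)):
        slope = vintage_curve[mob] - vintage_curve[mob - 1]
        if slope < slope_threshold:
            return mob
    return len(vintage_curve) - 1

# Hypothetical vintage curve: cumulative bad rate by month on book.
curve = [0.0, 0.005, 0.010, 0.014, 0.017, 0.019, 0.021,
         0.022, 0.023, 0.0236, 0.0240, 0.0243, 0.0245]

window = pick_observation_window(curve)
risk = annualized_risk(bad_amount=2.7, balance=100.0)
overdue_mob12 = curve[12]
ratio = risk / overdue_mob12   # near 1 suggests balanced credit limits
```

With this toy curve the slope flattens before MOB 12; in practice the threshold and window are business decisions, and the ratio check mirrors point 3 above: a ratio well above or below 1 suggests over‑ or under‑allocated limits.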
02 Scientific Application of Data
Data types usable in financial risk models include:
Credit reports: Historical credit records.
Internet data: Various online user data.
Third‑party fintech compliance data.
Behavior data from the product itself.
Users can be described from several perspectives:
Basic attribute portrait: Age, gender, occupation, interests, etc., derived via ML/NLP.
Behavior sequence: Time‑ordered actions, modeled with RNNs.
Social relationships: Peer income/consumption, modeled with GNNs.
Simple model and feature examples (not covered in detail):
Text data: Attention networks extract key information.
Sequential data: RNNs predict future risk from repayment behavior.
Relational data: Clustering and graph convolutional networks leverage neighbor information.
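To make the attention idea concrete, here is a toy dot‑product attention pool in pure Python. The token embeddings and query vector are made up; a real text model would learn both, but the mechanism (score tokens, softmax the scores, take the weighted average) is the same.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attention_pool(token_vectors, query):
    """Score each token by its dot product with the query, softmax
    the scores into weights, and return the weighted average of the
    token vectors plus the weights themselves."""
    scores = [sum(t * q for t, q in zip(vec, query)) for vec in token_vectors]
    weights = softmax(scores)
    dim = len(token_vectors[0])
    pooled = [sum(w * vec[i] for w, vec in zip(weights, token_vectors))
              for i in range(dim)]
    return pooled, weights

tokens = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1]]   # toy token embeddings
query = [1.0, 0.0]                               # direction we care about
pooled, weights = attention_pool(tokens, query)
```

Tokens aligned with the query receive higher weights, which is how attention "extracts key information" from text.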
03 Scientific Evaluation of Data
Key model evaluation metrics:
KS (Kolmogorov‑Smirnov) statistic: Measures ranking ability of good vs. bad users; offline KS may be high but can decay online due to differing user sets.
PSI (Population Stability Index): Assesses distribution stability of predicted scores over time.
Swap‑in & swap‑out analysis: Compares overall overdue rates and approval rates between old and new models under equal volume conditions.
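KS and PSI are both straightforward to compute. Below is a minimal pure‑Python sketch of each; the scores are invented, and the equal‑width binning for PSI is one common choice (equal‑population bins are also typical).

```python
import math

def ks_statistic(scores_good, scores_bad):
    """KS = max gap between the cumulative score distributions of
    good and bad users; higher means better ranking separation."""
    thresholds = sorted(set(scores_good) | set(scores_bad))
    ks = 0.0
    for t in thresholds:
        cdf_good = sum(s <= t for s in scores_good) / len(scores_good)
        cdf_bad = sum(s <= t for s in scores_bad) / len(scores_bad)
        ks = max(ks, abs(cdf_good - cdf_bad))
    return ks

def psi(expected, actual, bins=10):
    """Population Stability Index between a baseline sample and a
    later sample, using equal-width bins over the combined range."""
    lo = min(min(expected), min(actual))
    hi = max(max(expected), max(actual))
    width = (hi - lo) / bins or 1.0
    def frac(sample, b):
        n = sum(lo + b * width <= s < lo + (b + 1) * width
                or (b == bins - 1 and s == hi) for s in sample)
        return max(n / len(sample), 1e-6)   # avoid log(0)
    return sum((frac(actual, b) - frac(expected, b))
               * math.log(frac(actual, b) / frac(expected, b))
               for b in range(bins))

good_scores = [0.1, 0.2, 0.3, 0.4]
bad_scores = [0.55, 0.7, 0.8, 0.9]
ks = ks_statistic(good_scores, bad_scores)       # perfectly separated
stability = psi(good_scores, good_scores)        # identical distributions
```

A common rule of thumb (not from the talk) is PSI below 0.1 for a stable population and above 0.25 for a significant shift.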
Model stability is crucial; stable score‑to‑risk mapping (e.g., 600‑650 score ≈ 1% overdue) should hold across months.
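A stability check of this kind reduces to tabulating observed overdue rates per score band and comparing them month over month. A minimal sketch, with invented records (1 overdue user out of 100 in the 600–650 band, matching the ≈1% example):

```python
def band_overdue_rates(records, bands):
    """Observed overdue rate per score band. `records` is a list of
    (score, is_overdue) pairs; `bands` is a list of [lo, hi) ranges."""
    rates = {}
    for lo, hi in bands:
        in_band = [bad for score, bad in records if lo <= score < hi]
        rates[(lo, hi)] = sum(in_band) / len(in_band) if in_band else None
    return rates

# Invented month of data: 1 overdue and 99 good users scored 600-650.
records_jan = [(620, 1)] + [(625, 0)] * 99
rates = band_overdue_rates(records_jan, [(600, 650)])
```

Running the same tabulation on each month's cohort and comparing the per‑band rates is what "stable score‑to‑risk mapping" means operationally.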
Reject inference: Assign scores to rejected users (e.g., replicate samples with weighted labels) to enrich training data and improve model applicability.
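One standard way to implement the weighted‑replica idea is fuzzy augmentation: each rejected applicant becomes two pseudo‑samples, one bad and one good, weighted by the model's predicted bad probability. The `toy_model` below is a placeholder, not the talk's scorecard.

```python
def fuzzy_augment(rejected, score_model):
    """Fuzzy reject inference: replicate each rejected applicant as
    two weighted pseudo-samples -- labelled bad with weight p(bad)
    and good with weight 1 - p(bad) -- so rejected users contribute
    to retraining in proportion to their predicted risk."""
    augmented = []
    for features in rejected:
        p_bad = score_model(features)
        augmented.append((features, 1, p_bad))        # (x, label, weight)
        augmented.append((features, 0, 1.0 - p_bad))
    return augmented

# Hypothetical scoring function standing in for the current model.
toy_model = lambda features: min(0.99, 0.1 + 0.05 * features["inquiries"])
samples = fuzzy_augment([{"inquiries": 4}], toy_model)
```

The augmented samples are then pooled with the approved population (whose labels are observed with weight 1) before retraining.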
Customer segmentation: Hierarchical grouping by loan purpose, activity level, and industry/behavior to build specialized models when distinct differences exist.
04 Scientific Explanation of Data
Model explanation approaches:
V1 – Logistic Regression: Highly interpretable but limited feature capacity.
V2 – Decision Tree: Handles many features and non‑linearities but harder to interpret.
V3 – Two‑layer model: Sub‑models built from thousands of variables feed into a top‑level LR or shallow XGB, offering good top‑level interpretability while leveraging complex features.
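The top layer of the V3 architecture can be sketched as a small logistic regression over sub‑model scores. Everything below is illustrative: the hand‑rolled SGD trainer, the two hypothetical sub‑model scores per user, and the toy labels; the point is that interpretability lives in the learned weight on each sub‑model.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_top_lr(sub_scores, labels, lr=0.5, epochs=500):
    """Fit the top-level logistic regression by plain SGD. Each row
    of sub_scores is the vector of scores the sub-models emit for
    one user; each learned weight says how much that sub-model
    contributes to the final decision."""
    w = [0.0] * len(sub_scores[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(sub_scores, labels):
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
            err = p - y
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
            b -= lr * err
    return w, b

# Hypothetical sub-model outputs: (credit-report score, behavior score).
X = [[0.9, 0.8], [0.8, 0.9], [0.2, 0.1], [0.1, 0.3]]
y = [1, 1, 0, 0]          # 1 = bad user in this toy setup
w, b = train_top_lr(X, y)
```

In production the top layer would be a regularized LR or shallow XGB as the talk describes; the structure is the same, with each sub‑model condensing thousands of raw variables into one interpretable input.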
The session concludes with thanks to the audience.
DataFunTalk
Dedicated to sharing and discussing big data and AI technology applications, aiming to empower a million data scientists. Regularly hosts live tech talks and curates articles on big data, recommendation/search algorithms, advertising algorithms, NLP, intelligent risk control, autonomous driving, and machine learning/deep learning.