
Scientific Data Definition, Application, Evaluation, and Explanation for Financial Risk Modeling

This presentation explores how to scientifically define, apply, evaluate, and interpret data in financial risk management, covering data alignment with business goals, feature selection, model metrics such as KS and PSI, handling pandemic effects, and methods for model explainability.


Data has become the energy and productivity of the information age. In finance especially, aligning data models with business objectives is crucial for effective risk management. This talk discusses the scientific definition, application, evaluation, and interpretation of data within financial risk modeling.

Scientific Definition of Data: In credit risk, the common metric is annualized risk (annualized bad amount divided by annualized balance). Predicting this directly is difficult, so the focus shifts to forecasting the distribution of overdue users at a fixed maturity (e.g., MOB12, month 12 on book). "Bad" users are those overdue beyond a time window (typically 30 days); "good" users are those with a sufficiently long non-overdue history, which balances observation-window length against sample size.
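The labeling rule above can be sketched as a small function. The 30-day overdue threshold and the MOB12 observation window come from the talk; the field names and the three-way label scheme are illustrative assumptions.

```python
# Good/bad labeling rule for credit-risk modeling (sketch).
BAD_OVERDUE_DAYS = 30      # overdue beyond 30 days within the window -> "bad"
GOOD_CLEAN_MONTHS = 12     # a full MOB12 window with no such overdue -> "good"

def label_user(max_overdue_days, months_on_book):
    """Return 'bad', 'good', or 'indeterminate' (window not yet mature)."""
    if max_overdue_days > BAD_OVERDUE_DAYS:
        return "bad"
    if months_on_book >= GOOD_CLEAN_MONTHS:
        return "good"
    # Too young to observe a full window; usually excluded from training.
    return "indeterminate"
```

Users whose window has not matured are neither good nor bad yet, which is exactly the sample-size trade-off the talk mentions: a longer clean-history requirement gives more reliable "good" labels but shrinks the usable sample.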

Scientific Application of Data: Various data sources can be leveraged, including credit-bureau reports, internet data, compliant third-party fintech data, and product-usage behavior data. Users can be described from three perspectives: basic attribute profiles (age, gender, occupation, etc.), behavior sequences (captured with RNNs), and social relationships (modeled with GNNs). Simple feature-design examples include applying attention networks to text data, RNNs to time-series data, and clustering or graph convolution to relational data.
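The three perspectives can be thought of as three feature groups concatenated into one user vector. A minimal sketch, with hand-rolled placeholder encoders standing in for the RNN (behavior) and GNN (social) components the talk describes; all field names are assumptions:

```python
def profile_features(user):
    # Basic attribute profile: simple scaled/encoded fields.
    return [user["age"] / 100.0, 1.0 if user["gender"] == "F" else 0.0]

def behavior_features(events):
    # Placeholder for an RNN encoding of the behavior sequence:
    # here just the event count and the most recent event value.
    return [len(events), events[-1] if events else 0]

def social_features(neighbor_scores):
    # Placeholder for a GNN embedding of the social graph:
    # here just the mean risk score of graph neighbors.
    if not neighbor_scores:
        return [0.0]
    return [sum(neighbor_scores) / len(neighbor_scores)]

def user_vector(user, events, neighbor_scores):
    # Concatenate the three views into one feature vector.
    return (profile_features(user)
            + behavior_features(events)
            + social_features(neighbor_scores))
```

In a production system each placeholder would be replaced by a learned encoder, but the overall shape — profile, sequence, and graph features side by side — stays the same.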

Scientific Evaluation of Data: The core ranking metric is KS, which measures the separation between the score distributions of good and bad users; note that KS can appear to decay simply because different user populations are being compared over time. Model stability is assessed with PSI on the score distribution, together with performance stability across score bands. Swap-in/swap-out analysis, reject inference, and sample weighting are used to improve robustness and address data drift, especially during events like the COVID-19 pandemic.
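The two core metrics can be sketched in a few lines. KS is the maximum gap between the cumulative score distributions of bad and good users; PSI compares an expected (baseline) score distribution against an actual (current) one over fixed bins. The bin count and epsilon guard are illustrative choices, not from the talk:

```python
import math

def ks_statistic(scores_bad, scores_good):
    """KS = max gap between the cumulative distributions of bad and good users."""
    thresholds = sorted(set(scores_bad) | set(scores_good))
    ks = 0.0
    for t in thresholds:
        cdf_bad = sum(s <= t for s in scores_bad) / len(scores_bad)
        cdf_good = sum(s <= t for s in scores_good) / len(scores_good)
        ks = max(ks, abs(cdf_bad - cdf_good))
    return ks

def psi(expected, actual, bins=10):
    """PSI = sum over bins of (actual% - expected%) * ln(actual% / expected%)."""
    lo, hi = min(expected), max(expected)
    edges = [lo + (hi - lo) * i / bins for i in range(1, bins)]
    def bucket_shares(xs):
        counts = [0] * bins
        for x in xs:
            counts[sum(x > e for e in edges)] += 1  # bin index of x
        # Small floor so empty bins do not blow up the log term.
        return [max(c / len(xs), 1e-6) for c in counts]
    e_pct, a_pct = bucket_shares(expected), bucket_shares(actual)
    return sum((a - e) * math.log(a / e) for a, e in zip(a_pct, e_pct))
```

A perfectly separating score gives KS = 1; identical baseline and current distributions give PSI = 0, and a common rule of thumb treats PSI above roughly 0.25 as a significant shift.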

Scientific Explanation of Data: Different model families trade interpretability against performance: logistic regression (high interpretability, limited feature capacity), decision trees (strong performance, lower interpretability), and two-layer architectures that combine many sub-models under a top-level linear or shallow XGBoost model to obtain both accuracy and explainability.
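The two-layer idea can be sketched with a logistic top layer over sub-model scores: each sub-model contributes one score per user, and a small, interpretable model on top learns how to weight them. The tiny SGD fit below is an illustrative stand-in; in practice the top layer might be a regularized logistic regression or a shallow XGBoost model, as the talk notes:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_top_layer(sub_scores, labels, lr=0.5, epochs=500):
    """Fit logistic-regression weights over sub-model scores via plain SGD."""
    w = [0.0] * len(sub_scores[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(sub_scores, labels):
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
            err = p - y  # gradient of log-loss w.r.t. the logit
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
            b -= lr * err
    return w, b

def predict(w, b, x):
    """Combined risk score from the sub-model scores of one user."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
```

The explainability benefit is that the learned top-layer weights directly show how much each sub-model contributes to the final score, even when the sub-models themselves are complex.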

The session concludes with a summary of key takeaways and thanks to the audience.

Tags: Machine Learning · Feature Engineering · Model Evaluation · Data Science · Financial Data · Risk Modeling
Written by

DataFunSummit

Official account of the DataFun community, dedicated to sharing big data and AI industry summit news and speaker talks, with regular downloadable resource packs.
