
Unpack Local Model Interpretation for GBDT – Summary and Analysis

This article summarizes the Ant Financial paper presented at DASFAA 2018 that proposes a universal local explanation method for Gradient Boosting Decision Tree models, detailing the problem definition, the PMML‑based algorithm for attributing feature contributions, experimental validation on fraud detection data, and the practical benefits for model transparency and improvement.


On May 21, 2018, the DASFAA 2018 conference was held in Gold Coast, Australia, where Ant Financial presented a paper titled "Unpack Local Model Interpretation for GBDT".

GBDT (Gradient Boosting Decision Tree), also known as MART (Multiple Additive Regression Trees), is an iteratively trained ensemble of decision trees whose leaf scores are summed to produce the final prediction. It generalizes well and is widely used in tasks such as search ranking.

The paper addresses the growing demand for interpreting GBDT predictions by proposing a unified local explanation method that attributes a contribution score to each feature for a given sample.

Problem description: Global interpretation measures overall feature importance, while local interpretation quantifies each feature’s contribution to a single prediction. Unlike linear models, GBDT’s feature contributions are embedded in tree structures, requiring a method to decompose leaf scores back to individual features.
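To see why linear models are the easy case, note that a linear model decomposes its prediction exactly: each feature's local contribution is just its weight times its value. A minimal sketch (the feature names and weights are illustrative, not from the paper):

```python
# In a linear model, the local contribution of feature i is w_i * x_i,
# and the contributions sum to the prediction. Tree ensembles admit no
# such closed-form decomposition, which is the problem the paper tackles.
weights = {"amount": 0.8, "account_age": -0.5, "night_txn": 1.2}
sample = {"amount": 3.0, "account_age": 2.0, "night_txn": 1.0}

contributions = {f: weights[f] * sample[f] for f in weights}
prediction = sum(contributions.values())
```

For GBDT, the analogous per-feature numbers must instead be recovered from the tree structures themselves.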

Method: The approach extracts the PMML representation of the GBDT model and records each node's split feature and threshold, each leaf's score, and the number of training samples reaching each node. Internal-node scores are estimated as the sample-count-weighted average of their children's scores, which yields a more accurate parent-node estimate. For a given sample, the root-to-leaf path is traced, and the split feature at each node is credited with the difference between the child node's score and its parent's score.
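A minimal sketch of the per-tree attribution, assuming a toy dict-based node layout (the paper works from the PMML export; this structure and the feature names are illustrative stand-ins):

```python
# Toy tree: internal nodes split on (feature, threshold); leaves carry
# scores and training-sample counts.

def node_score(node):
    """Return (score, n_samples). Internal-node scores are filled in
    bottom-up as the sample-count-weighted average of child scores,
    the weighting trick described in the paper."""
    if "leaf_score" in node:
        return node["leaf_score"], node["n_samples"]
    ls, ln = node_score(node["left"])
    rs, rn = node_score(node["right"])
    node["score"] = (ls * ln + rs * rn) / (ln + rn)
    node["n_samples"] = ln + rn
    return node["score"], node["n_samples"]

def local_contributions(node, sample, contrib=None):
    """Trace the sample's root-to-leaf path; credit each split feature
    with (child score - parent score)."""
    if contrib is None:
        contrib = {}
        node_score(node)  # one-time offline preprocessing step
    if "leaf_score" in node:
        return contrib
    go_left = sample[node["feature"]] <= node["threshold"]
    child = node["left"] if go_left else node["right"]
    child_score = child.get("score", child.get("leaf_score"))
    contrib[node["feature"]] = (
        contrib.get(node["feature"], 0.0) + child_score - node["score"]
    )
    return local_contributions(child, sample, contrib)

tree = {
    "feature": "amount", "threshold": 100.0,
    "left": {"leaf_score": -0.2, "n_samples": 80},
    "right": {
        "feature": "night_txn", "threshold": 0.5,
        "left": {"leaf_score": 0.1, "n_samples": 15},
        "right": {"leaf_score": 0.9, "n_samples": 5},
    },
}
contrib = local_contributions(tree, {"amount": 250.0, "night_txn": 1.0})
```

By construction the contributions along the path telescope: they sum to the leaf score minus the root score, so the tree's output is fully decomposed over the split features.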

Because the node statistics are computed once in an offline preprocessing step, prediction-time explanation reduces to summing the contribution values across all trees, yielding per-sample feature importances without significant added latency.
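The ensemble-level explanation is then a straightforward sum at prediction time. A sketch, assuming a `per_tree_contributions(tree, sample)` helper (a hypothetical name) that returns one tree's per-feature attribution dict:

```python
from collections import defaultdict

def explain(trees, sample, per_tree_contributions):
    """Sum per-tree feature contributions into the ensemble-level local
    explanation for one sample, sorted by absolute contribution."""
    total = defaultdict(float)
    for tree in trees:
        for feature, value in per_tree_contributions(tree, sample).items():
            total[feature] += value
    return dict(sorted(total.items(), key=lambda kv: -abs(kv[1])))
```

The loop is a simple linear pass over the trees, which is why the online cost stays close to that of ordinary scoring.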

Experiments: Using Ant’s distributed GBDT implementation SMART and JPMML for inference, the method was evaluated on a sampled Alipay transfer fraud dataset. Consistency between local and global explanations was verified (Figure 4). The local GBDT explanations were compared with Random Forest (RF) local explanations using Information Value (IV) as a benchmark; GBDT showed superior coverage (Figure 5), especially after applying sample‑size weighted averaging (GBDTV2).
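Information Value, the benchmark in that comparison, is a standard credit-scoring measure of a feature's power to separate the two classes. A sketch of the usual definition (the binning and counts here are illustrative, not the paper's data):

```python
import math

def information_value(bins):
    """bins: list of (n_good, n_bad) counts per feature bin.
    IV = sum over bins of (%good - %bad) * ln(%good / %bad)."""
    total_good = sum(g for g, _ in bins)
    total_bad = sum(b for _, b in bins)
    iv = 0.0
    for g, b in bins:
        pg, pb = g / total_good, b / total_bad
        if pg > 0 and pb > 0:  # skip empty bins to avoid log(0)
            iv += (pg - pb) * math.log(pg / pb)
    return iv
```

A feature whose bins separate good from bad samples scores high; a feature with identical class mix in every bin scores zero, so IV gives a model-agnostic yardstick for checking which features an explanation method should surface.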

Manual case studies confirmed that high‑risk features identified by the local explanations matched expert judgments and uncovered sample‑specific risky features that were not prominent globally.

Conclusion: The proposed universal local interpretation framework for GBDT models requires only a one‑time offline preprocessing step and provides real‑time, per‑sample explanations. Experiments demonstrate its reliability and practical value, making it a useful “model translator” for both model validation and improvement.

Tags: GBDT, Machine Learning, feature importance, PMML, local explanation, model interpretation
Written by

AntTech

Technology is the core driver of Ant's future.
