Understanding Neural Network Predictions with Integrated Gradients

This article introduces the Integrated Gradients (IG) method for explaining deep neural networks, compares it with saliency maps and Shapley‑based approaches, discusses its axiomatic foundations, and provides a step‑by‑step guide to implementing IG using the open‑source TruLens library, including custom baselines and attribution measures.

Deep neural networks have achieved remarkable success in vision, natural language processing, and time-series tasks, yet their internal workings are opaque, so they are often treated as black-box models.

The article presents Integrated Gradients (IG), originally proposed by Sundararajan, Taly, and Yan, as an attribution technique that satisfies two key axioms—completeness and sensitivity—by integrating gradients along a straight‑line path from a baseline input x' to the actual input x. Completeness ensures that the sum of attributions equals the difference in the model’s output between x and x', while sensitivity guarantees non‑zero attribution for features that affect the prediction.
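
Concretely, the attribution IG assigns to the $i$-th feature, and the completeness guarantee, as defined in the original paper, are

$$\mathrm{IG}_i(x) = (x_i - x'_i) \int_0^1 \frac{\partial F\big(x' + \alpha (x - x')\big)}{\partial x_i}\, d\alpha, \qquad \sum_i \mathrm{IG}_i(x) = F(x) - F(x'),$$

where $F$ is the differentiable output being explained (for example, the score of the predicted class).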

IG's scope covers any differentiable function of the network, typically the scalar output of a classifier. It produces local, per-prediction explanations that can be aggregated across many inputs for a more global view, and it can be applied to inputs in raw pixel space, embedding space for NLP, or any other differentiable representation.

Compared with saliency maps, which reflect only the gradient at the input point and can be misleading in saturated regions where that gradient is near zero, IG is more faithful because it aggregates gradients along the entire path from the baseline. Saliency maps stand to IG roughly as LIME stands to Shapley values: a cheap local approximation versus a more principled, axiomatically grounded method.
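
A toy example (illustrative NumPy code, in the spirit of the saturation example from the IG paper) makes this concrete: the function below saturates at 1 once x >= 1, so the gradient at the input is zero even though the input clearly matters.

import numpy as np

f = lambda x: 1.0 - np.maximum(0.0, 1.0 - x)     # saturates once x >= 1
grad_f = lambda x: np.where(x < 1.0, 1.0, 0.0)   # derivative of f

x, baseline = 2.0, 0.0
alphas = np.linspace(0.0, 1.0, 100)              # points along the straight-line path
ig = (x - baseline) * np.mean(grad_f(baseline + alphas * (x - baseline)))

print(grad_f(x))   # 0.0 -> a saliency map assigns no importance at x = 2
print(ig)          # ~1.0 -> IG recovers f(x) - f(baseline)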

When contrasted with Shapley-based methods such as QII, IG shares similar axiomatic goals but differs in how it samples the input space. Exact Shapley values require a number of coalition evaluations that grows factorially with the number of features, so even sampled approximations are computationally heavy for high-dimensional inputs. IG instead evaluates gradients at a fixed number of points along a straight-line path, typically 10–20 partitions, and can often achieve sufficient accuracy with fewer than 100 points.
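
The sampling strategy itself is only a few lines. The sketch below is a minimal Riemann-sum approximation, assuming PyTorch and a `model` that maps a batch of inputs to one scalar score per example (e.g., the target-class logit); it is not TruLens code.

import torch

def integrated_gradients(model, x, baseline, steps=20):
    # Interpolation points along the straight-line path from baseline to input.
    alphas = torch.linspace(0.0, 1.0, steps).view(-1, *([1] * x.dim()))
    path = (baseline + alphas * (x - baseline)).requires_grad_(True)
    # One batched forward/backward pass over all path points.
    grads = torch.autograd.grad(model(path).sum(), path)[0]
    # Average the gradients along the path and scale by (input - baseline).
    return (x - baseline) * grads.mean(dim=0)

For a full classifier, `model` here would be a thin wrapper that selects the logit of the class being explained.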

The article then demonstrates a practical implementation using the open‑source TruLens library. After installing TruLens, the IntegratedGradients class is instantiated with a model and a resolution parameter, and attributions are computed and visualized as a mask over the original image:

from trulens.nn.attribution import IntegratedGradients
from trulens.visualizations import MaskVisualizer

# `model` is the network wrapped for TruLens (e.g., via trulens.nn.models.get_model_wrapper);
# `resolution` is the number of points sampled along the baseline-to-input path.
ig_computer = IntegratedGradients(model, resolution=10)
input_attributions = ig_computer.attributions(beagle_bike_input)

# Overlay the attributions on the image, masking out all but the most influential pixels.
visualizer = MaskVisualizer(blur=10, threshold=0.95)
visualization = visualizer(input_attributions, beagle_bike_input)

TruLens also introduces the concept of a Distribution of Interest (DoI) to define which records are averaged when computing attributions. A custom DoI is created by subclassing DoI and overriding __call__ (the set of points to average over) and, where needed, get_activation_multiplier (how the averaged gradients are scaled). The built-in LinearDoi, which interpolates linearly between a baseline and the input, behaves roughly as sketched below.

import numpy as np
from trulens.nn.distributions import DoI

class LinearDoi(DoI):
    def __call__(self, z):
        # Linear interpolation between baseline and input; self._baseline and
        # self._resolution come from the constructor (not shown here).
        baseline = np.zeros_like(z) if self._baseline is None else self._baseline
        return [baseline + a * (z - baseline) for a in np.linspace(0., 1., self._resolution)]
    def get_activation_multiplier(self, activation):
        # Scale the averaged gradients by (input - baseline), per the IG formula.
        return activation if self._baseline is None else activation - self._baseline

Custom quantities of interest (QoI), i.e. the output functions being explained, can also be defined to target specific layers or class differences. By supplying a custom QoI to TruLens's InternalInfluence measure, users can explain logits, probabilities, or any scalar derived from the network:

from trulens.nn.attribution import InternalInfluence
from trulens.nn.quantities import MaxClassQoI
from trulens.nn.slices import Slice, InputCut, Cut
from trulens.nn.distributions import LinearDoi
# Attribute from the raw input up to the 'logits' layer, explaining the highest-scoring class.
infl = InternalInfluence(model, Slice(InputCut(), Cut('logits')), MaxClassQoI(), LinearDoi())
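
As a sketch of the "class differences" case (not TruLens's documented API verbatim), a custom QoI might subclass QoI and return the margin between two logits; `beagle_idx` and `bike_idx` are assumed class indices, and the `__call__(self, y)` interface is assumed to match the built-in quantities.

from trulens.nn.quantities import QoI

class ClassDifferenceQoI(QoI):
    # Illustrative QoI: how much the model favors class `a` over class `b`.
    def __init__(self, a, b):
        self.a = a
        self.b = b
    def __call__(self, y):
        # `y` holds the activations at the slice's output cut ('logits' here).
        return y[:, self.a] - y[:, self.b]

infl_diff = InternalInfluence(model, Slice(InputCut(), Cut('logits')),
                              ClassDifferenceQoI(beagle_idx, bike_idx), LinearDoi())
diff_attributions = infl_diff.attributions(beagle_bike_input)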

Baseline selection is critical. While a black image is a common baseline for vision models, it can obscure attributions for inherently dark objects (e.g., penguins) or introduce spurious signals when a watermark is present in most training images. The article recommends choosing baselines that reflect the data distribution, such as the mean image or a semantically meaningful reference, to avoid violating the sensitivity axiom.
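
As one hedged example, a mean-image baseline can be supplied through the DoI. The sketch below assumes LinearDoi accepts baseline and resolution arguments and that train_images is a representative array of training inputs (both names are illustrative).

import numpy as np
from trulens.nn.attribution import InternalInfluence
from trulens.nn.distributions import LinearDoi
from trulens.nn.quantities import MaxClassQoI
from trulens.nn.slices import Slice, InputCut, Cut

mean_image = np.mean(train_images, axis=0)   # average of a sample of training images
infl_mean = InternalInfluence(
    model,
    Slice(InputCut(), Cut('logits')),
    MaxClassQoI(),
    LinearDoi(baseline=mean_image, resolution=20))
mean_baseline_attributions = infl_mean.attributions(beagle_bike_input)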

Finally, the article lists key references that introduced IG and related attribution methods, providing readers with sources for deeper theoretical background.

Tags: Deep Learning, Neural Networks, Model Explainability, TruLens, Attribution Methods, Integrated Gradients
Written by Code DAO

We deliver AI algorithm tutorials and the latest news, curated by a team of researchers from Peking University, Shanghai Jiao Tong University, Central South University, and leading AI companies such as Huawei, Kuaishou, and SenseTime. Join us in the AI alchemy—making life better!