The Integrated Gradients Technique for Model Interpretability
In a previous article we saw why interpretability is important in machine learning and surveyed existing techniques. In this article we look into one specific technique called Integrated Gradients. Let's first recall what problem such techniques address. Suppose we have a model trained to classify images into one of several classes. Given an input image, the model may predict its class correctly or it may fail to. If it predicts correctly, we can ask which pixels, or groups of pixels, contributed the most to the model's prediction. This is where techniques such as Integrated Gradients come into play. You can think of it as a technique for creating a saliency map from a given input image.
Applications of the IG technique
Let's get a taste of the technique by looking at some cases where it has been successfully applied. The integrated gradients technique can be applied to various types of neural networks; here we show its application to two image models.
Object Recognition Network
Consider an object recognition network that uses the GoogLeNet architecture and is trained on data from the ImageNet challenge. A black image is used as the baseline, and pixel importance is studied using the integrated gradients technique. The gradients are computed for the output of the highest-scoring class with respect to the input pixels of each image. The results of the study are visualized in Figure 1. From these images it can be seen that the integrated gradients technique brings out the fact that the model gives more importance to features that better distinguish the particular class from the others.
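As a rough illustration, here is a minimal sketch of how such gradients might be computed, assuming a Keras image classifier `model` that returns class scores for a batch of images; the function and variable names are our own, not from the study.

```python
import tensorflow as tf

def top_class_gradients(model, image):
    """Gradient of the highest-scoring class score w.r.t. the input pixels."""
    image = tf.convert_to_tensor(image, dtype=tf.float32)[tf.newaxis, ...]  # add batch dim
    with tf.GradientTape() as tape:
        tape.watch(image)
        scores = model(image)
        top_score = tf.reduce_max(scores[0])  # score of the highest-scoring class
    return tape.gradient(top_score, image)[0]  # same shape as the input image
```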
Diabetic Retinopathy Prediction
Diabetic Retinopathy is a condition that affects eyesight in patients with diabetes. A deep neural network has been used to predict the severity grade of the disease from retinal fundus images, and the model has achieved good results on various validation datasets. Trust in a network's predictions is important in areas such as health care, and pixel attribution techniques such as integrated gradients can help towards that goal. Figure 2 shows a visualization of integrated gradients applied to the retinal fundus images. The aggregated gradients are overlaid on a grayscale version of the image, with positive attributions in the green color channel and negative attributions in the red color channel. The resulting visualization shows that the integrated gradients localize to pixels that appear to be actual lesions in the retina. Moreover, the periphery of a lesion receives positive attribution whereas its interior receives negative attribution, showing that the network focuses on the boundaries of the lesions.
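A minimal sketch of how such an overlay might be constructed, assuming an RGB image as a float array in [0, 1] and a per-pixel attribution map of the same spatial size; the names and the normalization scheme are illustrative, not the authors' code.

```python
import numpy as np

def overlay_attributions(image, attributions):
    """Overlay attributions on a grayscale copy of an (H, W, 3) image."""
    gray = image.mean(axis=-1)                      # grayscale base image
    scale = np.abs(attributions).max() + 1e-8       # normalize attributions
    pos = np.clip(attributions / scale, 0.0, 1.0)   # positive attributions
    neg = np.clip(-attributions / scale, 0.0, 1.0)  # negative attributions
    overlay = np.stack([gray, gray, gray], axis=-1)
    overlay[..., 1] = np.maximum(overlay[..., 1], pos)  # green: positive
    overlay[..., 0] = np.maximum(overlay[..., 0], neg)  # red: negative
    return overlay
```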
Motivation for Integrated Gradients
Need for a baseline
It is common to reflect on a life choice you made and imagine how differently your life would have turned out had your choice been different. Such thinking is called counterfactual thinking, and it has even been said that the ability to think in counterfactuals might be a uniquely human trait. When we talk about the importance of a feature or pixel, we have in mind a situation where that feature or pixel is absent, which might leave the prediction better or worse off. A baseline image is what brings this notion of missingness of a feature into the method. Note that when your input is an image, the features are the individual pixel values. Removing a pixel from an image outright leads to difficulties, such as changing the shape of your data. Instead, if you replace the actual pixel value with zero, you get a black pixel, and a pure black image can then be considered a baseline in which every input pixel is missing. But a black image is surely only one among several baselines that can be considered; the impact of baselines other than a pure black image on attribution methods such as integrated gradients is covered in the Distill article linked in the references.
Why not just use gradients?
Since the gradient of a function points in the direction of its maximum increase, gradients directly tell us which pixels have the greatest local effect on the output. This is why gradients were often used as a primitive method for such visualizations. However, for neural networks the gradients with respect to the input pixels may have small magnitudes around a sample even if the network depends heavily on those pixels. This is called saturation, and it can happen when the output function flattens once the pixel values grow past a certain magnitude. Imagine an image that is interpolated from a pure black image to the sample image: the network's output may already be close to its final value well before the interpolation reaches the actual image, so small changes to the pixel values near the sample barely change the output, even though those pixels matter for the prediction. Additionally, the authors of the main paper argue that plain gradients break the axiom of sensitivity that any good attribution technique should satisfy.
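One way to see saturation concretely is to track the gradient norm at images along the black-to-input interpolation path; for a saturated network the norm shrinks as the interpolation approaches the actual image. A minimal sketch, again assuming a Keras classifier `model` that returns class scores (all names here are illustrative):

```python
import numpy as np
import tensorflow as tf

def gradient_norms_along_path(model, baseline, image, steps=20):
    """L2 norms of the top-class gradient along the interpolation path."""
    norms = []
    for alpha in np.linspace(0.0, 1.0, steps):
        interpolated = tf.convert_to_tensor(
            baseline + alpha * (image - baseline), dtype=tf.float32)[tf.newaxis, ...]
        with tf.GradientTape() as tape:
            tape.watch(interpolated)
            # Score of the highest-scoring class at this interpolated image.
            top_score = tf.reduce_max(model(interpolated)[0])
        norms.append(float(tf.norm(tape.gradient(top_score, interpolated))))
    return norms
```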
Inspiration from Game Theory
The inspiration for the Integrated Gradients technique comes from cooperative game theory, especially the Aumann-Shapley cost-sharing method. In cooperative games, subsets of participants are called coalitions. The value attributed to a participant, or group of participants, is computed by measuring how much the value of the game increases when that participant is added to a given coalition.
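Concretely, for a finite participant set $N$ and a value function $v$ that assigns a payoff to every coalition, the classical Shapley value gives participant $i$ its marginal contribution $v(S \cup \{i\}) - v(S)$ averaged over all coalitions $S$ not containing $i$:

$$\phi_i(v) = \sum_{S \subseteq N \setminus \{i\}} \frac{|S|!\,\bigl(|N| - |S| - 1\bigr)!}{|N|!}\,\bigl(v(S \cup \{i\}) - v(S)\bigr)$$

The Aumann-Shapley method extends this kind of averaging to games with a continuum of participants, which is what makes it suitable for attributing a network's output to continuous input features.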
Evaluation of Attribution Techniques
How can we evaluate an attribution technique itself, such as Integrated Gradients? The authors of the Integrated Gradients paper observe that the evaluations existing in the literature share an issue: they cannot distinguish between artefacts that come from perturbing the data, a poor model, and a poor attribution technique. This was one of the motivations for them to propose an axiomatic way to define a good attribution method.
Integrated Gradients
We can consider many paths interpolating between a baseline and the given input image. Integrated Gradients aggregates gradients along the images that fall on the straight-line path between the baseline and the input.
The definition is shown in the equation below.

$$\mathrm{IntegratedGrads}_i(x) = (x_i - x_i^{'}) \times \int_{0}^{1} \frac{\partial F\bigl(x^{'} + \alpha\,(x - x^{'})\bigr)}{\partial x_i}\, d\alpha$$

Here $F$ is the function computed by the network, $\alpha$ is the interpolation coefficient between 0 and 1 used to interpolate between the baseline and the given input image, $x$ is the given input image, $x^{'}$ is the baseline image, and the subscript $i$ refers to the $i$-th pixel (feature).
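In practice the integral is approximated numerically, for example with a trapezoidal sum over a fixed number of interpolation steps. Below is a minimal sketch, assuming a Keras image classifier `model` that returns class scores; the function name, argument names, and default number of steps are illustrative choices, not the original paper's code.

```python
import tensorflow as tf

def integrated_gradients(model, baseline, image, target_class, steps=50):
    """Approximate Integrated Gradients with a trapezoidal sum.

    Averages the gradients of the target class score over images spaced
    evenly on the straight line from `baseline` to `image`, then scales
    by (image - baseline) as in the definition above.
    """
    baseline = tf.convert_to_tensor(baseline, dtype=tf.float32)
    image = tf.convert_to_tensor(image, dtype=tf.float32)

    # Interpolated images x' + alpha * (x - x'), for alpha in [0, 1].
    alphas = tf.linspace(0.0, 1.0, steps + 1)[:, tf.newaxis, tf.newaxis, tf.newaxis]
    interpolated = baseline + alphas * (image - baseline)

    with tf.GradientTape() as tape:
        tape.watch(interpolated)
        scores = model(interpolated)[:, target_class]
    grads = tape.gradient(scores, interpolated)

    # Trapezoidal rule: average consecutive gradients, then average over steps.
    avg_grads = tf.reduce_mean((grads[:-1] + grads[1:]) / 2.0, axis=0)
    return (image - baseline) * avg_grads
```

The original paper reports that somewhere between roughly 20 and 300 steps is usually enough. A useful sanity check is that the attributions approximately sum to the difference between the model's scores at the input and at the baseline (the completeness axiom).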
There is still a lot of progress to be made in the area of path attribution methods, and in interpretability methods in general, which leaves plenty of opportunities for young researchers. This is especially relevant in research where the end goal is not just to develop an AI model that can make predictions, but also to help fill gaps in the theoretical understanding of various phenomena.
References:
- Sundararajan, M., Taly, A., & Yan, Q. (2017). Axiomatic Attribution for Deep Networks (arXiv:1703.01365). arXiv. http://arxiv.org/abs/1703.01365
- Sundararajan, M., & Taly, A. (2018). A Note about: Local Explanation Methods for Deep Neural Networks lack Sensitivity to Parameter Values (arXiv:1806.04205). arXiv. http://arxiv.org/abs/1806.04205
- Aumann, R. J., & Shapley, L. S. (1974). Values of non-atomic games. Princeton University Press.
- Sturmfels, P., Lundberg, S., & Lee, S.-I. (2020). Visualizing the Impact of Feature Attribution Baselines. Distill. https://distill.pub/2020/attribution-baselines/