Posts

Showing posts from June, 2024

Transforming the landscape of ML using Transformers

By now you must have heard about "Transformers": not the movie franchise, but the machine learning model that forms part of the ChatGPT acronym. The GPT in ChatGPT stands for Generative Pre-trained Transformer. This article is about transformers and how they revolutionized not only the field of Natural Language Processing (NLP) but the whole machine learning landscape. One of the goals is to give you an intuition for what the "attention" blocks in Transformers actually achieve. Through this, I hope you also get an intuition for how technologies like ChatGPT are stretching the boundaries of what AI can currently achieve.

The Role of Auto-Encoders

A key idea that has enabled this sudden rise in capability is a class of ML models called auto-encoders. Their advantage is that they are a kind of unsupervised ML technique, meaning that they do not require each training sample to be associated with a "label" that then b...

The Integrated Gradients Technique for Model Interpretability

In a previous article we saw why interpretability is important in machine learning and surveyed existing techniques. In this article we shall look into one specific technique called Integrated Gradients. Let's recall what problems such techniques address. Suppose we have a model that has been trained to classify images into one of several classes. Given an input image, the model might predict its class correctly or it might fail to. If it predicts the class correctly, we could ask which pixels or groups of pixels contributed the most to the model's prediction. This is where techniques such as Integrated Gradients come into play. You can think of it as a technique for creating a saliency map from a given input image.

Applications of the IG technique

Let's get a taste of the technique by looking at some cases where it has been successfully applied. The integrated gradients technique can be applied to various types of neural networks. We sho...