AI Glossary: What Is Integrated Gradients (IG)? Definition & Meaning

Integrierte Gradienten ist eine Technik, die im Bereich der künstlichen Intelligenz verwendet wird, particularly in neuronale Netze, to provide insights into how individual input features contribute to model predictions. This method is especially useful for enhancing the interpretability of Deep Learning models, which are often viewed as ‘black boxes’ due to their complex architectures.

The core idea behind Integrated Gradients is to compute the gradients of the model’s output with respect to its input features, integrating these gradients along a path from a baseline input (usually a zeroed or neutral input) to the actual input. This integration captures the cumulative effect of each feature as the model transitions from the baseline to the input of interest.

The process can be summarized in a few steps: first, define a baseline input that represents a neutral or reference state; second, compute the gradients of the model’s output with respect to the input; and finally, integrate these gradients along a straight-line path from the baseline to the actual input. The resulting values indicate the importance of each feature in making the prediction. This method helps in understanding which features are driving the model’s decisions, making it easier to validate and trust KI-Systemen.

Insgesamt trägt integrierte Gradienten nicht nur zur Modellinterpretierbarkeit but also contributes to the broader goal of fair and accountable AI, as it allows stakeholders to scrutinize and understand the decision-making processes of AI systems.