AI Glossary: What Is Information Gain (IG)? Definition & Meaning

Information Gain is a key concept in information theory and machine learning that quantifies how effective an attribute is at classifying data. Specifically, it measures the reduction in entropy, or uncertainty, of a random variable once additional information (the value of another variable) becomes known.

Entropy, represented as H(X), is a measure of the unpredictability or disorder of a system. When we have a dataset with a target variable (e.g., whether an email is spam or not), the initial entropy reflects our uncertainty about the classification of that variable. By introducing a feature or attribute (such as the presence of certain words in the email), we can partition the dataset into subsets that provide more information about the target variable.
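To make the definition concrete, here is a minimal Python sketch that computes the Shannon entropy H(X), in bits, of a list of class labels, using H(X) = Σ p(x) · log2(1 / p(x)). The helper name `entropy` and the toy spam/ham labels are illustrative assumptions, not part of any particular library.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy H(X), in bits, of a sequence of class labels.

    Each class contributes p * log2(1/p), where p is its relative frequency.
    """
    n = len(labels)
    return sum((c / n) * log2(n / c) for c in Counter(labels).values())

# Hypothetical toy dataset: 4 spam and 4 non-spam ("ham") emails.
labels = ["spam"] * 4 + ["ham"] * 4
print(entropy(labels))  # 1.0
```

A balanced two-class dataset reaches the maximum of 1 bit, while a dataset containing only one class has an entropy of 0.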

The formula for Information Gain (IG) is:

IG(X, Y) = H(X) – H(X|Y)

Where:

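H(X) is the entropy of the target variable X over the original dataset.
H(X|Y) is the conditional entropy of X given the attribute Y, i.e., the entropy of each subset produced by splitting the data on Y, weighted by that subset's share of the data.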
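The formula can be sketched directly in Python: compute H(X), compute H(X|Y) as the weighted entropy of the subsets obtained by grouping on Y, and subtract. This is a minimal illustration under assumed names (`entropy`, `conditional_entropy`, `information_gain`) and hypothetical data in which Y records whether an email contains the word "free".

```python
from collections import Counter, defaultdict
from math import log2

def entropy(labels):
    """Shannon entropy H(X), in bits, of a sequence of class labels."""
    n = len(labels)
    return sum((c / n) * log2(n / c) for c in Counter(labels).values())

def conditional_entropy(labels, feature):
    """H(X|Y): entropy of each subset sharing a value of Y, weighted by subset size."""
    groups = defaultdict(list)
    for x, y in zip(labels, feature):
        groups[y].append(x)
    n = len(labels)
    return sum(len(g) / n * entropy(g) for g in groups.values())

def information_gain(labels, feature):
    """IG(X, Y) = H(X) - H(X|Y)."""
    return entropy(labels) - conditional_entropy(labels, feature)

# Hypothetical data: X = spam label, Y = whether the email contains "free".
labels   = ["spam", "spam", "spam", "ham", "ham", "ham", "ham", "spam"]
has_free = [True,   True,   True,   False, False, False, True,  False]
print(round(information_gain(labels, has_free), 4))  # 0.1887
```

Here each subset is 3:1 pure, so H(X|Y) ≈ 0.8113 bits and the gain over H(X) = 1 bit is about 0.1887 bits. A feature that separated spam from ham perfectly would give H(X|Y) = 0 and therefore the full 1 bit of gain.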

Put simply, Information Gain tells us how much knowing the value of attribute Y reduces the uncertainty in predicting X. A high Information Gain indicates that the attribute is effective at splitting the data into groups that are more homogeneous with respect to the target variable.

This concept is widely used in decision tree algorithms such as ID3 (Iterative Dichotomiser 3), which, at each node, splits on the attribute that provides the highest Information Gain. This greedy choice makes each split separate the classes as cleanly as possible, with the aim of producing compact trees with good predictive performance.
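The attribute-selection step described above can be sketched as follows. This is not a full ID3 implementation (a real tree would recurse on each resulting subset until the nodes are pure or no attributes remain). It only shows the split criterion. The function names, the dictionary-per-row data layout, and the attributes `has_free` and `has_link` are hypothetical.

```python
from collections import Counter, defaultdict
from math import log2

def entropy(labels):
    """Shannon entropy, in bits, of a sequence of class labels."""
    n = len(labels)
    return sum((c / n) * log2(n / c) for c in Counter(labels).values())

def information_gain(rows, labels, attr):
    """IG of splitting `rows` (dicts of attribute values) on `attr`."""
    groups = defaultdict(list)
    for row, label in zip(rows, labels):
        groups[row[attr]].append(label)
    n = len(labels)
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - remainder

def best_attribute(rows, labels, attrs):
    """ID3-style split criterion: the attribute with the highest IG."""
    return max(attrs, key=lambda a: information_gain(rows, labels, a))

# Hypothetical emails: "has_free" is somewhat informative, "has_link" is not.
labels   = ["spam", "spam", "spam", "ham", "ham", "ham", "ham", "spam"]
has_free = [True, True, True, False, False, False, True, False]
has_link = [True, True, False, True, True, False, False, False]
rows = [{"has_free": f, "has_link": k} for f, k in zip(has_free, has_link)]

print(best_attribute(rows, labels, ["has_free", "has_link"]))  # has_free
```

In this toy data, splitting on `has_link` leaves each subset evenly mixed (IG = 0), so the root node would split on `has_free`.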

In summary, Information Gain is a fundamental metric in data science that helps identify which features or attributes are most informative for predicting outcomes.