AI Glossary: What Is Gini Impurity? Definition & Meaning

Gini Impurity is a statistical measure that quantifies the impurity or disorder in a dataset. It is commonly used in machine learning, particularly in the construction of decision trees, to determine how well a split separates classes in classification tasks. Gini Impurity is calculated with the formula:

Gini = 1 – ∑(p_i)²

where p_i represents the proportion of instances belonging to class i. For a dataset with k classes, the value of Gini Impurity ranges from 0 to 1 – 1/k, where:

0 indicates a perfectly pure dataset (all instances belong to a single class), and
1 – 1/k indicates maximum impurity (instances are evenly distributed across the classes); this is 0.5 for two classes and approaches 1 only as the number of classes grows.
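
The formula can be illustrated with a short Python sketch. The function name `gini_impurity` and the choice to treat an empty node as pure are assumptions made for this example, not part of any particular library.

```python
from collections import Counter


def gini_impurity(labels):
    """Return 1 - sum(p_i^2), where p_i is the share of class i in `labels`."""
    n = len(labels)
    if n == 0:
        return 0.0  # assumption: an empty node is treated as pure
    counts = Counter(labels)
    return 1.0 - sum((count / n) ** 2 for count in counts.values())


print(gini_impurity(["a", "a", "a", "a"]))  # 0.0  (one class only)
print(gini_impurity(["a", "a", "b", "b"]))  # 0.5  (two classes, evenly split)
print(gini_impurity(["a", "b", "c", "d"]))  # 0.75 (four classes, evenly split)
```

The last two calls show that an evenly distributed dataset scores 0.5 with two classes and 0.75 with four.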

In practice, Gini Impurity is calculated for each candidate split in the dataset. The impurities of the resulting child nodes are averaged, weighted by the number of instances in each child, and the split with the lowest weighted Gini Impurity is chosen, as it implies that the resulting child nodes are more homogeneous than the parent node. This measure is favored for its computational efficiency and because it steers splits toward child nodes dominated by a single class.
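
As a rough sketch of split selection, the Python code below scans thresholds on a single numeric feature and keeps the one with the lowest size-weighted Gini Impurity of the two child nodes. The helper names (`gini`, `weighted_gini`, `best_threshold`) and the toy data are hypothetical. Real decision tree implementations search over many features and use faster incremental updates.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of one node: 1 - sum(p_i^2)."""
    n = len(labels)
    if n == 0:
        return 0.0  # assumption: an empty node is treated as pure
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def weighted_gini(left, right):
    """Impurity of a split: child impurities weighted by child size."""
    n = len(left) + len(right)
    return (len(left) / n) * gini(left) + (len(right) / n) * gini(right)


def best_threshold(values, labels):
    """Try each split `x <= t` and return (threshold, weighted impurity)."""
    best = (None, float("inf"))
    # Skip the largest value so the right child is never empty.
    for t in sorted(set(values))[:-1]:
        left = [y for x, y in zip(values, labels) if x <= t]
        right = [y for x, y in zip(values, labels) if x > t]
        score = weighted_gini(left, right)
        if score < best[1]:
            best = (t, score)
    return best


values = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
labels = ["no", "no", "no", "yes", "yes", "yes"]
print(best_threshold(values, labels))  # (3.0, 0.0)
```

Here the threshold 3.0 separates the two classes perfectly, so both child nodes are pure and the weighted impurity drops to 0.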

Overall, Gini Impurity is an essential concept in decision tree algorithms, contributing to the model’s ability to classify data effectively and accurately.