AI Glossary: What Is Gini Impurity? Definition & Meaning

Gini Impurity is a statistical measure that quantifies the impurity or disorder in a dataset. It is commonly used in machine learning, particularly in the construction of decision trees, to determine how well a split separates classes in classification tasks. Gini Impurity is calculated using the formula:

Gini = 1 – ∑(p_i)²

where p_i represents the proportion of instances belonging to class i. The value of Gini Impurity ranges from 0 up to, but never reaching, 1, where:

0 indicates a perfectly pure dataset (all instances belong to a single class), and
1 - 1/K indicates maximum impurity for K classes (instances are evenly distributed among the classes); this is 0.5 for two classes and approaches 1 as the number of classes grows.
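
To make the formula concrete, here is a minimal Python sketch that computes Gini Impurity from a list of class labels. The function name gini_impurity and the choice to treat an empty list as pure are assumptions made for illustration, not part of any particular library.

```python
from collections import Counter


def gini_impurity(labels):
    """Return 1 - sum(p_i^2), where p_i is the share of each class in labels."""
    n = len(labels)
    if n == 0:
        return 0.0  # assumption: an empty node is treated as pure
    counts = Counter(labels)
    return 1.0 - sum((count / n) ** 2 for count in counts.values())


pure = gini_impurity(["a", "a", "a", "a"])           # a single class
balanced_two = gini_impurity(["a", "a", "b", "b"])   # two classes, evenly split
balanced_four = gini_impurity(["a", "b", "c", "d"])  # four classes, evenly split
print(pure, balanced_two, balanced_four)  # 0.0 0.5 0.75
```

The evenly split cases show that the maximum depends on the number of classes: 0.5 for two classes and 0.75 for four.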

In practice, Gini Impurity is calculated for each possible split in the dataset. For each candidate split, the impurities of the child nodes are averaged, with each child weighted by its share of the instances. The split that results in the lowest weighted Gini Impurity is chosen, as it implies that the resulting child nodes are more homogeneous than the parent node. This measure is favored for its computational efficiency (unlike entropy, it requires no logarithms) and for its tendency to favor splits that isolate classes into purer child nodes.
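
The split search can be sketched for a single numeric feature as follows. This is an illustrative simplification, not how any specific library implements it: the helper names gini, weighted_gini, and best_threshold are hypothetical, and candidate thresholds are assumed to lie midway between adjacent distinct feature values.

```python
from collections import Counter


def gini(labels):
    """Gini Impurity of a list of class labels (empty list treated as pure)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def weighted_gini(left, right):
    """Average the child impurities, weighting each child by its size."""
    n = len(left) + len(right)
    return (len(left) / n) * gini(left) + (len(right) / n) * gini(right)


def best_threshold(values, labels):
    """Return (threshold, weighted impurity) of the lowest-impurity split."""
    pairs = sorted(zip(values, labels))
    best = (None, float("inf"))
    for i in range(1, len(pairs)):
        lo, hi = pairs[i - 1][0], pairs[i][0]
        if lo == hi:
            continue  # no threshold can separate identical feature values
        threshold = (lo + hi) / 2
        left = [y for x, y in pairs if x <= threshold]
        right = [y for x, y in pairs if x > threshold]
        score = weighted_gini(left, right)
        if score < best[1]:
            best = (threshold, score)
    return best


# Toy data: small values are labelled "no", large values "yes".
values = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
labels = ["no", "no", "no", "yes", "yes", "yes"]
threshold, score = best_threshold(values, labels)
print(threshold, score)  # 6.5 0.0
```

On this toy data the threshold 6.5 separates the two classes completely, so both child nodes are pure and the weighted impurity is 0. This sketch rebuilds both children for every candidate threshold, which keeps it readable but is slower than updating class counts incrementally.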

Overall, Gini Impurity is an essential concept in decision tree algorithms, contributing to the model's ability to classify data effectively and accurately.