AI Glossary: What Is Gini Impurity? Definition & Meaning

Gini Impurity is a statistical measure that quantifies the impurity, or class disorder, in a dataset. It is commonly used in machine learning, particularly in the construction of decision trees, to determine how well a split separates the classes in classification tasks. Gini Impurity is calculated using the formula:

Gini = 1 – ∑(p_i)²

where p_i represents the proportion of instances belonging to class i. For a problem with K classes, the value of Gini Impurity ranges from 0 to 1 – 1/K, so it is always below 1:

0 indicates a perfectly pure dataset (all instances belong to a single class), and
1 – 1/K indicates maximum impurity (instances are evenly distributed across the classes), which is 0.5 for two classes and approaches 1 as the number of classes grows.
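
The formula can be checked on a few small label lists. The following is a minimal Python sketch, not a reference implementation; the function name `gini_impurity` is chosen here for illustration, and treating an empty node as pure is an assumption of this sketch.

```python
from collections import Counter


def gini_impurity(labels):
    """Return 1 - sum(p_i ** 2) over the classes present in `labels`."""
    n = len(labels)
    if n == 0:
        # Assumption of this sketch: an empty node is treated as pure.
        return 0.0
    counts = Counter(labels)  # number of instances per class
    return 1.0 - sum((count / n) ** 2 for count in counts.values())


print(gini_impurity(["a", "a", "a"]))  # 0.0, a single class is perfectly pure
print(gini_impurity(["a", "b"]))       # 0.5, the two-class maximum (1 - 1/2)
```

With three evenly distributed classes the same function gives 1 – 1/3, or about 0.667, which matches the general maximum of 1 – 1/K.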

In practice, Gini Impurity is calculated for each candidate split in the dataset. For each candidate, the impurities of the resulting child nodes are averaged, weighted by the number of instances in each child, and the split with the lowest weighted impurity is chosen, as it implies that the resulting child nodes are more homogeneous than the parent node. This measure is favored for its computational efficiency (unlike entropy, it requires no logarithms) and because it rewards splits that produce class-pure child nodes.
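
The split selection described above can be sketched for a single numeric feature. This is an illustrative Python sketch under stated assumptions, not how any particular library implements it: the helper names `weighted_gini` and `best_threshold` are hypothetical, and candidate thresholds are taken as midpoints between consecutive distinct feature values.

```python
from collections import Counter


def gini_impurity(labels):
    """Return 1 - sum(p_i ** 2) over the classes present in `labels`."""
    n = len(labels)
    if n == 0:
        return 0.0  # assumption: an empty node is treated as pure
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def weighted_gini(left, right):
    """Average the child impurities, weighted by child size."""
    n = len(left) + len(right)
    return (len(left) / n) * gini_impurity(left) + (len(right) / n) * gini_impurity(right)


def best_threshold(values, labels):
    """Return (threshold, weighted impurity) of the best binary split."""
    pairs = sorted(zip(values, labels), key=lambda pair: pair[0])
    best = (None, float("inf"))
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # equal feature values cannot be separated by a threshold
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [label for _, label in pairs[:i]]
        right = [label for _, label in pairs[i:]]
        score = weighted_gini(left, right)
        if score < best[1]:
            best = (threshold, score)
    return best


print(best_threshold([1, 2, 3, 4], ["a", "a", "b", "b"]))  # (2.5, 0.0)
```

In this toy example, the threshold 2.5 separates the two classes completely, so both child nodes are pure and the weighted impurity drops to 0.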

Overall, Gini Impurity is an essential concept in decision tree algorithms, contributing to the model’s ability to classify data effectively and accurately.