Log Loss, also known as logistic loss or cross-entropy loss, is a performance metric used primarily in binary classification problems where the model outputs a probability between 0 and 1. It quantifies the difference between the predicted probabilities and the actual class labels (0 or 1). Log loss evaluates how well a classification model predicts probabilities for binary outcomes, with lower values indicating better model performance.
Mathematically, Log Loss is calculated using the following formula:
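Log Loss = -(1/N) * Σ_{i=1..N} [y_i * log(p_i) + (1 - y_i) * log(1 - p_i)]
Where:
- N is the total number of predictions.
- y_i is the actual label (0 or 1) of the i-th sample.
- p_i is the predicted probability that the i-th sample belongs to the positive class (1).
- log is the natural logarithm.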
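The formula above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function name binary_log_loss and its parameters y_true, y_pred, and eps are hypothetical, and clipping the probabilities into [eps, 1 - eps] is an added assumption (a common practical safeguard so that log(0) is never evaluated), not part of the formula itself.

```python
import math

def binary_log_loss(y_true, y_pred, eps=1e-15):
    """Mean binary log loss over paired labels and predicted probabilities.

    The eps clipping is an assumption added for numerical safety;
    it is not part of the mathematical definition.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if len(y_true) == 0:
        raise ValueError("at least one prediction is required")

    total = 0.0
    for y, p in zip(y_true, y_pred):
        # Keep p inside [eps, 1 - eps] so math.log never receives 0.
        p = min(max(p, eps), 1 - eps)
        total += y * math.log(p) + (1 - y) * math.log(1 - p)

    # Negate and average, matching the -(1/N) factor in the formula.
    return -total / len(y_true)
```

For example, binary_log_loss([1, 0, 1], [0.9, 0.1, 0.8]) evaluates to roughly 0.145, while binary_log_loss([1, 0], [0.5, 0.5]) gives log(2), roughly 0.693.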
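The Log Loss value ranges from 0 to infinity. A value of 0 corresponds to perfect predictions, where the model assigns probability 1 to every actual positive and probability 0 to every actual negative; larger values indicate worse performance. A model that predicts probabilities close to the true labels will have a lower log loss, while a model that predicts probabilities far from the true labels will incur a higher log loss. The penalty is not linear: as a prediction becomes confidently wrong (a probability approaching 0 for an actual positive, or approaching 1 for an actual negative), the loss grows without bound. As a reference point, a model that always predicts 0.5 scores log(2) ≈ 0.693 with the natural logarithm, regardless of the labels.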
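To make the shape of the penalty concrete, the sketch below prints the loss contributed by a single prediction when the true label is 1. The helper name sample_loss is hypothetical and the probabilities are chosen only for illustration; no clipping is applied, so p must lie strictly between 0 and 1.

```python
import math

def sample_loss(y, p):
    """Log loss contribution of one prediction p (0 < p < 1) for true label y."""
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))

# True label is 1; the predicted probability moves further from it each step.
for p in (0.99, 0.9, 0.5, 0.1, 0.01):
    print(f"p = {p:<4}  loss = {sample_loss(1, p):.4f}")
```

The printed losses are approximately 0.0101, 0.1054, 0.6931, 2.3026, and 4.6052. Moving from 0.9 to 0.5 costs about 0.59, whereas moving from 0.1 to 0.01 costs about 2.3, which shows how sharply confident mistakes are penalized.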
Log Loss is particularly useful in scenarios where the output is not just a hard classification but a probability, making it suitable for applications such as logistic regression, neural networks, and other probabilistic classifiers. It is also widely used in machine learning competitions, such as those hosted by Kaggle, to assess model performance.