AI Glossary: What Is Empirical Risk (ER)? Definition & Meaning

Empirical risk is a central concept in machine learning and statistics: the average loss of a predictive model evaluated on a specific set of training data. It is calculated by summing the losses between the model’s predictions and the actual outcomes in the training data, then dividing by the number of observations in that dataset.

In mathematical terms, if a model makes predictions from input features, we can denote the loss function as L(y, ŷ), where y is the actual outcome and ŷ is the predicted outcome. The empirical risk (R_emp) can be expressed as:

R_emp = (1/n) * Σ_{i=1}^{n} L(y_i, ŷ_i)

Here, n is the number of samples in the training set, and the summation runs over all training samples i. Training a model usually means minimizing this empirical risk, that is, reducing the model’s error on the training data.
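The formula above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the squared-error loss, the function names squared_loss and empirical_risk, and the sample values are all assumptions chosen for the example.

```python
def squared_loss(y, y_hat):
    """Illustrative loss L(y, y_hat): squared error (an assumed choice)."""
    return (y - y_hat) ** 2


def empirical_risk(y_true, y_pred, loss=squared_loss):
    """Average loss over n samples: (1/n) * sum of L(y_i, y_hat_i)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    n = len(y_true)
    if n == 0:
        raise ValueError("empirical risk is undefined for an empty dataset")
    total = sum(loss(y, y_hat) for y, y_hat in zip(y_true, y_pred))
    return total / n


# Hypothetical training outcomes and model predictions.
y_true = [3.0, 5.0, 7.0]
y_pred = [2.0, 5.0, 9.0]

# Per-sample losses are 1, 0 and 4, so the empirical risk is 5/3.
risk = empirical_risk(y_true, y_pred)
print(risk)
```

Any other loss function with the same two-argument shape, such as absolute error or a 0-1 classification loss, could be passed in through the loss parameter.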

However, minimizing empirical risk alone does not guarantee good performance on unseen data (generalization). A model that performs very well on training data may overfit, capturing noise rather than the underlying distribution of the data. To mitigate overfitting, practitioners use techniques such as cross-validation, regularization, and separate validation sets, which either help the model generalize to new, unseen data or reveal when it does not.
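The gap between training performance and generalization can be illustrated with a deliberately overfit model. The sketch below is hypothetical: it assumes a squared-error loss, a made-up dataset where y is roughly 2x plus noise, and a "memorizing" predictor that returns the outcome of the nearest training point. Its empirical risk on the training data is zero, yet its average loss on a separate validation set is not.

```python
def empirical_risk(y_true, y_pred):
    """Average squared-error loss over the given samples."""
    n = len(y_true)
    return sum((y - y_hat) ** 2 for y, y_hat in zip(y_true, y_pred)) / n


# Hypothetical training data: y = 2x with alternating noise of +0.5 / -0.5.
train = [(0.0, 0.5), (1.0, 1.5), (2.0, 4.5), (3.0, 5.5)]

# Hypothetical validation data drawn from the noise-free relation y = 2x.
validation = [(0.4, 0.8), (1.6, 3.2), (2.4, 4.8)]


def memorizing_model(x):
    """Predict the outcome of the nearest training input (memorizes noise)."""
    _, nearest_y = min(train, key=lambda pair: abs(pair[0] - x))
    return nearest_y


def risk_on(dataset):
    """Average loss of memorizing_model on a list of (x, y) pairs."""
    y_true = [y for _, y in dataset]
    y_pred = [memorizing_model(x) for x, _ in dataset]
    return empirical_risk(y_true, y_pred)


train_risk = risk_on(train)            # every training point is reproduced exactly
validation_risk = risk_on(validation)  # the memorized noise hurts on new inputs

print("training risk:  ", train_risk)
print("validation risk:", validation_risk)
```

Comparing the two numbers is the basic idea behind a separate validation set: a training risk near zero paired with a much larger validation risk is a sign of overfitting.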