Dirichlet distribution
The Dirichlet distribution is a family of continuous probability distributions defined over a simplex, the generalization of a triangle or tetrahedron to higher dimensions. It is used primarily in statistics and machine learning to model how proportions are distributed among multiple categories. Each category's probability is one component of a vector, and every vector drawn from a Dirichlet distribution has non-negative components that sum to one.
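Formally, the Dirichlet distribution is parameterized by a vector of positive real numbers, often denoted α = (α₁, α₂, ..., αₖ), where k is the number of categories. Each αᵢ is a concentration parameter that influences the distribution's shape. The expected proportion of category i is αᵢ / (α₁ + ... + αₖ), so a larger αᵢ relative to the others shifts more probability mass toward category i. The sum of the αᵢ controls how tightly samples cluster around these expected proportions: a large sum keeps samples close to the mean, while values below one push mass toward the corners and edges of the simplex.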
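To make the role of α concrete, the following minimal sketch computes the expected proportions αᵢ / Σⱼ αⱼ and draws one sample. It assumes the standard construction of normalizing independent Gamma(αᵢ, 1) draws; the function names `dirichlet_mean` and `sample_dirichlet` and the example values of α are illustrative choices, not part of any particular library.

```python
import random

def dirichlet_mean(alpha):
    """Expected proportions: alpha_i divided by the sum of all alpha_j."""
    total = sum(alpha)
    return [a / total for a in alpha]

def sample_dirichlet(alpha, rng):
    """Draw one sample by normalizing independent Gamma(alpha_i, 1) variates."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]

rng = random.Random(0)       # fixed seed so the run is repeatable
alpha = [2.0, 5.0, 3.0]      # illustrative concentration parameters

mean = dirichlet_mean(alpha)
x = sample_dirichlet(alpha, rng)

print("expected proportions:", mean)
print("one sample:", x)
print("sample sums to:", sum(x))
```

Averaging many such samples should approach the expected proportions, and scaling every αᵢ by the same large factor should make individual samples cluster more tightly around them.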
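The probability density function (PDF) of the Dirichlet distribution for a vector x = (x₁, x₂, ..., xₖ) with xᵢ > 0 and x₁ + x₂ + ... + xₖ = 1 is given by:
f(x; α) = (1/B(α)) * Πᵢ xᵢ^(αᵢ - 1), with the product taken over i = 1, ..., k
where B(α) = (Γ(α₁) * Γ(α₂) * ... * Γ(αₖ)) / Γ(α₁ + α₂ + ... + αₖ) is the multivariate Beta function, expressed through the Gamma function Γ. It normalizes the distribution so that the density integrates to one over the simplex.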
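The density can be evaluated numerically as in the sketch below. It assumes the usual expression of the multivariate Beta function through Gamma functions, B(α) = Πᵢ Γ(αᵢ) / Γ(Σᵢ αᵢ), and works in log space with `math.lgamma` to avoid overflow for large parameters. The function names, the tolerance `tol`, and the choice to return 0 outside the open simplex are assumptions made for illustration.

```python
import math

def log_multivariate_beta(alpha):
    """log B(alpha) = sum_i log Gamma(alpha_i) - log Gamma(sum_i alpha_i)."""
    return sum(math.lgamma(a) for a in alpha) - math.lgamma(sum(alpha))

def dirichlet_pdf(x, alpha, tol=1e-9):
    """Density f(x; alpha) at a point x of the open simplex, else 0.0."""
    if len(x) != len(alpha):
        raise ValueError("x and alpha must have the same length")
    if any(a <= 0 for a in alpha):
        raise ValueError("all alpha_i must be positive")
    # Simplification: points on the boundary or off the simplex get density 0.
    if any(xi <= 0 for xi in x) or abs(sum(x) - 1.0) > tol:
        return 0.0
    log_density = sum((a - 1.0) * math.log(xi) for a, xi in zip(alpha, x))
    log_density -= log_multivariate_beta(alpha)
    return math.exp(log_density)

# With alpha = (1, 1, 1) the density is constant over the simplex.
print(dirichlet_pdf([0.2, 0.3, 0.5], [1.0, 1.0, 1.0]))
# With k = 2 the Dirichlet reduces to the Beta distribution.
print(dirichlet_pdf([0.5, 0.5], [2.0, 2.0]))
```

For α = (1, 1, 1) the constant value is 1/B(α) = Γ(3) = 2, and for α = (2, 2) the result at x = (0.5, 0.5) matches the Beta(2, 2) density 6 * 0.5 * 0.5 = 1.5, which gives two quick sanity checks.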
The Dirichlet distribution is widely used in Bayesian statistics, particularly as a prior distribution for the parameters of a multinomial distribution, for which it is the conjugate prior. It is also important in applications such as natural language processing, genetics, and any area where proportions of different categories need to be modeled.
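The use as a prior for multinomial data can be sketched in a few lines. Because the Dirichlet is conjugate to the multinomial, observing category counts n = (n₁, ..., nₖ) under a Dirichlet(α) prior yields a Dirichlet(α + n) posterior. The function names and the example prior and counts below are hypothetical, chosen only to illustrate the update.

```python
def posterior_alpha(prior_alpha, counts):
    """Conjugate update: add the observed count of each category to its alpha."""
    if len(prior_alpha) != len(counts):
        raise ValueError("prior_alpha and counts must have the same length")
    return [a + n for a, n in zip(prior_alpha, counts)]

def posterior_mean(prior_alpha, counts):
    """Posterior expected proportion of each category."""
    post = posterior_alpha(prior_alpha, counts)
    total = sum(post)
    return [a / total for a in post]

prior = [1.0, 1.0, 1.0]   # uniform prior over three categories (illustrative)
counts = [3, 0, 7]        # hypothetical observed counts per category

print(posterior_alpha(prior, counts))
print(posterior_mean(prior, counts))
```

In this example the posterior parameters are (4, 1, 8). The second category, never observed, still receives a posterior mean of 1/13 rather than zero, which is the smoothing effect that makes the Dirichlet prior attractive for sparse count data.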