Dirichlet distribution
The Dirichlet distribution is a family of continuous probability distributions defined over a simplex, a generalization of a triangle or tetrahedron to higher dimensions. It is used primarily in statistics and machine learning to model how proportions are distributed among multiple categories. Each category's probability is one component of a vector, and because the distribution is supported on the simplex, the components are always non-negative and sum to one.
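Formally, the Dirichlet distribution is parameterized by a vector of positive real numbers, often denoted α = (α₁, α₂, ..., αₖ), where k is the number of categories. Each αᵢ is a concentration parameter that shapes the distribution. The expected proportion of category i is αᵢ / (α₁ + α₂ + ... + αₖ), so a larger αᵢ relative to the others shifts probability mass toward that category, and a smaller one shifts mass away from it. The sum of the parameters controls how tightly samples cluster around this mean: a large sum gives samples close to the mean, while values of αᵢ below 1 push mass toward the corners and edges of the simplex.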
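The effect of the concentration parameters can be checked numerically. The following is a minimal sketch, not a reference implementation: it draws Dirichlet samples by normalizing independent Gamma draws (a standard construction) and compares the empirical mean with αᵢ divided by the sum of all parameters. The function names, the seed, and the example vector α = (2, 5, 3) are illustrative assumptions.

```python
import random


def sample_dirichlet(alpha, rng):
    """Draw one Dirichlet(alpha) sample by normalizing independent Gamma(alpha_i, 1) draws."""
    gammas = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(gammas)
    return [g / total for g in gammas]


def dirichlet_mean(alpha):
    """Theoretical mean of Dirichlet(alpha): alpha_i divided by the sum of all alpha."""
    total = sum(alpha)
    return [a / total for a in alpha]


# Illustrative parameters (assumed for this example only).
rng = random.Random(0)
alpha = [2.0, 5.0, 3.0]

samples = [sample_dirichlet(alpha, rng) for _ in range(20000)]
empirical_mean = [
    sum(s[i] for s in samples) / len(samples) for i in range(len(alpha))
]

print("theoretical mean:", dirichlet_mean(alpha))
print("empirical mean:  ", empirical_mean)
```

Scaling every αᵢ by the same factor leaves the mean unchanged but changes how widely the samples spread around it, which is one way to see the role of the total concentration.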
The probability density function (PDF) of the Dirichlet distribution for a vector x = (x₁, x₂, ..., xₖ) is given by:
f(x; α) = (1/B(α)) * Π (xᵢ^(αᵢ - 1))
where the product runs over i = 1, ..., k and B(α) = (Γ(α₁) Γ(α₂) ⋯ Γ(αₖ)) / Γ(α₁ + α₂ + ⋯ + αₖ) is the multivariate Beta function, the normalizing constant that makes the density integrate to one over the simplex.
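As a rough illustration of the density formula, the sketch below evaluates it in log space using the standard library's log-gamma function, relying on the usual expression of the multivariate Beta function as a product of Gamma functions divided by the Gamma function of the parameter sum. The function names are hypothetical, and returning zero for points outside the open simplex is a simplifying assumption of this example.

```python
import math


def log_multivariate_beta(alpha):
    """log B(alpha) = sum(log Gamma(alpha_i)) - log Gamma(sum(alpha_i))."""
    return sum(math.lgamma(a) for a in alpha) - math.lgamma(sum(alpha))


def dirichlet_pdf(x, alpha):
    """Evaluate the Dirichlet density f(x; alpha) at a point x."""
    if len(x) != len(alpha):
        raise ValueError("x and alpha must have the same length")
    # Assumption of this sketch: points not strictly inside the simplex get density 0.
    if any(xi <= 0.0 for xi in x) or abs(sum(x) - 1.0) > 1e-9:
        return 0.0
    # Work in log space to avoid overflow in the Gamma functions.
    log_density = sum(
        (a - 1.0) * math.log(xi) for a, xi in zip(alpha, x)
    ) - log_multivariate_beta(alpha)
    return math.exp(log_density)


# With all alpha_i equal to 1 the density is constant over the simplex.
print(dirichlet_pdf([0.2, 0.3, 0.5], [1.0, 1.0, 1.0]))
# With k = 2 the Dirichlet distribution reduces to the Beta distribution.
print(dirichlet_pdf([0.5, 0.5], [2.0, 2.0]))
```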
The Dirichlet distribution is widely used in Bayesian statistics, particularly as a conjugate prior for the categorical and multinomial distributions. It also appears in natural language processing, genetics, and any other area where proportions across categories need to be modeled.
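The use as a prior for multinomial data can be sketched in a few lines. When a Dirichlet prior is combined with observed category counts, the posterior is again a Dirichlet distribution whose parameters are the prior parameters plus the counts. The function names, the uniform prior, and the counts below are made up for illustration.

```python
def posterior_alpha(prior_alpha, counts):
    """Dirichlet-multinomial update: add each observed count to its prior parameter."""
    if len(prior_alpha) != len(counts):
        raise ValueError("prior_alpha and counts must have the same length")
    return [a + n for a, n in zip(prior_alpha, counts)]


def dirichlet_mean(alpha):
    """Mean of Dirichlet(alpha): alpha_i divided by the sum of all alpha."""
    total = sum(alpha)
    return [a / total for a in alpha]


# Illustrative values (assumed): a uniform prior over three categories
# and hypothetical observed counts for each category.
prior = [1.0, 1.0, 1.0]
counts = [12, 5, 3]

posterior = posterior_alpha(prior, counts)
posterior_mean = dirichlet_mean(posterior)

print("posterior parameters:", posterior)
print("posterior mean:", posterior_mean)
```

With a uniform prior, the posterior mean is the familiar add-one smoothed estimate of the category proportions.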