Dirichlet Distribution
The Dirichlet distribution is a family of continuous multivariate probability distributions defined over a simplex, the generalization of a triangle or tetrahedron to higher dimensions. It is used primarily in statistics and machine learning to model how proportions are distributed among multiple categories. Each category's probability is one component of a vector, and every vector in the distribution's support has nonnegative components that sum to one.
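Formally, the Dirichlet distribution is parameterized by a vector of positive real numbers, often denoted α = (α₁, α₂, ..., αₖ), where k is the number of categories. Each αᵢ is a concentration parameter that shapes the distribution. The expected proportion of category i is αᵢ / (α₁ + ... + αₖ), so a larger αᵢ relative to the others shifts more probability mass toward that category. The sum of the αᵢ controls how tightly samples cluster around these expected proportions: a larger sum gives samples closer to the mean, while values of αᵢ below one favor sparse vectors in which a few categories dominate.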
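The effect of α on the typical proportions can be checked numerically. The sketch below is illustrative only: it assumes the standard construction of a Dirichlet sample by normalizing independent Gamma(αᵢ, 1) draws, and the function names (sample_dirichlet, empirical_mean) and the example values of α are hypothetical choices, not part of the text above.

```python
import random


def sample_dirichlet(alpha, rng):
    """Draw one Dirichlet(alpha) sample by normalizing Gamma(alpha_i, 1) draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def empirical_mean(alpha, n_samples=20000, seed=0):
    """Average many samples component-wise to estimate the mean proportions."""
    rng = random.Random(seed)
    sums = [0.0] * len(alpha)
    for _ in range(n_samples):
        x = sample_dirichlet(alpha, rng)
        for i, xi in enumerate(x):
            sums[i] += xi
    return [s / n_samples for s in sums]


alpha = [2.0, 3.0, 5.0]                       # example parameters (assumed)
expected = [a / sum(alpha) for a in alpha]    # theoretical means: alpha_i / sum(alpha)
estimated = empirical_mean(alpha)             # Monte Carlo estimate of the same means

print("expected :", expected)
print("estimated:", [round(m, 3) for m in estimated])
```

Each sample sums to one by construction, and the estimated means should land close to αᵢ divided by the sum of all αᵢ.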
The probability density function (PDF) of the Dirichlet distribution for a vector x = (x₁, x₂, ..., xₖ) on the simplex (each xᵢ > 0 and x₁ + x₂ + ... + xₖ = 1) is given by:
f(x; α) = (1/B(α)) * Π (xᵢ^(αᵢ - 1))
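where the product runs over i = 1, ..., k and B(α) is the multivariate Beta function, B(α) = (Γ(α₁) Γ(α₂) ··· Γ(αₖ)) / Γ(α₁ + α₂ + ... + αₖ), with Γ the Gamma function. B(α) is the normalizing constant that makes the density integrate to one over the simplex.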
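As a rough illustration of how the density can be evaluated, the sketch below computes f(x; α) in log space, assuming the standard Gamma-function form of B(α). The function names are hypothetical, and returning zero for points outside the open simplex is a simplifying choice made here.

```python
import math


def log_multivariate_beta(alpha):
    """log B(alpha) = sum_i log Gamma(alpha_i) - log Gamma(sum_i alpha_i)."""
    return sum(math.lgamma(a) for a in alpha) - math.lgamma(sum(alpha))


def dirichlet_pdf(x, alpha, tol=1e-9):
    """Evaluate the Dirichlet density at x for concentration parameters alpha."""
    if len(x) != len(alpha):
        raise ValueError("x and alpha must have the same length")
    if any(a <= 0 for a in alpha):
        raise ValueError("all alpha_i must be positive")
    # Points outside the open simplex are treated as outside the support.
    if any(xi <= 0 for xi in x) or abs(sum(x) - 1.0) > tol:
        return 0.0
    # Work in log space to avoid overflow in the Gamma functions.
    log_density = -log_multivariate_beta(alpha)
    log_density += sum((a - 1.0) * math.log(xi) for a, xi in zip(alpha, x))
    return math.exp(log_density)


# With alpha = (1, 1, 1) the density is uniform on the simplex, 1/B(alpha) = Gamma(3) = 2.
print(dirichlet_pdf([0.2, 0.3, 0.5], [1.0, 1.0, 1.0]))
```

For k = 2 the same function reduces to the Beta density, which gives a simple sanity check: with α = (2, 2) the value at x = (0.5, 0.5) is 6 · 0.5 · 0.5 = 1.5.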
The Dirichlet distribution is widely used in Bayesian statistics, particularly as a prior for the parameters of a multinomial distribution, for which it is the conjugate prior. It also appears in natural language processing, genetics, and other areas where proportions across categories need to be modeled.
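The role of the Dirichlet as a prior for multinomial data can be made concrete with a small sketch. It relies on the standard conjugacy result that a Dirichlet(α) prior combined with observed category counts n yields a Dirichlet(α + n) posterior. The function names, the uniform prior, and the counts below are made-up values for illustration.

```python
def posterior_alpha(prior_alpha, counts):
    """Conjugate update: add each category's observed count to its prior parameter."""
    if len(prior_alpha) != len(counts):
        raise ValueError("prior_alpha and counts must have the same length")
    return [a + n for a, n in zip(prior_alpha, counts)]


def dirichlet_mean(alpha):
    """Mean of Dirichlet(alpha): alpha_i divided by the sum of all alpha_i."""
    total = sum(alpha)
    return [a / total for a in alpha]


prior = [1.0, 1.0, 1.0]   # uniform prior over three categories (assumed)
counts = [8, 1, 1]        # hypothetical observed counts per category

post = posterior_alpha(prior, counts)   # [9.0, 2.0, 2.0]
post_mean = dirichlet_mean(post)        # [9/13, 2/13, 2/13]

print("posterior alpha:", post)
print("posterior mean :", [round(m, 3) for m in post_mean])
```

The prior parameters act like pseudo-counts: here the posterior mean for the first category is 9/13 rather than the raw frequency 8/10, because each category starts with one prior pseudo-observation.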