K-Medoids
K-Medoids is a clustering algorithm that partitions a dataset into groups, or clusters, based on similarity. Unlike K-Means, which represents each cluster by its centroid (the mean of the points in the cluster), K-Medoids selects actual data points, known as medoids, as the cluster centers. This makes K-Medoids more robust to noise and outliers in the data.
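To illustrate the difference between a centroid and a medoid, here is a minimal sketch on one-dimensional data containing a single outlier. The function names `mean_1d` and `medoid_1d` and the sample values are made up for this illustration and do not come from any library.

```python
def mean_1d(points):
    # Centroid of 1-D data: the arithmetic mean, which need not be a data point.
    return sum(points) / len(points)

def medoid_1d(points):
    # Medoid: the member point with the smallest total distance to all points.
    return min(points, key=lambda c: sum(abs(c - p) for p in points))

values = [1.0, 2.0, 3.0, 4.0, 100.0]  # 100.0 is an illustrative outlier

print(mean_1d(values))    # 22.0, pulled far from the bulk of the data
print(medoid_1d(values))  # 3.0, an actual data point near the bulk
```

The mean is dragged toward the outlier, while the medoid stays among the typical values.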
The algorithm proceeds in a few main steps:
- Initialization: Choose k initial medoids at random from the dataset.
- Assignment: Assign each data point to its nearest medoid, forming clusters.
- Update: For each cluster, find the data point that minimizes the total dissimilarity (often measured with a distance metric such as Manhattan or Euclidean distance) to all other points in the cluster. This point becomes the new medoid.
- Repeat: Repeat the assignment and update steps until the medoids no longer change or a specified number of iterations is reached.
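The steps above can be sketched in plain Python roughly as follows. This is an illustrative implementation of the alternating assign/update scheme described here, not a reference implementation. The names `manhattan`, `assign`, and `k_medoids`, the `seed` and `max_iter` parameters, and the sample points are all assumptions made for the example.

```python
import random

def manhattan(a, b):
    # Manhattan (L1) distance between two equal-length tuples.
    return sum(abs(x - y) for x, y in zip(a, b))

def assign(points, medoids, distance):
    # Assignment step: map each medoid index to the indices of its cluster.
    clusters = {m: [] for m in medoids}
    for i, p in enumerate(points):
        # A medoid always stays in its own cluster, so no cluster is empty.
        nearest = i if i in clusters else min(
            medoids, key=lambda m: distance(p, points[m]))
        clusters[nearest].append(i)
    return clusters

def k_medoids(points, k, distance=manhattan, max_iter=100, seed=0):
    rng = random.Random(seed)
    # Initialization: pick k distinct point indices at random.
    medoids = rng.sample(range(len(points)), k)
    for _ in range(max_iter):
        clusters = assign(points, medoids, distance)
        # Update step: in each cluster, pick the member with the smallest
        # total distance to the other members.
        new_medoids = []
        for members in clusters.values():
            best = min(members, key=lambda c: sum(
                distance(points[c], points[j]) for j in members))
            new_medoids.append(best)
        # Stop when the set of medoids no longer changes.
        if set(new_medoids) == set(medoids):
            break
        medoids = new_medoids
    return medoids, assign(points, medoids, distance)

# Two well-separated groups of four points each (illustrative data).
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11)]
medoids, clusters = k_medoids(points, k=2)
print(medoids)
print(clusters)
```

Because the update step compares every pair of points within a cluster, its cost grows quadratically with cluster size. Any dissimilarity function can be passed as `distance`, since only pairwise comparisons between data points are needed.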
K-Medoids is particularly useful when the dataset is small to medium-sized and when outliers could skew the results. It is applied in a variety of fields, including marketing for customer segmentation, biology for species classification, and image processing for pattern recognition.
Overall, K-Medoids provides a more stable clustering option compared to K-Means, especially in datasets where outliers are present, as it relies on actual data points rather than calculated averages.