K-Medoids
K-Medoids is a clustering algorithm that partitions a dataset into groups, or clusters, based on similarity. Unlike K-Means, which represents each cluster by its centroid (the mean of the points in the cluster), K-Medoids selects actual data points, known as medoids, as the cluster centers. This makes K-Medoids more robust to noise and outliers in the data.
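To make the difference between a centroid and a medoid concrete, here is a minimal Python sketch. The helper names (`centroid`, `medoid`, `manhattan`) and the sample points are illustrative assumptions, not part of any particular library.

```python
def manhattan(a, b):
    """Manhattan (L1) distance between two points of equal dimension."""
    return sum(abs(x - y) for x, y in zip(a, b))


def centroid(points):
    """Coordinate-wise mean; it need not coincide with any data point."""
    n = len(points)
    dims = len(points[0])
    return tuple(sum(p[d] for p in points) / n for d in range(dims))


def medoid(points, dist=manhattan):
    """The data point with the smallest total distance to all the others."""
    return min(points, key=lambda c: sum(dist(c, p) for p in points))


# Four points close together plus one distant outlier (made-up data).
points = [(1.0, 1.0), (2.0, 1.0), (1.0, 2.0), (2.0, 2.0), (100.0, 100.0)]

print(centroid(points))  # (21.2, 21.2): pulled far toward the outlier
print(medoid(points))    # (2.0, 2.0): still one of the clustered points
```

In this toy example the outlier drags the mean well away from the four nearby points, while the medoid remains one of them.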
The algorithm consists of several key steps:
- Initialization: Choose k initial medoids at random from the dataset.
- Assignment: Assign each data point to its nearest medoid, forming clusters.
- Update: For each cluster, find the data point that minimizes the total dissimilarity to all other points in the cluster (often measured with a distance metric such as Manhattan or Euclidean distance). This point becomes the new medoid.
- Repeat: Repeat the assignment and update steps until the medoids no longer change or a specified number of iterations is reached.
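The steps above can be sketched in plain Python as follows. This is a minimal illustration of the alternating assign/update scheme just described, not the swap-based PAM procedure and not a reference implementation. The function names, the `seed` and `max_iter` parameters, and the sample data are assumptions made for the example, and the code assumes the points are distinct tuples.

```python
import random


def manhattan(a, b):
    """Manhattan (L1) distance between two points of equal dimension."""
    return sum(abs(x - y) for x, y in zip(a, b))


def assign(points, medoids, dist):
    """Assignment step: map each medoid to the points nearest to it."""
    clusters = {m: [] for m in medoids}
    for p in points:
        nearest = min(medoids, key=lambda m: dist(p, m))
        clusters[nearest].append(p)
    return clusters


def k_medoids(points, k, dist=manhattan, max_iter=100, seed=0):
    """Alternating K-Medoids sketch; assumes distinct, hashable points."""
    rng = random.Random(seed)
    # Initialization: pick k data points at random as starting medoids.
    medoids = rng.sample(points, k)
    for _ in range(max_iter):
        clusters = assign(points, medoids, dist)
        # Update: in each cluster, choose the member with the smallest
        # total dissimilarity to the other members.
        new_medoids = [
            min(members, key=lambda c: sum(dist(c, q) for q in members))
            for members in clusters.values()
        ]
        # Stop once the set of medoids no longer changes.
        if set(new_medoids) == set(medoids):
            break
        medoids = new_medoids
    # Re-assign so the returned clusters match the returned medoids.
    return medoids, assign(points, medoids, dist)


# Two loose groups of made-up 2-D points.
points = [(1, 1), (1, 2), (2, 1), (2, 2), (8, 8), (8, 9), (9, 8), (9, 9)]
medoids, clusters = k_medoids(points, k=2)
for m, members in clusters.items():
    print(m, members)
```

Because the starting medoids are chosen at random, different values of `seed` can converge to different local optima; running the procedure from several seeds and keeping the result with the lowest total dissimilarity is a common precaution.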
K-Medoids is particularly useful when the dataset is small to medium-sized and outliers could skew the results. It is applied in a variety of fields, including marketing for customer segmentation, biology for species classification, and image processing for pattern recognition.
Overall, K-Medoids provides a more stable clustering option compared to K-Means, especially in datasets where outliers are present, as it relies on actual data points rather than calculated averages.