Hierarchical Agglomerative Clustering
Hierarchical agglomerative clustering (HAC) is a widely used method in cluster analysis that groups data points into a hierarchy of nested clusters. It works bottom-up: it starts with each data point as its own cluster and progressively merges clusters into larger ones. This process continues until all data points belong to a single cluster or until a specified number of clusters remains.
The algorithm works as follows. Initially, each data point is treated as a separate cluster. The two closest clusters are identified using a distance metric (such as Euclidean distance) and merged into a new cluster. This merge step is repeated. After each merge, the algorithm recomputes the distances between the newly formed cluster and the remaining clusters, so every subsequent merge is based on the current cluster structure.
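The merge loop described above can be sketched in a few lines of Python. This is a minimal, naive sketch, assuming points are given as coordinate tuples, Euclidean distance via `math.dist`, and single linkage (closest pair of members) as the cluster-to-cluster distance; the function names `single_linkage` and `agglomerate` are hypothetical. It stops once `k` clusters remain.

```python
import math
from itertools import combinations


def single_linkage(a, b, points):
    """Cluster distance = smallest Euclidean distance between members (single linkage)."""
    return min(math.dist(points[i], points[j]) for i in a for j in b)


def agglomerate(points, k):
    """Naive HAC: repeatedly merge the closest pair of clusters until k clusters remain."""
    # Start with every point in its own cluster; clusters hold point indices.
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > k:
        # Recompute all cluster-to-cluster distances and pick the closest pair.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda pair: single_linkage(clusters[pair[0]], clusters[pair[1]], points),
        )
        merged = clusters[i] + clusters[j]
        # Drop the two merged clusters and append their union.
        clusters = [c for n, c in enumerate(clusters) if n not in (i, j)] + [merged]
    return clusters


points = [(0, 0), (0, 1), (5, 5), (5, 6), (12, 0)]
print(sorted(sorted(c) for c in agglomerate(points, 3)))  # → [[0, 1], [2, 3], [4]]
```

Recomputing every pairwise cluster distance on each iteration keeps the sketch short but makes it slow (roughly cubic in the number of points or worse); practical implementations usually maintain a distance matrix and update only the rows affected by each merge.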
The result of HAC can be visualized as a dendrogram, a tree diagram that shows the order in which clusters were merged and how they relate to one another. The height at which two branches join represents the distance (dissimilarity) between the clusters merged at that step. Cutting the dendrogram horizontally at a chosen threshold distance yields a flat clustering, which helps in choosing a suitable number of clusters.
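The dendrogram and the threshold cut can be illustrated by recording each merge together with its height and then replaying only the merges at or below the threshold. This is a minimal sketch, assuming Euclidean distance and complete linkage (chosen here because its merge heights never decrease, so the replay can stop at the first merge above the cut); the names `merge_history` and `cut` are hypothetical and not part of any library.

```python
import math
from itertools import combinations


def complete_linkage(a, b, points):
    """Cluster distance = largest Euclidean distance between members (complete linkage)."""
    return max(math.dist(points[i], points[j]) for i in a for j in b)


def merge_history(points):
    """Run HAC down to one cluster, recording (cluster_a, cluster_b, height) for each merge."""
    clusters = [frozenset([i]) for i in range(len(points))]
    history = []
    while len(clusters) > 1:
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda pair: complete_linkage(clusters[pair[0]], clusters[pair[1]], points),
        )
        # The merge height is the branch height in the dendrogram.
        height = complete_linkage(clusters[i], clusters[j], points)
        history.append((clusters[i], clusters[j], height))
        clusters = [c for n, c in enumerate(clusters) if n not in (i, j)] + [clusters[i] | clusters[j]]
    return history


def cut(points, history, threshold):
    """Flat clustering obtained by replaying only merges whose height is <= threshold."""
    clusters = {frozenset([i]) for i in range(len(points))}
    for a, b, height in history:
        # Complete-linkage heights never decrease, so stop at the first merge above the cut.
        if height > threshold:
            break
        clusters -= {a, b}
        clusters.add(a | b)
    return clusters


points = [(0, 0), (0, 1), (5, 5), (5, 6), (12, 0)]
history = merge_history(points)
print([round(h, 2) for _, _, h in history])  # → [1.0, 1.0, 7.81, 12.04]
print(len(cut(points, history, 2.0)))  # → 3
print(len(cut(points, history, 8.0)))  # → 2
```

Raising the threshold moves the cut higher up the tree and produces fewer, larger clusters.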
HAC supports several linkage criteria, which define the distance between two clusters: single linkage (the minimum distance between a point in one cluster and a point in the other), complete linkage (the maximum such distance), and average linkage (the mean of all such pairwise distances). The choice of criterion affects the shape and size of the resulting clusters. HAC is particularly useful for exploratory data analysis because it does not require the number of clusters to be fixed in advance and can reveal the underlying structure of the data.
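The three linkage criteria differ only in how they summarize the pairwise distances between members of two clusters. The sketch below, assuming clusters are lists of coordinate tuples and Euclidean distance via `math.dist` (function names are hypothetical), computes all three for the same pair of clusters.

```python
import math


def pairwise(a, b):
    """All Euclidean distances between a point in cluster a and a point in cluster b."""
    return [math.dist(p, q) for p in a for q in b]


def single_linkage(a, b):
    return min(pairwise(a, b))  # closest pair of points


def complete_linkage(a, b):
    return max(pairwise(a, b))  # farthest pair of points


def average_linkage(a, b):
    d = pairwise(a, b)
    return sum(d) / len(d)  # mean over all pairs


a = [(0, 0), (0, 4)]
b = [(3, 0)]
print(single_linkage(a, b), complete_linkage(a, b), average_linkage(a, b))  # → 3.0 5.0 4.0
```

Because single linkage looks only at the closest pair, it tends to chain clusters together along elongated shapes, whereas complete linkage penalizes any distant pair and favors compact clusters; average linkage sits between the two.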