Hierarchical agglomerative clustering
Hierarchical agglomerative clustering (HAC) is a popular method in cluster analysis that groups data points into a hierarchy of nested clusters. It works bottom-up: it starts with individual data points and progressively merges them into larger clusters. This process continues until all data points are part of a single cluster or until a specified number of clusters is reached.
The algorithm works as follows: initially, each data point is considered a separate cluster. The two closest clusters are identified based on a distance metric (such as Euclidean distance) and merged to form a new cluster. This merging step is repeated iteratively; after each merge, the algorithm recalculates the distances between the newly formed cluster and the remaining clusters, so that the next merge is always based on the current cluster structure.
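The merging loop described above can be sketched in a few lines of plain Python. This is a minimal, unoptimized illustration rather than a reference implementation: the function names `cluster_distance` and `agglomerate`, the choice of single linkage with Euclidean distance, and the sample points are all assumptions made for the example.

```python
import math

def cluster_distance(points, c1, c2):
    # Single linkage (an assumption for this sketch): the smallest
    # Euclidean distance between any member of c1 and any member of c2.
    return min(math.dist(points[i], points[j]) for i in c1 for j in c2)

def agglomerate(points, n_clusters=1):
    # Start with every point in its own cluster (clusters hold point indices).
    clusters = [[i] for i in range(len(points))]
    merges = []  # history of (cluster_a, cluster_b, distance)
    while len(clusters) > n_clusters:
        # Find the closest pair of clusters by brute force.
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = cluster_distance(points, clusters[a], clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        merges.append((clusters[a], clusters[b], d))
        # Replace the two clusters with their union; distances to the new
        # cluster are recomputed on the next pass through the loop.
        merged = clusters[a] + clusters[b]
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)]
        clusters.append(merged)
    return clusters, merges

# Illustrative data: two tight pairs and one isolated point.
points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0), (10.0, 0.0)]
clusters, merges = agglomerate(points, n_clusters=2)
```

With these sample points, the two tight pairs are merged first (each at distance 1.0), then the pairs are joined with each other, leaving the isolated point as its own cluster. The brute-force search makes this sketch cubic in the number of points, so it is suitable only for small inputs.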
HAC can be visualized using a dendrogram, which is a tree-like diagram that illustrates the arrangement of clusters and their relationships. The height at which two branches join represents the distance or dissimilarity between the clusters merged at that point. Cutting the dendrogram at a chosen threshold distance yields a flat clustering, which helps in deciding on a suitable number of clusters.
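Cutting a dendrogram at a threshold can be illustrated without any plotting: only merges whose height is at or below the threshold are kept, and the groups that remain connected form the flat clusters. The sketch below assumes a monotone merge history (heights never decrease, as with single, complete, and average linkage). The function name `cut_dendrogram`, the `(i, j, height)` tuple format, and the merge heights are hypothetical and chosen for illustration.

```python
def cut_dendrogram(n_points, merges, threshold):
    # merges: list of (i, j, height), where i and j are point indices taken
    # from the two clusters that were joined at the given height.
    parent = list(range(n_points))

    def find(x):
        # Follow parent links to the representative of x's group,
        # shortening the path along the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Keep only the merges that happen at or below the cut height.
    for i, j, height in merges:
        if height <= threshold:
            parent[find(i)] = find(j)

    # Assign consecutive labels 0, 1, 2, ... in order of first appearance.
    label_of_root = {}
    labels = []
    for p in range(n_points):
        root = find(p)
        labels.append(label_of_root.setdefault(root, len(label_of_root)))
    return labels

# Hypothetical merge history for five points.
merges = [(0, 1, 1.0), (2, 3, 1.0), (1, 2, 6.4), (3, 4, 7.1)]

labels_low = cut_dendrogram(5, merges, threshold=2.0)   # [0, 0, 1, 1, 2]
labels_high = cut_dendrogram(5, merges, threshold=6.5)  # [0, 0, 0, 0, 1]
```

A lower threshold cuts the tree closer to the leaves and yields more, smaller clusters; a higher threshold yields fewer, larger ones.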
HAC can use different linkage criteria to define the distance between two clusters, including single linkage (minimum distance between members), complete linkage (maximum distance), and average linkage (mean distance). The choice affects the shape and size of the resulting clusters: single linkage tends to produce elongated, chain-like clusters, whereas complete linkage favors compact clusters. HAC is particularly useful for exploratory data analysis, as it does not require a predetermined number of clusters and can reveal the underlying structure of the data.
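The three linkage criteria differ only in how they reduce the set of pairwise point distances between two clusters to a single number. The following sketch makes that explicit; the function names and the two sample clusters are assumptions for illustration, and Euclidean distance is used via the standard library's `math.dist`.

```python
import math

def pairwise_distances(c1, c2):
    # All Euclidean distances between a point of c1 and a point of c2.
    return [math.dist(p, q) for p in c1 for q in c2]

def single_linkage(c1, c2):
    # Minimum pairwise distance.
    return min(pairwise_distances(c1, c2))

def complete_linkage(c1, c2):
    # Maximum pairwise distance.
    return max(pairwise_distances(c1, c2))

def average_linkage(c1, c2):
    # Mean of all pairwise distances.
    distances = pairwise_distances(c1, c2)
    return sum(distances) / len(distances)

# Illustrative clusters: two vertical pairs of points, 4 units apart.
a = [(0.0, 0.0), (0.0, 3.0)]
b = [(4.0, 0.0), (4.0, 3.0)]

d_single = single_linkage(a, b)      # 4.0 (nearest pair)
d_complete = complete_linkage(a, b)  # 5.0 (farthest pair, a 3-4-5 triangle)
d_average = average_linkage(a, b)    # 4.5 (mean of 4, 5, 5, 4)
```

For the same pair of clusters, the single-linkage distance is never larger than the average-linkage distance, which in turn is never larger than the complete-linkage distance.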