Hierarchical Clustering Techniques

Figure 2.1: Graphic interpretation of the most common clustering techniques

Most of the reviewed offline clustering algorithms use hierarchical schemes, where speech segments or clusters are iteratively split or merged until the optimum number of speakers is reached. Figure 2.1 shows a schematic view of the two most widely used techniques in speaker clustering. Bottom-up clustering systems start with a large number of segments/clusters and converge to the optimum number of clusters by merging them. Top-down systems, on the other hand, usually start with one or very few clusters and work their way up (in the number of clusters, down in the figure) via splitting procedures until the optimum number is reached. In the design of either system, two items need to be defined:

  1. A distance between clusters/segments to determine acoustic similarity. Rather than a single pairwise value, a distance matrix is usually defined, which holds the distance between every possible pair. In many cases the distances used for speaker clustering are the same as, or closely resemble, the speaker segmentation metrics presented in section 2.2.

  2. A stopping criterion to halt the iterative merging/splitting at the optimum number of clusters (which may differ depending on the application).
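
The two items above can be illustrated with a minimal bottom-up sketch in Python. It is not taken from any of the reviewed systems: each segment is assumed to be summarized by a single feature vector, the Euclidean distance between cluster centroids stands in for the metrics of section 2.2, and a fixed distance threshold stands in for the stopping criterion. The names `cluster_distance`, `distance_matrix`, `bottom_up` and `threshold` are hypothetical.

```python
import math
from itertools import combinations


def centroid(points):
    """Mean vector of a list of equal-length feature vectors."""
    n = len(points)
    return [sum(col) / n for col in zip(*points)]


def cluster_distance(a, b):
    """Placeholder distance (assumption): Euclidean distance between centroids."""
    return math.dist(centroid(a), centroid(b))


def distance_matrix(clusters):
    """Distance for every possible pair of clusters, keyed by (i, j) with i < j."""
    return {
        (i, j): cluster_distance(clusters[i], clusters[j])
        for i, j in combinations(range(len(clusters)), 2)
    }


def bottom_up(segments, threshold):
    """Merge the closest pair until the smallest distance exceeds the threshold."""
    # Start with one cluster per segment.
    clusters = [[seg] for seg in segments]
    while len(clusters) > 1:
        dists = distance_matrix(clusters)
        (i, j), d = min(dists.items(), key=lambda kv: kv[1])
        # Stopping criterion (assumption): a simple distance threshold.
        if d > threshold:
            break
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return clusters


# Four toy segments forming two well-separated groups.
segments = [[0.0, 0.1], [0.2, 0.0], [5.0, 5.1], [5.2, 4.9]]
result = bottom_up(segments, threshold=1.0)
```

The distance matrix is recomputed from scratch at every iteration for clarity; an actual system would typically update only the rows affected by the merge. A top-down system would follow the same loop structure, with a splitting step in place of the merge.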

Classified by the type of clustering, the following are the most representative techniques described in the literature:

user 2008-12-08