Some papers in the literature do not fit into a hierarchical clustering context. The systems reviewed here each define an algorithm or metric to determine the optimum number of speakers, together with a method for finding the optimum speaker clustering given that number.
In Tsai and Wang (2006) a genetic algorithm is proposed to obtain the speaker clustering that maximizes the overall model likelihood. It starts from a random cluster assignment and then iterates between evaluating the likelihood and mutating the assignment. To select the optimum number of speakers they use BIC computed on the resulting models.
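The loop described above (random initial assignment, likelihood evaluation, mutation, then BIC over candidate speaker counts) can be illustrated with a minimal sketch. It is not the implementation of Tsai and Wang (2006): the 1-D features, the single Gaussian per cluster, the variance floor, the elitist selection scheme and all function names are assumptions made only for illustration.

```python
import math
import random

def log_likelihood(data, labels, k, var_floor=0.25):
    """Sum of per-cluster 1-D Gaussian log-likelihoods (ML mean, floored variance)."""
    total = 0.0
    for c in range(k):
        pts = [x for x, l in zip(data, labels) if l == c]
        if not pts:
            continue  # an empty cluster contributes nothing
        n = len(pts)
        mean = sum(pts) / n
        sq = sum((x - mean) ** 2 for x in pts)
        var = max(sq / n, var_floor)  # assumed floor, avoids degenerate clusters
        total += -0.5 * n * math.log(2 * math.pi * var) - sq / (2 * var)
    return total

def genetic_clustering(data, k, pop_size=20, generations=300, seed=0):
    """Mutation-only genetic search for the assignment with the highest likelihood."""
    rng = random.Random(seed)
    # Initial population: random cluster assignments.
    pop = [[rng.randrange(k) for _ in data] for _ in range(pop_size)]
    for _ in range(generations):
        # Evaluate the likelihood and keep the better half (elitist selection).
        pop.sort(key=lambda ind: log_likelihood(data, ind, k), reverse=True)
        survivors = pop[: pop_size // 2]
        children = []
        for parent in survivors:
            child = parent[:]
            # Mutation: reassign one randomly chosen item to a random cluster.
            child[rng.randrange(len(data))] = rng.randrange(k)
            children.append(child)
        pop = survivors + children
    best = max(pop, key=lambda ind: log_likelihood(data, ind, k))
    return best, log_likelihood(data, best, k)

def bic(loglik, k, n, penalty_weight=1.0):
    """BIC score: likelihood minus a complexity penalty (mean and variance per cluster)."""
    n_params = 2 * k
    return loglik - 0.5 * penalty_weight * n_params * math.log(n)

def select_num_speakers(data, candidates):
    """Run the search for each candidate k and keep the k with the highest BIC."""
    scored = []
    for k in candidates:
        labels, ll = genetic_clustering(data, k)
        scored.append((bic(ll, k, len(data)), k, labels))
    _, best_k, best_labels = max(scored)
    return best_k, best_labels

if __name__ == "__main__":
    toy = [0.0, 0.4, -0.4, 0.2, 5.0, 5.4, 4.6, 5.2]  # two well-separated groups
    print(select_num_speakers(toy, [1, 2, 3]))
```

The variance floor matters in this sketch: without it, very small clusters obtain an arbitrarily high likelihood and the BIC penalty no longer prevents over-splitting on such a small toy set.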
A relatively new learning technique called Variational Bayesian (VB) learning, or Ensemble learning (Attias (2000), MacKay (1997)), is used for speaker clustering in Valente and Wellekens (2004), Valente and Wellekens (2005) and Valente (2006). VB training performs model parameter learning and model complexity selection in a single algorithm. The models trained with this technique adapt their complexity to the amount of data available for training. The proposed systems compute the optimum clustering for a range of numbers of clusters and use a criterion called free energy to determine the optimum.
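For reference, the free energy used as a selection criterion in VB learning is usually written as a lower bound on the log evidence of the data. The expression below is the standard textbook form, given as a sketch; the notation (X for the data, θ for the model parameters and hidden variables, m for a candidate model such as a given number of clusters, q for the variational posterior) is an assumption and may differ from the notation and sign conventions of the cited papers.

```latex
\log p(X \mid m)
  = F_m(q) + \mathrm{KL}\bigl(q(\theta) \,\|\, p(\theta \mid X, m)\bigr)
  \;\ge\; F_m(q),
\qquad
F_m(q) = \int q(\theta) \,\log \frac{p(X, \theta \mid m)}{q(\theta)} \, d\theta .
```

Under this convention, VB training maximizes F_m(q) over q for each candidate m, and the candidate with the largest resulting free energy is selected, which is how parameter learning and complexity selection are handled by the same quantity.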
In Lapidot (2003) self-organizing maps (SOM) (Lapidot et al. (2002), Kohonen (1990)) are proposed for speaker clustering given a known number of speakers. SOM is used here as a vector quantization (VQ) algorithm to train the code-books that represent each speaker. An initial non-informative set of code-books is created and SOM is then iterated, retraining the code-books until the number of acoustic frames switching between clusters is close to zero. To determine the optimum number of clusters, a likelihood function is defined (derived from the code-words in the code-books by assigning a Gaussian pdf to each) and BIC is used.
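The iteration and stopping rule described above can be sketched as follows. This is a simplified illustration, not the system of Lapidot (2003): plain Lloyd-style (k-means) code-book retraining stands in for the SOM update, frames are 1-D numbers, the round-robin start plays the role of the non-informative initialization, and all names and default values are hypothetical.

```python
def train_codebook(frames, size, iters=10):
    """Lloyd (k-means) training of a 1-D code-book; a stand-in for the SOM update."""
    ordered = sorted(frames)
    # Spread the initial code-words over the sorted frames.
    codebook = [ordered[(2 * i + 1) * len(ordered) // (2 * size)] for i in range(size)]
    for _ in range(iters):
        cells = [[] for _ in codebook]
        for x in frames:
            nearest = min(range(size), key=lambda j: (x - codebook[j]) ** 2)
            cells[nearest].append(x)
        # Move each code-word to the mean of its cell; keep it if the cell is empty.
        codebook = [sum(c) / len(c) if c else w for c, w in zip(cells, codebook)]
    return codebook

def distortion(segment, codebook):
    """Total squared quantization error of a segment against one code-book."""
    return sum(min((x - w) ** 2 for w in codebook) for x in segment)

def cluster_segments(segments, n_speakers, codebook_size=1, max_iters=50):
    """Alternate code-book retraining and reassignment until no frame switches."""
    # Non-informative start: segments are dealt to speakers in round-robin order.
    labels = [i % n_speakers for i in range(len(segments))]
    for _ in range(max_iters):
        codebooks = []
        for s in range(n_speakers):
            frames = [x for seg, l in zip(segments, labels) if l == s for x in seg]
            codebooks.append(train_codebook(frames, codebook_size) if frames else None)
        new_labels = []
        for seg in segments:
            costs = [distortion(seg, cb) if cb is not None else float("inf")
                     for cb in codebooks]
            new_labels.append(costs.index(min(costs)))
        # Count the frames whose segment changed cluster in this pass.
        switched = sum(len(seg) for seg, a, b in zip(segments, labels, new_labels)
                       if a != b)
        labels = new_labels
        if switched == 0:
            break
    return labels

if __name__ == "__main__":
    toy = [[0.0, 1.0, 0.2], [0.8, 0.1, 1.1], [10.0, 11.0, 10.3], [10.9, 10.2, 11.2]]
    print(cluster_segments(toy, n_speakers=2))
```

In this simplified version the default of one code-word per speaker is deliberate: with larger code-books and a start in which every speaker holds segments from all true speakers, each code-book can end up covering all of them, and the reassignment step may then stop without separating the speakers.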