Clusters and Models Complexity Selection

This section describes two algorithms that automatically determine the number of Gaussian mixtures per model and the number of initial clusters used in the system. In the baseline system these values were tuned on development data. This approach was considered deficient because it assumes that development and test data behave in the same way. The appropriate number of clusters and the complexity of each model at each stage were found to depend strongly on the amount of data available, so any difference between development and test in the length of the data to be clustered harmed performance. Furthermore, each meeting contains a different amount of data after speech/non-speech detection, so any fixed set of parameters is not tuned to the properties of a particular meeting.

To determine the number of Gaussian mixtures and the number of initial clusters, the algorithms presented below base their selection on information from each particular recording rather than on a value fixed in advance for all recordings of a certain type. To this end a new parameter, the Cluster Complexity Ratio (CCR), is defined as the ratio between the amount of data being modeled and the number of mixtures needed to represent it. The CCR is set using development data and is then used to derive recording-specific values for the two parameters mentioned above. Although the CCR still needs tuning, it allows individual parameters to be determined for each show, which adds robustness to the system.
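As an illustration of how a single tuned ratio can yield recording-specific values, the following is a minimal sketch, not the algorithms themselves. It assumes that the amount of data is measured in acoustic frames, that the CCR is expressed in frames per Gaussian, and that every initial cluster starts with the same number of Gaussians. The function names, the `gaussians_per_cluster` parameter and the numeric values are hypothetical and chosen only for the example.

```python
import math


def gaussians_for_model(num_frames: int, ccr: float) -> int:
    """Number of Gaussians for one model, assuming CCR = frames per Gaussian.

    Hypothetical helper: rounds frames / CCR to the nearest integer and
    never returns fewer than one Gaussian.
    """
    if ccr <= 0:
        raise ValueError("ccr must be positive")
    return max(1, math.floor(num_frames / ccr + 0.5))


def initial_clusters(total_frames: int, ccr: float,
                     gaussians_per_cluster: int) -> int:
    """Number of initial clusters for one recording.

    Assumes each initial cluster is given `gaussians_per_cluster`
    Gaussians (a hypothetical parameter), so one cluster accounts for
    gaussians_per_cluster * ccr frames of speech.
    """
    if ccr <= 0 or gaussians_per_cluster <= 0:
        raise ValueError("ccr and gaussians_per_cluster must be positive")
    frames_per_cluster = gaussians_per_cluster * ccr
    return max(1, math.floor(total_frames / frames_per_cluster + 0.5))


if __name__ == "__main__":
    ccr = 700.0  # illustrative value only; in practice tuned on development data
    # A longer recording yields more initial clusters than a shorter one.
    for frames in (60000, 20000):
        print(frames, initial_clusters(frames, ccr, gaussians_per_cluster=5))
    # A model trained on more data is given more Gaussians.
    for frames in (3500, 100):
        print(frames, gaussians_for_model(frames, ccr))
```

Under these assumptions, two recordings with different amounts of speech receive different numbers of initial clusters from the same CCR, which is the behaviour a single fixed cluster count cannot provide.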

user 2008-12-08