Since the system presented for the RT04f evaluation, each cluster pair merge is followed by three iterations of model training and Viterbi segmentation of the data given the models. This yielded only a small improvement on the RT04f evaluation data, but it proved beneficial and increased the robustness of the system.
After each cluster pair merge the cluster structure changes, leaving one fewer cluster in the system. When a Viterbi segmentation is then performed, many segment boundaries change and some segments are reassigned to different clusters. The resulting clusters are used to retrain the models, which are in turn used to segment the data again. After three iterations the segmentation has usually converged and a new merging step is started.
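The alternation between retraining and resegmentation can be illustrated with a minimal sketch. It is not the system's implementation: one-dimensional features, a single Gaussian per cluster, the variance floor and the `switch_penalty` parameter are all assumptions made to keep the example short, standing in for acoustic models that this section does not detail.

```python
import math

def train_gaussian(values):
    """Fit a 1-D Gaussian (mean, variance) to the frames of one cluster."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / n
    return mean, max(var, 1e-3)  # assumed variance floor, keeps log-likelihoods finite

def log_likelihood(x, model):
    mean, var = model
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)

def viterbi_segment(frames, models, switch_penalty=2.0):
    """Best cluster sequence; changing cluster costs switch_penalty (log domain)."""
    ids = sorted(models)
    score = {c: log_likelihood(frames[0], models[c]) for c in ids}
    back = []
    for x in frames[1:]:
        new_score, pointers = {}, {}
        for c in ids:
            def arrive(p):
                return score[p] - (0.0 if p == c else switch_penalty)
            best_prev = max(ids, key=arrive)
            pointers[c] = best_prev
            new_score[c] = arrive(best_prev) + log_likelihood(x, models[c])
        back.append(pointers)
        score = new_score
    # Trace back from the best final state.
    state = max(ids, key=lambda c: score[c])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path

def resegment(frames, labels, iterations=3):
    """Alternate model retraining and Viterbi segmentation a fixed number of times."""
    for _ in range(iterations):
        models = {
            c: train_gaussian([x for x, l in zip(frames, labels) if l == c])
            for c in set(labels)
        }
        labels = viterbi_segment(frames, models)
    return labels
```

In this toy setting, a frame that starts in the wrong cluster is typically moved to the acoustically closer one during the first pass, and later passes leave the labels unchanged, which mirrors the convergence behaviour described above.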
In order to stop the clustering process at the optimum number of clusters, two alternative stopping criteria were proposed in the RT04f evaluation system. On the one hand, clustering can continue while there is a positive BIC distance between any two clusters; this is called the BIC stopping criterion. On the other hand, the overall likelihood of the data given all the acoustic models can be compared between iterations, stopping the process when it starts decreasing (and reverting to the previous segmentation); this is called the Viterbi stopping criterion.
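The control flow of the two criteria can be sketched as follows. This is only an illustration under stated assumptions: `pair_score`, `merge` and `total_log_likelihood` are hypothetical callables supplied by the caller (a BIC-style pair distance, a cluster merge, and the overall likelihood of the data given all models), and the resegmentation performed after each merge is omitted.

```python
def best_pair(clusters, pair_score):
    """Return (score, i, j) for the highest-scoring pair of clusters."""
    return max((pair_score(a, b), i, j)
               for i, a in enumerate(clusters)
               for j, b in enumerate(clusters) if i < j)

def merge_pair(clusters, i, j, merge):
    """Return a new cluster list where clusters i and j are replaced by their merge."""
    rest = [c for k, c in enumerate(clusters) if k not in (i, j)]
    return rest + [merge(clusters[i], clusters[j])]

def cluster_bic_stop(clusters, pair_score, merge):
    """Merge the best pair while at least one pair has a positive score."""
    clusters = list(clusters)
    while len(clusters) > 1:
        score, i, j = best_pair(clusters, pair_score)
        if score <= 0:
            break  # no pair is worth merging any more
        clusters = merge_pair(clusters, i, j, merge)
    return clusters

def cluster_viterbi_stop(clusters, pair_score, merge, total_log_likelihood):
    """Merge the best pair until the overall likelihood decreases, then revert."""
    clusters = list(clusters)
    previous = total_log_likelihood(clusters)
    while len(clusters) > 1:
        _, i, j = best_pair(clusters, pair_score)
        candidate = merge_pair(clusters, i, j, merge)
        current = total_log_likelihood(candidate)
        if current < previous:
            break  # keep the clustering from before this merge
        clusters, previous = candidate, current
    return clusters
```

Note that the likelihood-based variant has to carry out one merge beyond the stopping point before it can detect the decrease, whereas the BIC-based variant stops before merging.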
It must be noted that the Viterbi criterion can be applied only because the overall system complexity remains constant between iterations, and therefore the overall likelihoods are comparable. Note also that the Viterbi stopping criterion is in fact the BIC criterion applied to the overall model: it compares a model with M clusters against a model with M-1 clusters and stops when the merge down to M-1 clusters lowers the overall likelihood, that is, when the model with M clusters is the better one.
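The equivalence can be made explicit with the standard penalized-likelihood form of BIC. The notation is an assumption introduced here for illustration: $X$ is the data, $\Theta_M$ the overall model with $M$ clusters, $\#\Theta$ its number of free parameters, $N$ the number of frames and $\lambda$ the penalty weight.

```latex
\begin{align}
\Delta\mathrm{BIC}
  &= \log \mathcal{L}(X \mid \Theta_{M-1}) - \log \mathcal{L}(X \mid \Theta_{M})
     - \frac{\lambda}{2}\,\bigl(\#\Theta_{M-1} - \#\Theta_{M}\bigr)\log N \\
\#\Theta_{M-1} = \#\Theta_{M}
  \;&\Rightarrow\;
  \Delta\mathrm{BIC}
   = \log \mathcal{L}(X \mid \Theta_{M-1}) - \log \mathcal{L}(X \mid \Theta_{M})
\end{align}
```

With constant overall complexity the penalty term cancels, so the sign of the BIC difference is simply the sign of the change in overall log-likelihood between two consecutive iterations.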
Table 3.3 shows the resulting scores when using either the BIC or the Viterbi stopping criterion on the RT04f dataset. Although the Viterbi stopping criterion achieves a 1.55% absolute improvement over BIC, the breakdown by show indicates mixed results, and on other datasets the overall results are the opposite.
The final system for broadcast news uses the BIC stopping criterion, as it does not require an extra clustering iteration to find the stopping point.
Once the system stops merging, the segmentation is output into a file. At this stage all removed non-speech regions are taken into account and inserted into the output where appropriate so that the output file is synchronous with the reference file used to evaluate its performance.
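A minimal sketch of this final step is given below. It assumes, purely for illustration, that clustering ran on a speech-only timeline obtained by cutting out the non-speech regions and concatenating what remained, and that the removed regions are non-overlapping; the function names and the `(start, end, label)` tuple layout are hypothetical.

```python
def speech_regions(removed):
    """List (compressed_start, compressed_end, shift) for each stretch of speech.

    removed holds non-overlapping (start, end) non-speech regions in original time.
    """
    regions = []
    cursor = 0.0  # original time at which the current speech stretch starts
    shift = 0.0   # total non-speech duration removed so far
    for ns_start, ns_end in sorted(removed):
        if ns_start > cursor:
            regions.append((cursor - shift, ns_start - shift, shift))
        shift += ns_end - ns_start
        cursor = ns_end
    regions.append((cursor - shift, float("inf"), shift))
    return regions

def restore_timeline(segments, removed):
    """Map (start, end, label) segments from speech-only time to original time.

    A segment that spans a removed region is split into one piece per side.
    """
    regions = speech_regions(removed)
    output = []
    for start, end, label in segments:
        for c_start, c_end, shift in regions:
            lo, hi = max(start, c_start), min(end, c_end)
            if lo < hi:
                output.append((lo + shift, hi + shift, label))
    return output
```

For instance, with non-speech removed at 2.0 to 3.0 s, a segment covering 0.0 to 3.0 s of speech-only time comes back as two pieces, 0.0 to 2.0 s and 3.0 to 4.0 s, leaving the gap where the non-speech region sits in the reference timeline.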