The datasets used in the experiments in this thesis were taken from the data released for the Rich Transcription (RT) evaluations in the meetings domain. To date, the evaluations covering meetings have been RT02, RT04s, RT05s and RT06s. For the latter two years only the conference room data has been used, as it contains a richer variety of speakers and its characteristics match more closely the aim of the algorithms presented in this thesis.
The available datasets have been divided into two groups, a development set and a test set. The RT02, RT04s and RT05s sets form the development set, with a total of 24 meeting excerpts, each lasting between 10 and 12 minutes. The RT06s set, with 8 meeting excerpts, has been used as the test set in order to measure system improvements on data that was not used to tune the system parameters. Figure 6.1 summarizes the data available in each of the RT sets used. For a complete list of the individual files refer to Appendix B.
These sets have a few special characteristics that need to be taken into account. On the one hand, the development set contains four meetings for which only one microphone is available. These are two pairs of CMU meetings recorded for the RT02 and RT04s evaluations. With a single channel they are not suitable for evaluating beamforming performance, but they are kept in the development data so that results remain fair and comparable.
On the other hand, the meeting NIST_20050412-1303 from the RT05s dataset contains one speaker who participated in the meeting by telephone. As will be described later on, the reference files derived from forced alignments, which are used to evaluate the data robustly, do not include this speaker, and this introduces a large bias in the scores. Depending on the test performed, this meeting is excluded to allow for a fair comparison; whenever this is done it is stated explicitly.
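The partition described above, together with the optional exclusion of the telephone meeting, can be sketched as follows. This is only an illustrative sketch and not the tooling used for the experiments: the function name split_meetings, the (evaluation, meeting) pair representation and the meeting identifiers EXAMPLE_DEV_MEETING and EXAMPLE_TEST_MEETING are assumptions introduced here. Only the evaluation names and the identifier NIST_20050412-1303 come from the text.

```python
# Evaluations assigned to each group, as described in the text.
DEV_EVALS = ("RT02", "RT04s", "RT05s")
TEST_EVALS = ("RT06s",)

# RT05s meeting with a telephone participant missing from the references.
TELEPHONE_MEETING = "NIST_20050412-1303"


def split_meetings(meetings, exclude=()):
    """Split (evaluation, meeting_id) pairs into development and test lists.

    Meetings whose identifier appears in `exclude` are dropped from both lists.
    """
    dev, test = [], []
    for eval_name, meeting_id in meetings:
        if meeting_id in exclude:
            continue
        if eval_name in DEV_EVALS:
            dev.append(meeting_id)
        elif eval_name in TEST_EVALS:
            test.append(meeting_id)
        else:
            raise ValueError(f"unknown evaluation: {eval_name}")
    return dev, test


# Hypothetical placeholder identifiers, used only to exercise the function.
example_meetings = [
    ("RT02", "EXAMPLE_DEV_MEETING"),
    ("RT05s", TELEPHONE_MEETING),
    ("RT06s", "EXAMPLE_TEST_MEETING"),
]

dev_all, test_all = split_meetings(example_meetings)
dev_filtered, test_filtered = split_meetings(
    example_meetings, exclude=(TELEPHONE_MEETING,)
)
```

Keeping the exclusion as an explicit argument, rather than removing the meeting from the list itself, makes it visible in each experiment whether the telephone meeting was scored.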