Data used in the Speaker Diarization Evaluations

The test datasets used in both the RT05s and RT06s evaluations were composed of conference and lecture type data. The conference data is composed of ten and nine meeting excerpts, respectively, of 12 minutes each. One meeting was eliminated from RT06s after the evaluation finished because of technical issues. These datasets have been used in this thesis to evaluate the different proposed techniques and are covered in more detail in the experiments chapter and in appendix B.

The lecture room test data was composed of excerpts of different lengths contributed by the different partners in the CHIL project, corresponding to different moments within a lecture meeting. In particular:

The development data used in these evaluations was usually a compilation of the datasets from previous evaluation campaigns. For conference room data, the sets used were those from the RT02s and RT04s evaluations for RT05s, and a subset of RT02s through RT05s for the RT06s evaluation. For the lecture room evaluations, as this subdomain was first included in RT05s, there were no prior datasets available, and therefore NIST distributed a set of transcribed lecture recordings similar to those in RT05s. For RT06s, development was done using a subset of the original development set plus the RT05s evaluation set.

Although the diarization system does not use any training data, the speech/non-speech detector used in RT05s needed to be trained. It was trained on around 80 hours of meeting data extracted from the ICSI meeting corpus.

