Input Data Analysis: Broadcast News versus Meetings

In this section several parameters are computed on both meetings and broadcast news shows in order to draw conclusions about the nature of the input data to the speaker diarization system. To constrain the analysis to a known set of data, it was performed on the RT04f broadcast news evaluation set and on the RT06s meetings evaluation set.

The RT04f set is composed of 12 shows, taken from both radio and television programs. The evaluation region in each show is approximately 40 minutes long, although the recording itself may be longer. The RT06s set is composed of two subsets, one for the lecture room subdomain and one for the conference room subdomain. The conference room data consists of 8 meeting excerpts of around 15 minutes each. The lecture room data consists of 28 lecture excerpts of varying lengths.



user 2008-12-08