Speaker Diarization: from Broadcast News to Meetings

When applying a known technique to a new task, it is preferable to start from well-grounded theory and from an implementation that has proven successful on a similar task, and then to analyze its shortcomings in the new domain and propose improvements. This is the case for the diarization system presented here for the meetings environment, which is based on the system previously developed at the International Computer Science Institute (ICSI) for the broadcast news task. It has been developed by proposing alternatives to those algorithms that had room for improvement or that needed to be adapted to better fit the new domain. In addition, because the broadcast news system is designed to run on a single-channel recording only, algorithms have been implemented to adapt the signals from multiple channels/microphones so that they can be processed by the presented system.

This chapter describes both the broadcast news system and the new meetings domain system, bridging the gap between the two by analyzing the differences observed during development.

In the first part, the broadcast news system is described in detail, pointing out the main ideas behind it and its implementation. Baseline results are given for its performance in the NIST Rich Transcription evaluations for broadcast news (RT03s and RT04f), in which ICSI participated.

Following the broadcast news description, a comparison of some of the parameters measurable in both domains (meetings and broadcast news) is offered. The differences between them are pointed out, as well as the areas where this thesis proposes improvements in converting a system from one task to the other.

Finally, a description of the meetings domain speaker diarization system is given. The detailed description of all the algorithms involved in the new system is split between the current chapter and the next two. This chapter describes in detail those algorithms that have been adapted from different sources but are not considered a novelty of this thesis in themselves. It also gives an overview of the remaining algorithms (novel in this thesis) to provide a complete view of the overall system.

The techniques considered to be the primary contribution of this thesis are described in Chapter 4, which focuses on the algorithms within the single-channel speaker diarization system, and in Chapter 5, which deals with the use of the multiple channels in a meeting room to further improve the system.

