Helping Diarization Using the Spoken Transcripts

One promising way to improve speaker diarization under certain conditions is to use a transcript of the acoustic signal to extract information that helps assign each speaker turn to the correct cluster. Such transcripts can be obtained with an automatic speech recognition (ASR) system.

Canseco-Rodriguez et al. (2004a), Canseco-Rodriguez et al. (2004b) and Canseco et al. (2005) study the use of such linguistic information in the broadcast news domain, where people usually introduce themselves and address the other speakers by name. The authors propose a set of rules to identify the speaker who introduces himself or herself, as well as the speakers of the preceding and following turns. The rules are applied to speaker turns produced by a decoder-based system, namely the output of an ASR system, but no further speaker diarization techniques are proposed.
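
As an illustration of how such rules might operate on transcribed speaker turns, the following is a minimal sketch in Python. It is not the rule set of the cited papers: the cue phrases, the `NAME` pattern, and the function `label_turns` are all hypothetical and chosen only to show the idea of mapping a linguistic cue to the current, previous, or next turn. It also assumes that the transcript is already split into turns and that names appear as capitalized multi-word sequences.

```python
import re

# Hypothetical name pattern: two or more capitalized words (e.g. "Anna Lopez").
NAME = r"([A-Z][a-z]+(?: [A-Z][a-z]+)+)"

# Hypothetical cue phrases, each tied to the turn it is assumed to describe.
# The cue is matched case-insensitively; the name itself stays case-sensitive.
RULES = [
    ("current", re.compile(r"\b(?i:I am|I'm|this is) " + NAME)),
    ("next", re.compile(r"\b(?i:over to you,?|we go now to|here is) " + NAME)),
    ("previous", re.compile(r"\b(?i:thank you,?|thanks,?) " + NAME)),
]


def label_turns(turns):
    """Return a dict mapping turn index to a speaker name found by the rules."""
    labels = {}
    for i, text in enumerate(turns):
        for target, pattern in RULES:
            match = pattern.search(text)
            if match is None:
                continue
            # Resolve which turn the cue refers to.
            j = {"current": i, "next": i + 1, "previous": i - 1}[target]
            if 0 <= j < len(turns):
                # Keep the first name proposed for a turn; later cues do not override it.
                labels.setdefault(j, match.group(1))
    return labels


if __name__ == "__main__":
    example_turns = [
        "Good evening, I'm Anna Lopez. We go now to Mark Reed in Paris.",
        "Thank you, Anna Lopez. The talks ended today.",
        "That was our report.",
    ]
    print(label_turns(example_turns))
```

In this sketch the third turn receives no label, which reflects a general limitation of purely rule-based naming: turns without an explicit cue remain unassigned unless an acoustic diarization step is also used.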

