NIST 2002 Speaker Recognition Evaluation

In 2002 NIST began its series of speaker diarization evaluations for meetings, including them as part of the speaker recognition evaluation. On that occasion, systems were evaluated on broadcast news recordings, telephone conversations and meeting recordings. Only one channel of audio data was provided in each case, so multi-channel techniques were not needed. The meeting data used was recorded by NIST.

There were four participants in that evaluation: CLIPS-IMAG, LIA, the ELISA consortium and MITLL. Their systems fall into two groups:

The ELISA consortium (Moraru et al., 2002) consisted at that time of three laboratories in France (LIA, CLIPS-IMAG and Lab. Dynamique du langage, DDL). It presented two systems based on hierarchical clustering, one top-down and one bottom-up. These were submitted independently (the LIA and CLIPS-IMAG individual submissions) and also in combination (the ELISA submission), in the same way as in Moraru, Meignier, Besacier, Bonastre and Magrin-Chagnolleau (2004) and Fredouille et al. (2004). The ELISA consortium and/or its individual members have been constant and active participants in all the speaker diarization evaluations since 2002. Moraru, Besacier, Meignier, Fredouille and Bonastre (2004) describe the evolution of their system over these years.
MIT Lincoln Labs (MITLL) presented a system inspired by speaker identification techniques (Dunn et al., 2000). It first performs speaker segmentation using a modified generalized likelihood ratio (GLR) metric, as in Wilcox et al. (1994), and then uses a GMM-UBM modeling technique to cluster the segments into the different speakers.
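
To give an idea of what a GLR-based segment comparison computes, here is a minimal sketch of the plain (unmodified) GLR distance between two segments. It is not the MITLL metric: the text does not say how their GLR was modified. The sketch assumes each segment is modeled by a single Gaussian with diagonal covariance, and the names `log_det_diag`, `glr_distance` and `make_segment`, as well as the synthetic data, are hypothetical and used only for illustration.

```python
import math
import random
from statistics import pvariance


def log_det_diag(frames):
    """Log-determinant of the maximum-likelihood diagonal covariance.

    `frames` is a list of equal-length feature vectors (assumption:
    one Gaussian with diagonal covariance per segment).
    """
    dims = range(len(frames[0]))
    return sum(math.log(pvariance([f[d] for f in frames])) for d in dims)


def glr_distance(x, y):
    """Negative log generalized likelihood ratio between segments x and y.

    Compares one Gaussian fitted to the pooled frames against two
    Gaussians fitted to x and y separately. Larger values suggest the
    segments come from different sources.
    """
    z = x + y  # pooled frames of both segments
    return 0.5 * (
        len(z) * log_det_diag(z)
        - len(x) * log_det_diag(x)
        - len(y) * log_det_diag(y)
    )


def make_segment(rng, mean, n=200, dim=4):
    """Synthetic segment: n frames of dim-dimensional Gaussian features."""
    return [[rng.gauss(mean, 1.0) for _ in range(dim)] for _ in range(n)]


rng = random.Random(0)
seg_a = make_segment(rng, 0.0)  # "speaker 1"
seg_b = make_segment(rng, 0.0)  # same distribution as seg_a
seg_c = make_segment(rng, 3.0)  # shifted mean, a different "speaker"

same = glr_distance(seg_a, seg_b)
diff = glr_distance(seg_a, seg_c)
print(f"same source: {same:.2f}  different source: {diff:.2f}")
```

In a segmentation stage of this kind, such a distance would typically be computed between adjacent windows of the recording, and a speaker change hypothesized where it peaks above a threshold. The threshold and window sizes are tuning choices that the text does not specify.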

user 2008-12-08