In 2002 NIST started its series of speaker diarization evaluations
for meetings, including them in the speaker recognition evaluation.
On that occasion, systems were evaluated on broadcast news
recordings, telephone conversations and meeting recordings. Only
one channel of audio data was provided in each case, so
multiple-channel techniques were not necessary. The meeting data
used was recorded by NIST.
There were four participants in that evaluation, namely
CLIPS-IMAG, LIA, the ELISA consortium and MITLL. The systems can
be grouped into two:
- The ELISA consortium (Moraru et al., 2002) was formed at that
time by three laboratories in France (LIA, CLIPS-IMAG and Lab.
Dynamique du langage, DDL). They developed two systems, both based
on hierarchical clustering (one top-down and one bottom-up), which
they submitted independently (the LIA and CLIPS-IMAG individual
submissions) and then combined (the ELISA submission) in the same
way as in Moraru, Meignier, Besacier, Bonastre and
Magrin-Chagnolleau (2004) and Fredouille et al. (2004).
The ELISA consortium and/or its individual members have been
regular and active participants in all the speaker diarization
evaluations since 2002. Moraru, Besacier, Meignier, Fredouille and
Bonastre (2004) describe the evolution of their system over these
years.
- MIT Lincoln Labs (MITLL) presented a system inspired by
speaker identification techniques (Dunn et al., 2000). It first
performs speaker segmentation using a modified GLR metric, as in
Wilcox et al. (1994), and then applies a GMM-UBM modeling
technique to cluster the segments into the different speakers.
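
As an illustration of the kind of metric used in the segmentation
step, the following is a minimal sketch of a plain generalized
likelihood ratio (GLR) distance between two segments of feature
vectors. It is not the modified GLR used by MITLL, whose details are
not given here. The sketch assumes each segment is modeled by a
single full-covariance Gaussian estimated by maximum likelihood, and
the function names are hypothetical.

```python
import numpy as np


def _logdet_ml_cov(frames):
    """Log-determinant of the maximum-likelihood covariance of the frames."""
    # bias=True gives the ML estimate (normalized by N, not N - 1).
    cov = np.atleast_2d(np.cov(frames, rowvar=False, bias=True))
    _sign, logdet = np.linalg.slogdet(cov)
    return logdet


def glr_distance(seg_a, seg_b):
    """Log-GLR distance between two segments of shape (frames, dims).

    Compares the hypothesis that each segment has its own Gaussian
    against the hypothesis that one Gaussian explains the pooled data.
    Larger values suggest that the segments come from different speakers.
    """
    n_a, n_b = len(seg_a), len(seg_b)
    pooled = np.vstack([seg_a, seg_b])
    return 0.5 * (
        (n_a + n_b) * _logdet_ml_cov(pooled)
        - n_a * _logdet_ml_cov(seg_a)
        - n_b * _logdet_ml_cov(seg_b)
    )
```

In a segmentation setting, this distance would typically be computed
between two adjacent windows slid along the recording, with peaks
above a threshold taken as candidate speaker change points. The
window length and the threshold are tuning choices not specified in
the text above.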