RT05s Official Performance Scores

The main metric used for the RT05s evaluation was the Diarization Error Rate (DER), computed without taking the speaker overlap regions into account. The DER scores as released by NIST are shown in the ninth column of Table 7.1, together with a summary of each system's characteristics. The tenth column shows the improvements obtained after small bug fixes applied right after the evaluation, which mainly addressed problems in two of the meetings.
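As a reminder of what the percentages in the table aggregate, the sketch below computes DER in its conventional form: missed speech, false alarm and speaker error time, divided by the total scored speech time. It assumes overlap regions have already been excluded from all four durations, as in the primary metric. The function name and the example durations are hypothetical and are not taken from the evaluation.

```python
def diarization_error_rate(missed, false_alarm, speaker_error, scored_speech):
    """Return DER as a fraction of the scored speech time (all arguments in seconds)."""
    if scored_speech <= 0:
        raise ValueError("scored_speech must be positive")
    return (missed + false_alarm + speaker_error) / scored_speech


# Hypothetical durations for one excerpt (illustrative only, not evaluation data).
example_der = diarization_error_rate(
    missed=12.0, false_alarm=6.0, speaker_error=42.0, scored_speech=600.0
)
print(f"DER = {example_der:.2%}")  # prints "DER = 10.00%"
```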

Table 7.1: Systems summary description and DER on the evaluation set for RT05s
System ID     Room   Task  Submit    Delay&sum  # Initial  Acoustic   Mics      DER     Post-eval
              type         type                 clusters   min. dur.  used              DER
p-dspursys    Conf.  MDM   Primary   YES        10         3 sec      All       18.56%  16.33%
p-pursys      Conf.  SDM   Primary   NO         10         3 sec      SDM       15.32%  --
p-omnione     Lect.  MDM   Primary   NO         n/a        n/a        n/a       12.21%  --
c-spnspone    Lect.  MDM   Contrast  NO         n/a        n/a        n/a       12.84%  --
c-ttoppur     Lect.  MDM   Contrast  NO         5          5 sec      Tabletop  10.41%  10.21%
p-omnione     Lect.  SDM   Primary   NO         n/a        n/a        n/a       12.21%  --
c-pur12s      Lect.  SDM   Contrast  NO         5          12 sec     SDM       10.43%  10.47%
p-omnione     Lect.  MSLA  Primary   NO         n/a        n/a        n/a       12.21%  --
c-nwsdpur12s  Lect.  MSLA  Contrast  YES        5          12 sec     All       9.98%   9.66%
c-wsdpur12s   Lect.  MSLA  Contrast  YES        5          12 sec     All       9.99%   9.78%

Figures 7.1 and 7.2 show the DER scores for each of the excerpts used in the evaluation, for the conference room and lecture room data respectively. The excerpts are on the horizontal axis and the DER on the vertical axis, with one curve for each of the systems described above. For the lecture room data the figure omits the full meeting names and shows only their endings, which indicate the content of the excerpt. Excerpts ending in "E1" or "E3" contain only the lecturer, so it is easier for the system to obtain a perfect diarization on them.

Figure 7.1: DER break-down by meeting for the RT05s conference data

Figure 7.2: DER break-down by show for the RT05s lecture data

The use of filter&sum to enhance the signal before clustering turned out to be a bad choice for the conference room systems, as the SDM DER is lower than the MDM DER. This is explained by the large difference in signal quality among the microphones: when the best-quality microphone is used as the SDM channel, it is difficult to improve on that signal by combining it with the other channels via filter&sum. A weighted version of the algorithm was proposed to automatically (and adaptively) give more weight to the channels with better signal quality. The weight computation was improved for the RT06s evaluation.
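The idea of weighting channels can be illustrated with a minimal weighted delay-and-sum sketch. This is not the weighting scheme used in the submitted systems: here the per-channel weights and integer delays are simply given as inputs (hypothetical values), whereas the proposed algorithm estimates the weights automatically and adapts them over time. The function name and the toy signals are assumptions made for illustration.

```python
def weighted_delay_and_sum(channels, delays, weights):
    """Combine equal-length channels into one signal.

    channels: list of sample lists, one per microphone
    delays:   integer delay (in samples) applied to each channel
    weights:  non-negative per-channel weights, normalized here to sum to 1
    """
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    normalized = [w / total for w in weights]
    length = len(channels[0])
    output = [0.0] * length
    for signal, delay, weight in zip(channels, delays, normalized):
        for n in range(length):
            k = n - delay  # input sample that lands on output position n
            if 0 <= k < length:
                output[n] += weight * signal[k]
    return output


# Toy example: a good channel and a weaker one whose pulse arrives one sample earlier.
good = [0.0, 1.0, 0.0, 0.0]
weak = [0.5, 0.0, 0.0, 0.0]
combined = weighted_delay_and_sum([good, weak], delays=[0, 1], weights=[3.0, 1.0])
print(combined)  # [0.0, 0.875, 0.0, 0.0]
```

With equal weights the weaker channel would pull the aligned pulse down to 0.75; giving the better channel a larger weight keeps the output closer to the good signal, which is the motivation for the weighted variant.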
