RT05s Official Performance Scores

The main metric used for the RT05s evaluation was the Diarization Error Rate (DER), computed without taking speaker overlap regions into account. The DER scores as released by NIST are shown in the ninth column of Table 7.1, together with a summary of each system's characteristics. The tenth column gives the scores obtained after small bug fixes made right after the evaluation, which mainly addressed problems in two of the meetings.
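To make the metric concrete, the following is a minimal frame-level sketch of a DER computation for non-overlapping speech. It is not the NIST scoring tool: it assumes the reference and the hypothesis have already been sampled into equal-length sequences of frame labels (with `None` for non-speech), and the function name `frame_der` is hypothetical. The score is the sum of missed speech, false alarm speech and speaker error, divided by the amount of reference speech, under the best one-to-one mapping between hypothesis and reference speakers (found here by brute force, which is only practical for a handful of speakers).

```python
from itertools import permutations


def frame_der(ref, hyp):
    """Frame-level DER for non-overlapping speech (illustrative sketch).

    ref, hyp: equal-length sequences of speaker labels, None = non-speech.
    Returns (missed + false alarm + speaker error) / reference speech frames.
    """
    if len(ref) != len(hyp):
        raise ValueError("ref and hyp must have the same number of frames")
    ref_speech = sum(r is not None for r in ref)
    if ref_speech == 0:
        raise ValueError("reference contains no speech")

    # Speech/non-speech errors do not depend on the speaker mapping.
    missed = sum(r is not None and h is None for r, h in zip(ref, hyp))
    false_alarm = sum(r is None and h is not None for r, h in zip(ref, hyp))

    # Frames where both sides say "speech": these can be speaker errors.
    both = [(r, h) for r, h in zip(ref, hyp) if r is not None and h is not None]
    ref_spk = sorted({r for r, _ in both} | {r for r in ref if r is not None})
    hyp_spk = sorted({h for h in hyp if h is not None})

    # Try every one-to-one mapping of hypothesis speakers to reference
    # speakers; padding with None lets a hypothesis speaker stay unmapped.
    targets = ref_spk + [None] * len(hyp_spk)
    best_correct = 0
    for perm in permutations(targets, len(hyp_spk)):
        mapping = dict(zip(hyp_spk, perm))
        correct = sum(mapping[h] == r for r, h in both)
        best_correct = max(best_correct, correct)

    speaker_error = len(both) - best_correct
    return (missed + false_alarm + speaker_error) / ref_speech


ref = ["A", "A", "A", "B", "B", None, None, "B"]
hyp = ["x", "x", "y", "y", "y", None, "y", None]
print(frame_der(ref, hyp))  # 1 missed + 1 false alarm + 1 speaker error over 6 frames
```

In this toy example the best mapping is x to A and y to B, which leaves one speaker error, one missed frame and one false alarm frame over six reference speech frames.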

Table 7.1: Systems summary description and DER on the evaluation set for RT05s

System ID	Room type	Task	Submit type	Delay&sum	# Initial clusters	Acoustic min. dur.	Mics used	DER	Post-eval DER
p-dspursys	Conf.	MDM	Primary	YES	10	3 sec	All	18.56%	16.33%
p-pursys	Conf.	SDM	Primary	NO	10	3 sec	SDM	15.32%	--
p-omnione	Lect.	MDM	Primary	NO	n/a	n/a	n/a	12.21%	--
c-spnspone	Lect.	MDM	Contrast	NO	n/a	n/a	n/a	12.84%	--
c-ttoppur	Lect.	MDM	Contrast	NO	5	5 sec	Tabletop	10.41%	10.21%
p-omnione	Lect.	SDM	Primary	NO	n/a	n/a	n/a	12.21%	--
c-pur12s	Lect.	SDM	Contrast	NO	5	12 sec	SDM	10.43%	10.47%
p-omnione	Lect.	MSLA	Primary	NO	n/a	n/a	n/a	12.21%	--
c-nwsdpur12s	Lect.	MSLA	Contrast	YES	5	12 sec	All	9.98%	9.66%
c-wsdpur12s	Lect.	MSLA	Contrast	YES ^7.1	5	12 sec	All	9.99%	9.78%

Figures 7.1 and 7.2 show the DER for each excerpt used in the evaluation, for the conference room and lecture room data respectively. The excerpts are on the horizontal axis and the DER on the vertical axis, with one curve for each of the systems listed in Table 7.1. For the lecture room data, the full meeting names are omitted and only their endings are shown, which indicate the content of each excerpt. Excerpts ending in "E1" or "E3" contain only the lecturer, so it is easier for the system to obtain a perfect diarization.

**Figure 7.1:** *DER break-down by meeting for the RT05s conference data*

**Figure 7.2:** *DER break-down by show for the RT05s lecture data*

Using filter&sum to enhance the signal before clustering turned out to be a poor choice for the conference room systems: the SDM DER (15.32%) is lower than the MDM DER (18.56%, or 16.33% after the post-evaluation fixes). This was attributed to the large differences in signal quality among the microphones. When the best-quality microphone is used as the SDM channel, it is difficult to improve on that signal by combining it with the other channels via filter&sum. A weighted version of the algorithm was therefore proposed, which automatically and adaptively gives more weight to the channels with better signal quality. The weight computation was improved for the RT06s evaluation.
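The idea of weighting the channels can be illustrated with a minimal sketch of a weighted delay-and-sum combination. This is not the system's implementation: the function name `weighted_delay_and_sum` is hypothetical, the per-channel delays are assumed to be known integer sample offsets, and the weights are plain inputs here, whereas the text above describes them as computed automatically and adaptively (that computation is not shown in this section and is not reproduced).

```python
def weighted_delay_and_sum(channels, delays, weights):
    """Combine several microphone channels into one enhanced signal.

    channels: list of equal-length lists of samples, one per microphone.
    delays:   per-channel delay in samples relative to the reference
              (assumed known here); channel m is advanced by delays[m].
    weights:  non-negative per-channel weights (assumed given here);
              they are normalized so that they sum to 1.
    """
    if not (len(channels) == len(delays) == len(weights)):
        raise ValueError("need one delay and one weight per channel")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must have a positive sum")
    norm = [w / total for w in weights]

    n = len(channels[0])
    out = [0.0] * n
    for chan, delay, w in zip(channels, delays, norm):
        for t in range(n):
            src = t + delay  # sample of this channel aligned with time t
            if 0 <= src < n:  # samples shifted outside the signal are dropped
                out[t] += w * chan[src]
    return out


clean = [0.0, 1.0, 2.0, 3.0, 0.0]
noisy_late = [0.0, 0.5, 1.0, 3.0, 3.0]  # arrives one sample later, distorted

uniform = weighted_delay_and_sum([clean, noisy_late], [0, 1], [1.0, 1.0])
weighted = weighted_delay_and_sum([clean, noisy_late], [0, 1], [3.0, 1.0])
print(uniform)
print(weighted)
```

With uniform weights the distorted channel pulls the output away from the clean signal as much as the clean channel pulls it back; giving the better channel a larger weight keeps the output closer to it, which is the motivation for the weighted variant.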