This section focuses on the two latest evaluations performed on
the meetings domain, namely RT05s and RT06s. These two are similar
in that two different subdomains were proposed, with different
microphone configurations within each subdomain. No limit was
placed on system runtime, so that all systems could be compared
using the same metrics. The speed of each system was
reported as part of the system description.
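As an illustration of how a system's speed can be reported, the following minimal sketch computes a real-time factor, i.e. processing time divided by the duration of the audio processed. It assumes that speed is expressed in this form, which the text above does not state explicitly; the function name and the example figures are hypothetical.

```python
def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """Return processing time divided by audio duration (times real time)."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return processing_seconds / audio_seconds


# Hypothetical example: 45 minutes of processing for a 15-minute excerpt.
rtf = real_time_factor(45 * 60, 15 * 60)
print(f"{rtf:.1f}xRT")  # prints "3.0xRT"
```

A value above 1 means the system runs slower than real time, which is acceptable here because runtime was not constrained.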
In brief, the two proposed subdomains were:
- Conference room meetings: These are conducted
around a meeting table, with several participants involved in an
active conversation. They contain varying amounts of
speaker overlap (depending on the nature of the meeting). These
meetings have been the research focus of several projects, including the
European AMI project.
- Lecture room meetings: These are conducted in a
lecture setting where a lecturer gives a presentation in front of
an audience, which normally interrupts with questions during the
talk. In these meetings the lecturer normally speaks for most of
the talk, and speaking time becomes more balanced during the
question-and-answer sections. These meetings have been the research focus of
the European CHIL project.
In each of the meeting rooms there are multiple microphones
available, which record the signal synchronously. In some settings
there are also cameras, but these fall outside the scope of the
speaker diarization evaluation. The microphones are clustered into
different groups, which define the different conditions/evaluation
subtasks. The following list gives the terminology used for
each of the possible groups, whether it is used in the speaker
diarization evaluation, and in which subdomain:
- SDM (Single Distant Microphone): This is defined
as one of the centrally located microphones in the room, placed
on the meeting table. This microphone is always part of the
bigger MDM group. Both lecture and conference room subdomains run this task.
- MDM (Multiple Distant Microphones): These are a
set of microphones situated on the meeting table. All participants
in the conference room subdomain sit around the table, as do the
participants in the lecture room subdomain except for the
lecturer. This task also exists in both subdomains.
- MM3A (Multiple Mark III Microphone Arrays): The
lecture meetings contain one or two of these arrays, which were
built by NIST and contain 64 microphones arranged linearly.
Diarization could be run on either all 64 channels or a beamformed
version of them distributed by Karlsruhe University for RT06s.
- MSLA (Multiple Source Localization Microphone
Arrays): These are four groups of four microphones, each arranged in
a "T"-shaped array, which were originally defined for speaker
localization. They are only found in the lecture subdomain.
- ADM (All Distant Microphones): In lecture room
recordings this task allows the system to use all of the distant
microphones described above (all except the IHM
microphones). The conference room subdomain does not usually
define this task, as all of its distant microphones are of MDM type.
- IHM (Individual Head Microphones):
Although not evaluated in the diarization evaluations, these
microphones are worn by some of the participants in the meetings.
They are a task in the STT evaluation and are also used when
creating the forced-alignment reference segmentations for speaker