Research in speech technologies requires constant data collection and annotation, and over the years there have been several efforts to collect data in the meeting environment. In the particular area of speaker diarization for meetings, meeting databases accurately transcribed into speaker segments are needed. Nowadays a few databases are already available and a few more are currently being recorded and transcribed; some of them are:
- ICSI Meeting Corpus (ICSI Meeting Recorder corpus (2006),
Janin et al. (2003)): 75 meetings totaling about 72 hours.
They were recorded in a single meeting room with 4
omnidirectional tabletop microphones and 2 electret microphones
mounted on a mock PDA.
- CMU Meeting Corpus (CMU Meeting Corpus website (2006),
Burger et al. (2002)): 104 meetings with an average
duration of 60 minutes and 6.4 participants on average per
meeting (only 18 meetings are publicly available through the LDC).
They are focused on a given scenario or topic, which changes from
meeting to meeting. The initial meetings were recorded with 1
omnidirectional microphone; newer ones use 3 omnidirectional
tabletop microphones.
- NIST Pilot Meeting Corpus (NIST Pilot Meeting Corpus website, 2006): Consists of 19
meetings totaling about 15 hours. Several meeting types are
proposed to the attendees. Audio is recorded using 3
omnidirectional tabletop microphones and one circular directional
microphone with 4 elements.
- CHIL Corpus: Recordings were conducted in 4 different
meeting room locations and consist of lecture-type meetings. Each
meeting room is equipped with several distant microphones, as well
as speaker-localization microphones and microphone arrays. Each
meeting also includes several video cameras.
- AMI Corpus (Augmented Multiparty Interaction (AMI)
website, 2006): About 100 hours of meetings, generally with
4 participants, were recorded, transcribed and released
through their website. These are split into two main groups: real
meetings and scenario-based meetings (where people are briefed to
talk about a particular topic). One or more circular arrays of 8
microphones each are centrally located on the table. No video was
used.
- M4 audio-visual corpus (McCowan et al., 2005): Created
under the auspices of the EU-sponsored M4 project, it used
multiple microphones and cameras to record each participant.
- VACE multimodal corpus (Chen et al., 2005): A video
and audio meeting database created within the ARDA VACE-II
project, recording mainly military-related meetings.
- LDC meeting data: The Linguistic Data Consortium (LDC) has
been in charge of transcribing and distributing most of the
databases in this list. In addition, as a contribution to the
NIST meeting evaluation campaigns, it recorded a set of meetings
(Strassel and Glenn, 2004) within the SPINE/ROAR project
(Speech in Noisy Environments, 2006).