Databases

Research in speech technologies depends on a constant supply of collected and annotated data, and several efforts have been made over the years to collect data in the meeting environment. Speaker diarization for meetings in particular requires meeting databases that are accurately transcribed into speaker segments. A few such databases are already available, and a few more are currently being recorded and transcribed. Some of them are:

ICSI Meetings Corpus (ICSI Meetings Recorder corpus (2006), Janin et al. (2003)): 75 meetings with about 72 hours in total. They were recorded in a single meeting room, with 4 omnidirectional tabletop and 2 electret microphones mounted on a mock PDA.
CMU Meeting Corpus (CMU Meetings Corpus website (2006), Burger et al. (2002)): 104 meetings with an average duration of 60 minutes and an average of 6.4 participants per meeting (only 18 meetings are publicly available through LDC). Each meeting focuses on a given scenario or topic, which changes from meeting to meeting. The initial meetings were recorded with 1 omnidirectional microphone; newer ones use 3 omnidirectional tabletop microphones.
NIST Pilot Meeting Corpus (NIST Pilot Meeting Corpus website, 2006): 19 meetings with a total of about 15 hours. Several meeting types were proposed to the participants. Audio was recorded using 3 omnidirectional tabletop microphones and one circular directional microphone with 4 elements.
CHIL Corpus: Recordings of lecture-type meetings, conducted in 4 different meeting room locations. Each meeting room is equipped with several distant microphones, as well as speaker localization microphones and microphone arrays. Each meeting was also recorded with several video cameras.
AMI corpus (Augmented Multiparty Interaction (AMI) website, 2006): About 100 hours of meetings, generally with 4 participants, were recorded, transcribed and released through the project website. They are split into two main groups: real meetings and scenario-based meetings (where people are briefed to talk about a particular topic). One or more circular arrays of 8 microphones each are centrally located on the table. No video was collected.
M4 audio-visual corpus (McCowan et al., 2005): Created under the auspices of the EU-sponsored M4 project, using multiple microphones and cameras to record each participant.
VACE multimodal corpus (Chen et al., 2005): A video and audio meeting database created within the ARDA VACE-II project, consisting mainly of military-related meetings.
LDC meetings data: The Linguistic Data Consortium (LDC) has been in charge of transcribing and distributing most of the databases in this list. In addition, to contribute to the NIST Meetings evaluation campaigns, it recorded a set of meetings (Strassel and Glenn, 2004) within the SPINE/ROAR project (Speech in noisy environments, 2006).

user 2008-12-08