Summary of Differences and Proposed Changes

From a more theoretical point of view, there are many other differences between meetings and broadcast news that need the system builder's attention when porting a system from one domain to the other. Table 3.14 points out some of these differences (some of them already studied in this section) and, in some cases, gives the solution proposed in this thesis.

Table 3.14: Main differences between Meetings and Broadcast News recordings
| Meetings environment | Broadcast news environment | Proposed solution |
|---|---|---|
| Small but unknown number of speakers, limited by the capacity of the room | Totally unknown number of speakers | Automatic estimation of the number of initial clusters |
| There is neither music nor commercials | There can be commercials and background music with speech | Changed speech/non-speech detector |
| There are impulsive noises (doors shutting, pens falling, speakers touching their microphones...) | Different background conditions occur when reporting from the field | Changed speech/non-speech detector |
| All of a recording takes place in the same setting (although people may call into the meeting by phone) | Recordings alternate between studio and field (different bandwidth conditions) | |
| Different meetings can take place in different settings (rooms, microphone positions/number...) | Recordings for the same program take place in the same studio | Acoustic beamforming without layout constraints |
| Mostly spontaneous speech, with more silences and filler words/sounds | Much more scripted speech, with professional narrators | Frame and segment purification algorithms |
| The average speaker turn can be very short (for example, yes/no answers) | The average speaker turn is longer | Reduced minimum duration in decoding |
| Overlapping regions, where two or more people speak at the same time, are common | Normally there is no (or very little) overlapping speech | |
| The recordings are made using several microphones | Only one channel is available | Acoustic beamforming to collapse all channels into one |
| The far-field channels (microphones on the meeting table) regularly have worse quality than the closer microphones | The speech quality is regular broadcasting quality | Acoustic beamforming tries to enhance the signal |
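
The last two rows of Table 3.14 rely on acoustic beamforming to collapse several microphone channels into a single enhanced one. As a rough illustration of the idea, the sketch below implements a generic delay-and-sum beamformer in plain Python. It is an assumption-laden toy and not necessarily the beamforming algorithm described in this thesis: the function names `best_delay` and `delay_and_sum`, the `max_lag` parameter, the use of the first channel as reference, and the integer-sample cross-correlation search are all hypothetical choices made for the example.

```python
def best_delay(ref, sig, max_lag):
    """Return the integer lag (in samples) that maximizes the
    cross-correlation between ref and sig. A positive lag means
    that sig is delayed with respect to ref."""
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for n, r in enumerate(ref):
            m = n + lag
            if 0 <= m < len(sig):
                score += r * sig[m]
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def delay_and_sum(channels, max_lag=8):
    """Collapse several channels into one (illustrative sketch).

    The first channel is taken as the reference (an assumption of
    this example). Every other channel is aligned to it with the lag
    found by best_delay, and the aligned samples are averaged."""
    ref = channels[0]
    # The reference is aligned with itself by definition.
    lags = [0] + [best_delay(ref, ch, max_lag) for ch in channels[1:]]
    out = []
    for n in range(len(ref)):
        total, count = 0.0, 0
        for ch, lag in zip(channels, lags):
            m = n + lag
            if 0 <= m < len(ch):  # skip samples shifted out of range
                total += ch[m]
                count += 1
        out.append(total / count)  # count >= 1: the reference always contributes
    return out, lags
```

Given two channels where the second is a copy of the first delayed by two samples, `delay_and_sum` should estimate the lags `[0, 2]` and return a signal equal to the reference. With real far-field recordings, the averaging step is what attenuates noise that is uncorrelated across microphones. No microphone positions are required, which is in the spirit of the "without layout constraints" entry in the table.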

