Meetings Environment | Broadcast News Environment | Proposed solution |
Small number of speakers, bounded by the capacity of the room, but unknown | Completely unknown number of speakers | Automatic estimation of the initial number of clusters |
There is neither music nor commercials | There can be commercials and background music under speech | Modified speech/non-speech detector |
Impulsive noises occur (doors closing, pens falling, speakers touching their microphones...) | Different background conditions occur when reporting from the field | Modified speech/non-speech detector |
All recordings take place in the same setting (although participants may call into the meeting by phone) | Recordings alternate between studio and field (different bandwidth conditions) | |
Different meetings can take place in different settings (rooms, microphone positions/number, ...) | Recordings for the same program take place in the same studio | Acoustic beamforming without layout constraints |
Mostly spontaneous speech, with more silences and filler words/sounds | Much more scripted speech, with professional narrators | Frame and segment purification algorithms |
The average speaker turn can be very short (for example, yes/no answers) | The average speaker turn is longer | Reduced minimum duration in decoding |
Overlapping regions, where two or more people speak at the same time, are common | Normally there is no (or very little) overlapping speech | |
The recordings are made using several microphones | Only one channel is available | Acoustic beamforming to collapse all channels into one |
The far-field channels (microphones on the meeting table) usually have worse quality than closer microphones | The speech has regular broadcast quality | Acoustic beamforming to enhance the signal |
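
To make the "collapse all channels into one" row concrete, the following is a minimal sketch of delay-and-sum beamforming, one common basic form of acoustic beamforming. The table only says "acoustic beamforming", so the choice of delay-and-sum is an assumption, as are the function names `estimate_delay` and `delay_and_sum`, the use of the first channel as reference, and the brute-force integer-lag cross-correlation, which is a simple stand-in for whatever delay estimator the actual system uses. Because the delays are estimated from the signals themselves, no knowledge of the microphone layout is needed.

```python
def estimate_delay(ref, sig, max_lag):
    """Return the integer lag (in samples) at which sig best matches ref.

    Hypothetical helper: brute-force cross-correlation over a small lag range.
    A positive lag means sig is a delayed copy of ref.
    """
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for n, r in enumerate(ref):
            m = n + lag
            if 0 <= m < len(sig):
                score += r * sig[m]
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def delay_and_sum(channels, max_lag=8):
    """Align every channel to the first one and average them into one signal.

    Assumption: the first channel serves as the reference and all channels
    have the same length; no microphone positions are required.
    """
    ref = channels[0]
    out = [0.0] * len(ref)
    for ch in channels:
        lag = estimate_delay(ref, ch, max_lag)
        for n in range(len(ref)):
            m = n + lag
            if 0 <= m < len(ch):
                out[n] += ch[m]  # sample of ch that lines up with ref[n]
    return [v / len(channels) for v in out]


# Usage: the second microphone hears the same pulse two samples later.
mic_a = [0, 0, 0, 1, 2, 1, 0, 0, 0, 0, 0, 0]
mic_b = [0, 0, 0, 0, 0, 1, 2, 1, 0, 0, 0, 0]
enhanced = delay_and_sum([mic_a, mic_b], max_lag=4)
```

After alignment the speech components add coherently while uncorrelated noise on each microphone tends to average out, which is the intuition behind using the same step both to merge the channels and to enhance the signal.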