Baseline Systems
Taking the block diagram in figure 3.5 as a reference,
experiments were conducted on three of the main
blocks, namely the filter&sum module, the speech/non-speech
module and the mono-channel speaker diarization module. For each
block a baseline was defined to suit its characteristics and to
allow its optimum parameters to be selected. The
initial Wiener filtering of the signal was not analyzed, as it was
used without modification from its original implementation, which
lies outside the scope of this thesis.
The baseline system used for the experiments on the diarization
module and on the speech/non-speech module is a
modified version of the broadcast news system presented for the
NIST RT04f evaluation, as described in section 3.1.
It is a mono-channel system (or Single Distant
Microphone, SDM, in the meetings domain) with the following main
differences from RT04f:
- The speech/non-speech (spnsp) detector used in the
experiments for the hybrid spnsp algorithm is composed of a
two-state HMM trained with meetings data, as used in
the RT05s evaluation and explained in section 3.1. For
the speaker diarization module the proposed hybrid spnsp detector
was used instead, with parameters equal to the values used for the
RT06s evaluation (see Anguera, Wooters and Pardo (2006a, 2006b)).
These are the parameter values optimized in
the spnsp experiments section.
- During agglomerative clustering, the same minimum
speaker turn duration is applied as in the broadcast news
system (3 seconds). Before the resulting segmentation is
output, a final segmentation step is performed using the
same speaker models but with the minimum duration reduced to 1.5
seconds, so that shorter speaker turns can be properly
detected.
- The HMM acoustic models used in the segmentation of the data
do not have any maximum time constraint, as explained in section
4.2.3, to allow the speaker segments to be as
long as the acoustics dictate. As shown in
Anguera, Wooters and Hernando (2006a), this does not change the DER of the
systems much but allows longer speaker segments to be created.
- A few bugs related to floating-point inexactness
were fixed, which slightly changed the system
outputs.
- The BIC-based stopping criterion is used in all experiments
in order to stop clustering when the optimum number of clusters is
reached.
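As an illustration of the last item, the following is a minimal sketch of agglomerative clustering with a BIC-based stopping rule. It is not the implementation used in this thesis: it assumes each cluster is a list of one-dimensional feature values modelled by a single Gaussian, it uses the textbook delta-BIC with a penalty term, and the names VAR_FLOOR, merge_score, penalty_weight and agglomerative_cluster are hypothetical. The actual system works on multi-dimensional acoustic features with richer speaker models.

```python
import math
from itertools import combinations

VAR_FLOOR = 1e-3  # assumed variance floor so that log() stays finite


def half_n_log_var(samples):
    """Return (n / 2) * log(variance) for a maximum-likelihood 1-D Gaussian fit."""
    n = len(samples)
    mean = sum(samples) / n
    var = max(sum((x - mean) ** 2 for x in samples) / n, VAR_FLOOR)
    return 0.5 * n * math.log(var)


def merge_score(a, b, penalty_weight=1.0):
    """Negated delta-BIC: positive when one Gaussian explains a + b well enough."""
    n = len(a) + len(b)
    # Keeping two Gaussians costs 2 extra free parameters (mean and variance).
    penalty = penalty_weight * 0.5 * 2 * math.log(n)
    # Log-likelihood gained by modelling a and b separately.
    gain_from_split = half_n_log_var(a + b) - half_n_log_var(a) - half_n_log_var(b)
    return penalty - gain_from_split


def agglomerative_cluster(clusters):
    """Merge the best-scoring pair until no pair has a positive merge score."""
    clusters = [list(c) for c in clusters]
    while len(clusters) > 1:
        (i, j), best = max(
            (((i, j), merge_score(clusters[i], clusters[j]))
             for i, j in combinations(range(len(clusters)), 2)),
            key=lambda item: item[1],
        )
        if best <= 0:
            break  # stopping criterion: no remaining merge is favoured by BIC
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return clusters
```

In this sketch the number of final clusters is not fixed in advance: clustering stops on its own once every candidate merge has a non-positive score, which mirrors the role of the stopping criterion described above.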
The baseline system used for experiments on the beamforming module
is the system submitted to the NIST RT06s evaluation campaign.
It contains all the modules explained in section
5.2, with their parameters optimized using a
subset of 10 meetings from the development data available for
RT06s.
2008-12-08