To detect and filter out the non-speech frames by exploiting the
likelihood behaviour observed for non-speech data, two variants
of a likelihood-based metric are proposed.
[Equation 4.15]
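Going only by the term definitions given for equation 4.15, one plausible form of the smoothed likelihood is sketched below. The symbol names ($W$ for the window length in frames, $M'$ and $M$ for the number of mixtures used and available, $w_m$ for the mixture weights, $\mathcal{N}_m$ for the Gaussian components, $\Theta$ for the model), the use of a log-likelihood, and the centering of the window are all assumptions made for illustration and may differ from the original notation.

```latex
\[
\mathcal{L}\bigl(x[n] \mid \Theta\bigr)
  \;=\; \frac{1}{W} \sum_{t \in \mathcal{W}(n)}
  \log \left( \sum_{m=1}^{M'} w_m \, \mathcal{N}_m\bigl(x[t]\bigr) \right),
  \qquad M' \le M ,
\]
% \mathcal{W}(n): the set of W acoustic frames around frame n (assumed centered)
```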
The two metrics are based on equation 4.15, which depends on four
quantities: the length of the averaging window, used to smooth the
measure around the frame of interest and so avoid noisy values;
the number of Gaussian mixtures used to compute the likelihood
(at most the number of mixtures in the model); the mixture
weights; and the result of evaluating each acoustic frame
on each Gaussian mixture:
- Metric 1
- A standard smoothed likelihood over 100 ms of data
(a window of 10 frames, with 10 ms acoustic frames) around each
acoustic frame, computed with all the mixtures in the model.
- Metric 2
- The same smoothed likelihood (over 100 ms) computed with a
model formed by a subset of the Gaussian mixtures in the speaker
model, chosen so that it includes the mixtures assigned to
non-speech. The mixtures are selected by summing each mixture's
variance over all dimensions and keeping those with the smallest
accumulated variance. This second metric is equivalent to Metric 1
when 100% of the Gaussian mixtures are selected.
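The two metrics can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the author's implementation: it assumes diagonal-covariance Gaussians, a log-domain likelihood, a window of 10 frames (100 ms at 10 ms per frame), windows truncated at the signal edges, and subset weights that are not renormalized. All function and parameter names are hypothetical.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def diag_gauss_logpdf(x: Vector, mean: Vector, var: Vector) -> float:
    """Log-density of a diagonal-covariance Gaussian evaluated at frame x."""
    return -0.5 * sum(
        math.log(2.0 * math.pi * v) + (xi - mu) ** 2 / v
        for xi, mu, v in zip(x, mean, var)
    )


def select_low_variance(variances: Sequence[Vector], fraction: float) -> List[int]:
    """Indices of the mixtures with the smallest variance summed over dimensions."""
    keep = max(1, round(fraction * len(variances)))
    order = sorted(range(len(variances)), key=lambda m: sum(variances[m]))
    return sorted(order[:keep])


def frame_loglik(x, weights, means, variances, mixtures) -> float:
    """log of sum_m w_m N_m(x) over the chosen mixtures (log-sum-exp for stability).

    Assumption: weights of the subset are used as-is, without renormalizing.
    """
    terms = [
        math.log(weights[m]) + diag_gauss_logpdf(x, means[m], variances[m])
        for m in mixtures
    ]
    top = max(terms)
    return top + math.log(sum(math.exp(t - top) for t in terms))


def smoothed_loglik(frames, weights, means, variances,
                    fraction: float = 1.0, window: int = 10) -> List[float]:
    """Per-frame log-likelihood averaged over `window` frames around each frame.

    fraction=1.0 uses every mixture (Metric 1); fraction<1.0 keeps only the
    lowest-variance mixtures (Metric 2).
    """
    mixtures = select_low_variance(variances, fraction)
    per_frame = [frame_loglik(x, weights, means, variances, mixtures) for x in frames]
    half = window // 2
    smoothed = []
    for n in range(len(per_frame)):
        start = n - half
        chunk = per_frame[max(0, start):min(len(per_frame), start + window)]
        smoothed.append(sum(chunk) / len(chunk))  # window is truncated at the edges
    return smoothed


# Toy 1-D model: one narrow mixture and one broad mixture (illustrative values).
weights = [0.5, 0.5]
means = [[0.0], [5.0]]
variances = [[0.1], [4.0]]
frames = [[0.0], [0.1], [5.0], [4.0]]

metric1 = smoothed_loglik(frames, weights, means, variances, fraction=1.0)
metric2 = smoothed_loglik(frames, weights, means, variances, fraction=0.5)
```

Because the subset weights are left unnormalized in this sketch, `metric2` can never exceed `metric1` on the same frame; a threshold on either value would then be used to flag frames, with the threshold itself left as a tuning choice.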