To detect and filter out the non-speech frames by exploiting the likelihood property observed for non-speech data, two variants of a likelihood-based metric are proposed.

Both metrics are based on equation 4.15, which depends on the following quantities: the length of an averaging window, used to smooth the measure around each frame and avoid noisy values; the number of Gaussian mixtures used to compute the likelihood (at most the number of mixtures in the model); the mixture weights; and the result of evaluating each frame on each Gaussian mixture. The two variants are:

**Metric 1** - A standard smoothed likelihood over 100 ms of data
(a window of 10 frames, with 10 ms acoustic frames) around each
acoustic frame, computed with all the mixtures in the model.

**Metric 2** - The same smoothed likelihood (over 100 ms) given a
model formed by a subset of the Gaussian mixtures in the speaker
model, chosen so that it includes the mixtures assigned to
non-speech. The mixtures are selected by computing the sum of the
variances over all dimensions for each mixture and keeping those
with the smallest accumulated variance. This second metric is
equivalent to metric 1 when 100% of the Gaussian mixtures are
selected.
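
The two metrics can be illustrated with a minimal sketch. It is not the implementation used in this work: it assumes diagonal-covariance Gaussians, averages per-frame log-likelihoods over the window, truncates the window at the signal edges, and does not renormalize the weights of the selected subset. All function names and the `fraction` parameter are hypothetical.

```python
import math

def log_gaussian_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian evaluated at vector x."""
    return -0.5 * sum(
        math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v
        for xi, m, v in zip(x, mean, var)
    )

def select_low_variance(variances, fraction):
    """Indices of the mixtures with the smallest variance summed over all dimensions."""
    keep = max(1, round(fraction * len(variances)))
    order = sorted(range(len(variances)), key=lambda k: sum(variances[k]))
    return sorted(order[:keep])

def frame_log_likelihood(x, weights, means, variances, subset):
    """Log of the weighted sum of Gaussians in `subset` (log-sum-exp for stability)."""
    terms = [
        math.log(weights[k]) + log_gaussian_diag(x, means[k], variances[k])
        for k in subset
    ]
    peak = max(terms)
    return peak + math.log(sum(math.exp(t - peak) for t in terms))

def smoothed_log_likelihood(frames, weights, means, variances,
                            fraction=1.0, window=10):
    """Metric 1 when fraction == 1.0, metric 2 when fraction < 1.0.

    `window` is in frames: 10 frames of 10 ms give the 100 ms smoothing span.
    """
    subset = select_low_variance(variances, fraction)
    per_frame = [
        frame_log_likelihood(x, weights, means, variances, subset)
        for x in frames
    ]
    n = len(per_frame)
    smoothed = []
    for t in range(n):
        start = t - window // 2
        lo, hi = max(0, start), min(n, start + window)  # truncated at the edges
        smoothed.append(sum(per_frame[lo:hi]) / (hi - lo))
    return smoothed
```

Calling `smoothed_log_likelihood(frames, weights, means, variances)` yields metric 1, while passing a `fraction` below 1.0 restricts the computation to the lowest-variance mixtures as in metric 2. The value of `fraction` is left to the caller, since it has to be chosen so that the mixtures modelling non-speech fall inside the subset.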

user 2008-12-08