Derivative Filtering

Given the normalized and filtered energy signal ( $ \tilde{e}[n]$), a derivative filter is applied to enhance the speech/non-speech change points. This processing, first introduced by Li et al. (2002), helps prevent degradation due to low signal-to-noise ratios or nonstationary environments. The filter is defined by the following impulse response:

$\displaystyle h[n]=\{-f[-W\leq n \leq 0], f[1\leq n \leq W]\}$ (4.2)

where
$\displaystyle f[n] = e^{An}[K_1\sin(An)+K_2\cos(An)] + e^{-An}[K_3\sin(An)+K_4\cos(An)] + K_5+K_6e^{sn}$ (4.3)

and

$\displaystyle A = 0.41s$  
$\displaystyle s = \frac{7}{W}$  
$\displaystyle W = \textrm{half of the window length}$ (4.4)

The coefficient values are $ [K_1\ldots K_6]=[1.583, 1.468, -0.078, -0.036, -0.872, -0.56]$, for a chosen window length $ W=31$. Selecting an appropriate value for $ W$ is important because it sets the temporal resolution of the detector.
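
As an illustration, Eqs. 4.2 to 4.4 can be turned into filter taps with a few lines of Python. This is a minimal sketch, not the implementation used in this work: the function names are hypothetical, and the way the two halves of $ h[n]$ are assembled is an assumption. Because $ s=7/W$, evaluating Eq. 4.3 literally at $ n=W$ would include the term $ K_6e^{7}$, so the sketch evaluates $ f$ only for $ n\leq 0$ and takes the right half as the sign-flipped mirror image of the left half. With the listed coefficients, $ K_2+K_4+K_5+K_6=0$, so the centre tap $ h[0]$ is zero.

```python
import math

# Coefficients K_1..K_6 as listed in the text.
K = (1.583, 1.468, -0.078, -0.036, -0.872, -0.56)


def f(n, W):
    """Evaluate f[n] from Eq. 4.3 with s = 7/W and A = 0.41*s (Eq. 4.4)."""
    s = 7.0 / W
    A = 0.41 * s
    k1, k2, k3, k4, k5, k6 = K
    return (math.exp(A * n) * (k1 * math.sin(A * n) + k2 * math.cos(A * n))
            + math.exp(-A * n) * (k3 * math.sin(A * n) + k4 * math.cos(A * n))
            + k5 + k6 * math.exp(s * n))


def derivative_filter(W):
    """Return the taps h[n] for n = -W..W as a list of 2*W + 1 values.

    ASSUMPTION (not stated in the text): f is evaluated only at n <= 0,
    and the right half is the sign-flipped mirror of the left half,
    i.e. h[n] = -f[n] for -W <= n <= 0 and h[n] = f[-n] for 1 <= n <= W.
    """
    left = [-f(n, W) for n in range(-W, 1)]      # n = -W .. 0
    right = [f(-n, W) for n in range(1, W + 1)]  # n = 1 .. W
    return left + right


h = derivative_filter(31)  # W = 31 as in the text
```

For $ W=31$ this gives $ 2W+1=63$ taps. Under the stated assumption the filter is antisymmetric about $ n=0$, which is the shape expected of a smoothed derivative operator.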

Figure: Left, filter over $ \tilde{e}[n]$. The silence decision after thresholding is shown in red.

As shown in fig. 4.2, the result of convolving $ \tilde{e}[n]$ with $ h[n]$, denoted $ \hat{e}[n]$, is thresholded, and each sample is labelled as speech or non-speech.
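
The convolution and thresholding step can be sketched as follows. The names `convolve_same`, `exceeds_threshold` and `theta` are hypothetical, as is the zero padding at the signal borders. The text does not give the threshold value or the exact rule that maps the thresholded output to speech/non-speech labels, so the sketch only marks the samples where $ \hat{e}[n]$ exceeds a fixed placeholder threshold. A 3-tap difference kernel stands in for $ h[n]$ to keep the example short.

```python
def convolve_same(x, h):
    """Centred convolution y[n] = sum_k h[k] * x[n - k] for k = -W..W.

    h holds 2*W + 1 taps ordered from k = -W to k = W. Samples outside
    x are treated as zero (an assumption of this sketch).
    """
    W = (len(h) - 1) // 2
    y = []
    for n in range(len(x)):
        acc = 0.0
        for i, tap in enumerate(h):
            k = i - W
            if 0 <= n - k < len(x):
                acc += tap * x[n - k]
        y.append(acc)
    return y


def exceeds_threshold(e_hat, theta):
    """Per-sample comparison of the filter output with a fixed threshold."""
    return [v > theta for v in e_hat]


# Toy energy contour: silence, a burst of energy, silence again.
e_tilde = [0, 0, 0, 1, 1, 1, 1, 0, 0, 0]
h_toy = [1, 0, -1]        # stand-in kernel: y[n] = x[n+1] - x[n-1]
e_hat = convolve_same(e_tilde, h_toy)
mask = exceeds_threshold(e_hat, 0.5)  # placeholder threshold value
```

In this toy case the filter output is positive at the rising edge of the burst and negative at the falling edge, so the mask flags only the onset samples; a complete detector would still have to turn such change points into per-sample speech/non-speech labels.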

user 2008-12-08