Derivative Filtering

Given the normalized and filtered energy signal $\tilde{e}[n]$, a derivative filter is applied to enhance the speech/non-speech change points. This processing, first introduced by Li et al. (2002), helps prevent degradation due to low signal-to-noise ratios or nonstationary environments. The filter is defined by the following impulse response:

$\displaystyle h[n]=\{-f[-W\leq n \leq 0],\; f[1\leq n \leq W]\}$	(4.2)

where

$\displaystyle f[n] = e^{An}[K_1\sin(An)+K_2\cos(An)] + e^{-An}[K_3\sin(An)+K_4\cos(An)] + K_5+K_6e^{sn}$	(4.3)

and

$\displaystyle A = 0.41s, \qquad s = \frac{7}{W}, \qquad W = \text{half of the window length}$	(4.4)

The coefficients take the values $[K_1,\ldots,K_6]=[1.583, 1.468, -0.078, -0.036, -0.872, -0.56]$ for a chosen window length. Selecting an appropriate value for the parameter $W$ is important, as it sets the temporal resolution of the detector.
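As a rough illustration, the impulse response in (4.2)-(4.4) could be tabulated as in the following Python sketch. The names `f` and `derivative_filter` are hypothetical. One point is an assumption rather than something the text states: the right half of the filter is taken as the mirror image of the left half, $h[n]=f[-n]$ for $1\leq n\leq W$, which makes $h$ antisymmetric. Evaluating $f$ directly at positive $n$ would make the $K_6e^{sn}$ term grow exponentially, so the mirrored reading seems the more plausible one for a derivative filter.

```python
import math

# Coefficients K_1..K_6 as given in the text.
K = (1.583, 1.468, -0.078, -0.036, -0.872, -0.56)


def f(n, W):
    """Evaluate f[n] from (4.3), with s = 7/W and A = 0.41*s as in (4.4)."""
    s = 7.0 / W
    A = 0.41 * s
    k1, k2, k3, k4, k5, k6 = K
    return (math.exp(A * n) * (k1 * math.sin(A * n) + k2 * math.cos(A * n))
            + math.exp(-A * n) * (k3 * math.sin(A * n) + k4 * math.cos(A * n))
            + k5
            + k6 * math.exp(s * n))


def derivative_filter(W):
    """Return h[n] for n = -W..W as a list of 2*W + 1 taps.

    ASSUMPTION (not stated in the text): the right half mirrors the left,
    h[n] = f[-n] for 1 <= n <= W, so that h is antisymmetric.
    """
    left = [-f(n, W) for n in range(-W, 1)]      # n = -W..0
    right = [f(-n, W) for n in range(1, W + 1)]  # n = 1..W (mirrored)
    return left + right


h = derivative_filter(10)
print(len(h))  # number of taps, 2*W + 1
```

Under this reading the filter has $2W+1$ taps and its centre tap is zero, since $K_2+K_4+K_5+K_6=0$.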

**Figure:** *Left, filter over $\tilde{e}[n]$. Decision of silence in red after the thresholding.*

As shown in fig. 4.2, the result of convolving $\tilde{e}[n]$ with $h[n]$, denoted $\hat{e}[n]$, is thresholded, and each sample is labelled as speech or non-speech.
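The convolution and thresholding step could be sketched as follows. This is only an illustration. The helper names `convolve_same` and `above_threshold`, the zero padding at the signal edges, the toy three-tap filter and the threshold value 0.5 are all assumptions, and the text does not specify how threshold crossings are mapped to the final speech/non-speech labels.

```python
def convolve_same(x, h):
    """Discrete convolution y[n] = sum_k h[k] * x[n - k].

    h holds the taps for k = -W..W (odd length); x is treated as zero
    outside its range (an assumption). The output has the length of x.
    """
    W = (len(h) - 1) // 2
    y = []
    for n in range(len(x)):
        acc = 0.0
        for i, hk in enumerate(h):
            k = i - W          # tap index in -W..W
            m = n - k          # input sample paired with this tap
            if 0 <= m < len(x):
                acc += hk * x[m]
        y.append(acc)
    return y


def above_threshold(y, theta):
    """Mark each sample whose filtered value exceeds the threshold theta."""
    return [v > theta for v in y]


# Toy example: a step in the energy and a three-tap antisymmetric filter
# (hypothetical stand-ins for the real energy signal and for h[n]).
e_tilde = [0, 0, 0, 1, 1, 1]
h_toy = [1, 0, -1]             # taps for k = -1, 0, 1
e_hat = convolve_same(e_tilde, h_toy)
flags = above_threshold(e_hat, 0.5)
print(e_hat)
print(flags)
```

In this toy case the filtered signal responds around the rising edge of the step, which is the change-point enhancement the derivative filter is meant to provide. The negative value at the last sample is an artifact of the zero padding.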

user 2008-12-08