Data Preprocessing

Due to the different sources and recording setups, the average amplitude of the signal to be processed can vary over a large range. Therefore it needs to be normalized to be able to bring consistency in the follow-on processing. A standard energy average over all the recording would not be plausible due to the existence of extended silence regions and of sudden noise bursts. In order to compute the normalization constant $\mu$ was chosen, which is more robust to these effects as shown in equation 4.1. This same expression is used in the filter&sum processing to obtain the overall channels weighting factor of the input signals 5.2.2.

$\displaystyle \mu = \frac{1}{P}\sum_{p=1}^{P}max(s[p \ cdot TF_{s}], \cdots, s[(p+1)\cdot TF_{s}])$

(4.1)

where is the total amount of non overlapped blocks of duration $TF_{s}$ (with $F_{s}$ being the sampling rate in samples/second, and T is the analysis segment size in seconds) in the recording. Each block of samples ranges from $p\cdot TF_{s}$ to $(p+1)\cdot TF_{s}$

Finally a low-pass Butterworth filter deletes all high band noises leaving only information of the signal below . This is done because the major part of the energy of the signal is contained in this band and no information is needed but the energy at this point of the non-speech detection process. This Butterworth filter has been implemented using its IIR form.

user 2008-12-08