Time Constraints on Speech/non-Speech

After the energy has been filtered for the third time, time constraints must be imposed to keep the detector from switching too quickly between speech and non-speech. A finite state machine (FSM) has been implemented for this purpose. In this FSM, described in figure 4.3, the time constraints are enforced through enter times and leave times, according to the value of $\hat{e}[n]$, which is compared on each sample against two thresholds (enter thrld, $\Theta_{enter}$, and leave thrld, $\Theta_{leave}$). Selecting the right thresholds is crucial to the correctness of the detector and, although the energies have initially been normalized, the appropriate values may differ from meeting to meeting. The threshold enter thrld is defined to be an order of magnitude larger than leave thrld, and its value is set iteratively by the hybrid system described below. The appropriate minimum duration of the speech and non-speech states must be estimated using development data but, as will be shown, it is less dependent on meeting room variations than the threshold values.

Inside the FSM, the conditions to go from non-speech to speech mirror those to go from speech to non-speech. To go from non-speech to speech, $\hat{e}[n]$ has to be higher than or equal to the threshold to enter ($\Theta_{enter}$); to go from speech to non-speech, it has to be lower than or equal to the threshold to leave ($\Theta_{leave}$):

$\displaystyle \hat{e}[n_1]\geq\Theta_{enter} \;\&\; State_t=NSP \;\rightarrow\; State_{t+1}=SP$
$\displaystyle \hat{e}[n_2]\leq\Theta_{leave} \;\&\; State_t=SP \;\rightarrow\; State_{t+1}=NSP$ (4.5)

where NSP is a non-speech state and SP is a speech state.
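
The transitions in (4.5) can be illustrated with a minimal Python sketch. It is not the implementation used in this work: the function name `apply_time_constraints` and the parameters `min_enter` and `min_leave` are hypothetical, and the sketch assumes that the enter and leave times are enforced by requiring the threshold condition to hold for a minimum number of consecutive samples before the state changes. The exact mechanism is the one given in figure 4.3.

```python
from typing import Iterable, List

NSP = "NSP"  # non-speech state
SP = "SP"    # speech state


def apply_time_constraints(
    energy: Iterable[float],
    theta_enter: float,
    theta_leave: float,
    min_enter: int,
    min_leave: int,
) -> List[str]:
    """Label every sample of the filtered energy as SP or NSP.

    Assumption (not taken from the source): a transition fires only after
    its threshold condition has held for `min_enter` (NSP -> SP) or
    `min_leave` (SP -> NSP) consecutive samples.
    """
    if min_enter < 1 or min_leave < 1:
        raise ValueError("minimum durations must be at least one sample")

    state = NSP  # assumed initial state
    run = 0      # consecutive samples satisfying the pending transition
    labels: List[str] = []

    for e in energy:
        if state == NSP:
            # Condition of (4.5), first line: e >= theta_enter.
            run = run + 1 if e >= theta_enter else 0
            if run >= min_enter:
                state, run = SP, 0
        else:
            # Condition of (4.5), second line: e <= theta_leave.
            run = run + 1 if e <= theta_leave else 0
            if run >= min_leave:
                state, run = NSP, 0
        labels.append(state)

    return labels


# Illustrative values only: theta_enter is ten times theta_leave.
example_energy = [0.0, 0.9, 0.0, 0.9, 0.9, 0.9, 0.5, 0.05, 0.9, 0.05, 0.05, 0.0]
example_labels = apply_time_constraints(
    example_energy, theta_enter=0.8, theta_leave=0.08, min_enter=2, min_leave=2
)
```

In this example the isolated high-energy sample at index 1 and the isolated low-energy sample at index 7 do not change the state, because neither condition holds for two consecutive samples. Values between the two thresholds (such as 0.5) leave the current state unchanged.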

Figure 4.3: State machine used to apply time constraints.

user 2008-12-08