Use of the Beamformed Signal for ASR

The beamforming system presented in this thesis was also used to obtain an enhanced signal for the ICSI ASR systems submitted to the NIST RT evaluations. For RT05s, the same beamforming system was used for ASR as for diarization. As explained in Stolcke et al. (2005), when evaluated on the RT04s evaluation set (excluding the CMU single-channel meetings), the new beamformer outperformed the previous beamforming system in use at ICSI, which was based on delay&sum of full speech segments, by 2.8% absolute (from 42.9% to 40.1% word error rate).

For the RT06s system, the beamforming module was tuned separately from the diarization module in order to optimize the Word Error Rate (WER), which is a word-based metric (unlike the DER, which is time-based). This led to a system that was more robust than the RT05s beamformer.
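To make the word-based nature of the metric concrete, the following is a minimal sketch of a WER computation: the word-level edit distance (substitutions, deletions and insertions) between a reference and a hypothesis transcript, divided by the number of reference words. The function name `word_error_rate` and the whitespace tokenization are assumptions made for illustration; the official evaluation scoring applies additional text normalization and alignment rules that are not modeled here.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words.

    Illustrative sketch only: tokens are obtained by splitting on whitespace,
    and no text normalization is applied.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1       # reference word missing in hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One substitution and one deletion against a four-word reference.
    print(word_error_rate("a b c d", "a x c"))
```

Because the denominator counts reference words, every misrecognized word weighs the same regardless of its duration, whereas a time-based metric such as the DER weighs errors by how long they last.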

Table 6.12: WER using the RT06s ASR system including the presented beamformer

Dataset    SDM      MDM      ADM      MM3A
RT05s      47.7%    45.8%    38.6%    -
RT06s      57.3%    55.5%    51%      56%

As reported in Janin et al. (2006) and reproduced in Table 6.12, the RT05s and RT06s datasets were used to evaluate the RT06s ASR submission in terms of WER. In both datasets there is an improvement of almost 2% absolute when going from a single channel (SDM) to the MDM beamformed signal. Between these two conditions the ASR system differs only in the use of the F&S algorithm and in minor tuning parameters, optimized for each case.

The improvement is much larger between the MDM and ADM cases (7.2% absolute on RT05s and 4.5% absolute on RT06s). Here it is due exclusively to the larger number of microphones available in the ADM case, and therefore to the better signal quality obtained through the beamforming processing.
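The absolute differences discussed here can be rechecked with a few lines of arithmetic. The sketch below copies the WER values from Table 6.12 and computes the absolute reduction and, as an additional point of view not reported in the original results, the relative reduction. The helper name `wer_gain` and the dictionary layout are hypothetical.

```python
from typing import Tuple

# WER values (in %) copied from Table 6.12; the missing RT05s MM3A entry is omitted.
WER = {
    "RT05s": {"SDM": 47.7, "MDM": 45.8, "ADM": 38.6},
    "RT06s": {"SDM": 57.3, "MDM": 55.5, "ADM": 51.0, "MM3A": 56.0},
}


def wer_gain(baseline: float, improved: float) -> Tuple[float, float]:
    """Return the (absolute, relative) WER reduction, both in percent."""
    absolute = baseline - improved
    relative = 100.0 * absolute / baseline
    # Round to one decimal, matching the precision of the table.
    return round(absolute, 1), round(relative, 1)


if __name__ == "__main__":
    for dataset, scores in WER.items():
        print(dataset, "SDM->MDM", wer_gain(scores["SDM"], scores["MDM"]))
        print(dataset, "MDM->ADM", wer_gain(scores["MDM"], scores["ADM"]))
```

The same helper applied to the RT04s figures quoted earlier (42.9% and 40.1%) gives the 2.8% absolute improvement stated in the text.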

The Mark III microphone arrays (MM3A) were available for the RT06s evaluation. Tests comparing the results with those of other state-of-the-art beamforming systems showed that the proposed beamformer achieves excellent performance.

user 2008-12-08