Possibly the most noticeable difference between speaker diarization in the meeting environment and in other domains (such as broadcast news or telephone speech) is the availability, at times, of multiple channels, coming from microphones laid out inside the meeting room, that synchronously record what occurs in the meeting.
In order to take advantage of this fact, one needs to explore an area of signal processing that differs from the standard speech modeling techniques discussed in previous sections and that constitutes a complex research topic in its own right: microphone array beamforming for speech/acoustic enhancement (see for example Veen and Buckley (1988), Krim and Viberg (1996)). Although the task at hand does not satisfy some of the assumptions made in beamforming theory, that theory is a beneficial background for making use of all the available microphones.
Microphone array beamforming techniques usually take advantage of the fact that the same acoustic signal arrives at each of the microphones (whose placement defines the shape of the array) at a slightly different time, due to the propagation delay of the signal through the air. By combining the signals of all microphones (in different ways) one can simulate a directional microphone whose acoustic beam focuses on the speaker or acoustic event that is predominant, at each instant, in the meeting room. There are multiple acoustic beamforming techniques, which require different degrees of knowledge about the microphone characteristics and the location of the speakers.
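As an illustration of the idea of combining delayed copies of the same signal, the following is a minimal sketch of the simplest such combination, usually called delay-and-sum. It is a toy example under strong assumptions, not the method developed in this work: the per-microphone delays are taken to be known and to be whole numbers of samples, the noise is independent across microphones, and the function name `delay_and_sum`, the test signal and the delay values are all hypothetical.

```python
import numpy as np


def delay_and_sum(channels, delays):
    """Average the channels after advancing each one by its delay (in samples).

    Assumes integer delays that are already known; a circular shift is used
    for brevity, which is only acceptable for a toy example.
    """
    channels = np.asarray(channels, dtype=float)
    n_mics, n_samples = channels.shape
    out = np.zeros(n_samples)
    for signal, d in zip(channels, delays):
        # A negative shift advances the signal, undoing a delay of d samples.
        out += np.roll(signal, -d)
    return out / n_mics


def snr_db(noisy, clean):
    """Signal-to-noise ratio in dB of `noisy` with respect to `clean`."""
    noise = noisy - clean
    return 10 * np.log10(np.sum(clean ** 2) / np.sum(noise ** 2))


rng = np.random.default_rng(0)
n = 4000
# Hypothetical source: a sinusoid with a whole number of periods in n samples.
source = np.sin(2 * np.pi * 0.01 * np.arange(n))
# Hypothetical propagation delays (in samples) to each of four microphones.
true_delays = [0, 3, 7, 12]
# Each microphone records a delayed copy of the source plus independent noise.
mics = np.stack(
    [np.roll(source, d) + rng.normal(0.0, 1.0, n) for d in true_delays]
)

enhanced = delay_and_sum(mics, true_delays)

snr_single = snr_db(mics[0], source)
snr_beam = snr_db(enhanced, source)
print(f"single microphone SNR: {snr_single:.1f} dB")
print(f"delay-and-sum SNR:     {snr_beam:.1f} dB")
```

Because the aligned copies of the source add coherently while the independent noise terms do not, the combined signal has a higher signal-to-noise ratio than any single channel. In a real meeting room the delays are not known in advance and are rarely integer numbers of samples, which is why they have to be estimated.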
First, a theoretical overview of acoustic signal beamforming is given in 2.5.1, followed by a look at the predominant acoustic beamforming techniques in 2.5.2. In the task at hand one does not know a priori how many speakers there are or where they are located in the room; therefore, several possible techniques for finding the Time Delay of Arrival (TDOA) between microphone pairs are also reviewed in 2.5.3.