Bibliography

Adami, A., Burget, L., Dupont, S., Garudadri, H., Grezl, F., Hermansky, H., Jain, P., Kajarekar, S., Morgan, N. and Sivadas, S.: 2002, Qualcomm-ICSI-OGI features for ASR, Proc. International Conference on Speech and Language Processing.

Adami, A. G., Kajarekar, S. S. and Hermansky, H.: 2002, A new speaker change detection method for two-speaker segmentation, Proc. IEEE International Conference on Acoustics, Speech and Signal Processing, Orlando, Florida.

Aguilo, M.: 2005, Deteccion de actividad oral en un sistema de diarizacion, Master's thesis, UPC.

Ajmera, J.: 2004, Robust Audio Segmentation, PhD thesis, Ecole Polytechnique Federale de Lausanne.

Ajmera, J., Bourlard, H. and Lapidot, I.: 2002, Improved unknown-multiple speaker clustering using HMM, Technical report, IDIAP.

Ajmera, J., Lathoud, G. and McCowan, I.: 2004, Clustering and segmenting speakers and their locations in meetings, Proc. IEEE International Conference on Acoustics, Speech and Signal Processing, Vol. 1, pp. 605-608.

Ajmera, J., McCowan, I. and Bourlard, H.: 2003, Robust speaker change detection, Technical report, IDIAP.

Ajmera, J., McCowan, I. and Bourlard, H.: 2004, Robust speaker change detection, IEEE Signal Processing Letters 11(8), 649-651.

Ajmera, J. and Wooters, C.: 2003, A robust speaker clustering algorithm, IEEE Automatic Speech Recognition and Understanding Workshop, US Virgin Islands, USA.

Anguera, X.: 2005, XBIC: Real-time cross probabilities measure for speaker segmentation, Technical Report TR-99-2004, ICSI.

Anguera, X., Aguilo, M., Wooters, C., Nadeu, C. and Hernando, J.: 2006, Hybrid speech/non-speech detector applied to speaker diarization of meetings, Speaker Odyssey 06, Puerto Rico, USA.

Anguera, X. and Hernando, J.: 2004a, Evolutive speaker segmentation using a repository system, Proc. International Conference on Speech and Language Processing, Jeju Island, Korea.

Anguera, X. and Hernando, J.: 2004b, XBIC: nueva medida para segmentacion de locutor hacia el indexado automatico de la senal de voz, III Jornadas en Tecnologia del Habla, Valencia, Spain.

Anguera, X., Wooters, C. and Hernando, J.: 2005, Speaker diarization for multi-party meetings using acoustic fusion, IEEE Automatic Speech Recognition and Understanding Workshop, Puerto Rico, USA.

Anguera, X., Wooters, C. and Hernando, J.: 2006a, Automatic cluster complexity and quantity selection: Towards robust speaker diarization, MLMI'06, Washington DC, USA.

Anguera, X., Wooters, C. and Hernando, J.: 2006b, Frame purification for cluster comparison in speaker diarization, MMUA'06, Toulouse, France.

Anguera, X., Wooters, C. and Hernando, J.: 2006c, Friends and enemies: A novel initialization for speaker diarization, Proc. International Conference on Speech and Language Processing, Pittsburgh, USA.

Anguera, X., Wooters, C. and Hernando, J.: 2006d, Purity algorithms for speaker diarization of meetings data, Proc. IEEE International Conference on Acoustics, Speech and Signal Processing, Toulouse, France.

Anguera, X., Wooters, C. and Pardo, J. M.: 2006a, Robust speaker diarization for meetings: ICSI RT06s evaluation system, Proc. International Conference on Speech and Language Processing, Pittsburgh, USA.

Anguera, X., Wooters, C. and Pardo, J. M.: 2006b, Robust speaker diarization for meetings: ICSI RT06s meetings evaluation system, RT06s Meetings Recognition Evaluation, Washington DC, USA.

Anguera, X., Wooters, C., Peskin, B. and Aguilo, M.: 2005, Robust speaker segmentation for meetings: The ICSI-SRI spring 2005 diarization system, RT05s Meetings Recognition Evaluation, Edinburgh, Great Britain.

Appel, U. and Brandt, A.: 1982, Adaptive sequential segmentation of piecewise stationary time series, Inf. Sci. 29(1), 27-56.

Attias, H.: 2000, A variational Bayesian framework for graphical models, Advances in Neural Information Processing Systems, MIT Press, Cambridge.

Augmented Multiparty Interaction (AMI) website: 2006.
URL: http://www.amiproject.org

Bakis, R., Chen, S., Gopalakrishnan, P. and Gopinath, R.: 1997, Transcription of broadcast news shows with the IBM large vocabulary speech recognition system, Speech Recognition Workshop, pp. 67-72.

Barras, C., Zhu, X., Meignier, S. and Gauvain, J.-L.: 2004, Improving speaker diarization, Fall 2004 Rich Transcription Workshop (RT04), Palisades, NY.

Basseville, M. and Nikiforov, I.: 1993, Detection of abrupt changes: theory and application, Prentice-Hall.

Beigi, H. S. and Maes, S. H.: 1998, Speaker, channel and environment change detection, World Congress on Automation.

Beigi, H. S., Maes, S. H. and Sorensen, J. S.: 1998, A distance measure between collections of distributions and its application to speaker recognition, Proc. IEEE International Conference on Acoustics, Speech and Signal Processing, Detroit, USA.

Ben, M., Betser, M., Bimbot, F. and Gravier, G.: 2004, Speaker diarization using bottom-up clustering based on a parameter-derived distance between adapted GMMs, Proc. International Conference on Speech and Language Processing, Jeju Island, Korea.

Bilmes, J. and Zweig, G.: 2002, The graphical models toolkit: an open source software system for speech and time-series processing, Proc. IEEE International Conference on Acoustics, Speech and Signal Processing, Orlando, FL, USA.

Bimbot, F. and Mathan, L.: 1993, Text-free speaker recognition using an arithmetic-harmonic sphericity measure, Eurospeech'93, Berlin, Germany, pp. 169-172.

Bonastre, J.-F., Delacourt, P., Fredouille, C., Merlin, T. and Wellekens, C.: 2000, A speaker tracking system based on speaker turn detection for NIST evaluation, Proc. IEEE International Conference on Acoustics, Speech and Signal Processing, Istanbul, Turkey, pp. 1177-1180.

Brandstein, M., Adcock, J. and Silverman, H.: 1995, A practical time-delay estimator for localizing speech sources with a microphone array, Comput. Speech Lang. 9, 153-159.

Brandstein, M. and Griebel, S.: 2001, Explicit Speech Modeling for Microphone Array Applications, Springer, chapter 7.

Brandstein, M. S. and Silverman, H. F.: 1997, A robust method for speech signal time-delay estimation in reverberant rooms, Proc. IEEE International Conference on Acoustics, Speech and Signal Processing, Munich, Germany.

Brandstein, M. and Ward, D.: 2001, Microphone Arrays, Springer.

Burger, S., Maclaren, V. and Yu, H.: 2002, The ISL meeting corpus: The impact of meeting type on speech style, Proc. International Conference on Speech and Language Processing, Denver, USA.

Campbell, J. P.: 1997, Speaker recognition: a tutorial, Proceedings of the IEEE 85(9), 1437-1462.

Canseco, L., Lamel, L. and Gauvain, J.-L.: 2005, A comparative study using manual and automatic transcriptions for diarization, IEEE Automatic Speech Recognition and Understanding Workshop, San Juan, Puerto Rico.

Canseco-Rodriguez, L., Lamel, L. and Gauvain, J.-L.: 2004a, Speaker Diarization from Speech Transcripts, Proc. International Conference on Speech and Language Processing, Jeju Island, S. Korea, pp. 1272-1275.

Canseco-Rodriguez, L., Lamel, L. and Gauvain, J.-L.: 2004b, Towards using STT for Broadcast News Speaker Diarization, Proc. DARPA RT04, Palisades NY.

Carter, G., Nuttall, A. H. and Cable, P. G.: 1973, The smoothed coherence transform, Proc. IEEE (Lett.) 61, 1497-1498.

Cassidy, S.: 2004, The Macquarie speaker diarization system for RT04s, NIST 2004 Spring Rich Transcription Evaluation Workshop, Montreal, Canada.

Cettolo, M. and Vescovi, M.: 2003, Efficient audio segmentation algorithms based on the BIC, Proc. IEEE International Conference on Acoustics, Speech and Signal Processing.

Champagne, B., Bedard, S. and Stephenne, A.: 1996, Performance of time-delay estimation in the presence of room reverberation, IEEE Transactions on Speech and Audio Processing.

Chan, W., Lee, T., Zheng, N. and Ouyang, H.: 2006, Use of vocal source features in speaker segmentation, Proc. IEEE International Conference on Acoustics, Speech and Signal Processing, Toulouse, France.

Chen, L., Rose, R. T., Parrill, F., Han, X., Tu, J., Huang, Z., Harper, M., Quek, F., McNeill, D., Tuttle, R. and Huang, T.: 2005, VACE multimodal meeting corpus, MLMI, Edinburgh, UK.

Chen, S. S., Gales, M. J. F., Gopinath, R. A., Kanevsky, D. and Olsen, P.: 2002, Automatic transcription of broadcast news, Speech Communication 37, 69-87.

Chen, S. S. and Gopalakrishnan, P.: 1998, Clustering via the Bayesian information criterion with applications in speech recognition, Proc. IEEE International Conference on Acoustics, Speech and Signal Processing, Vol. 2, Seattle, USA, pp. 645-648.

Chickering, D. M. and Heckerman, D.: 1997, Efficient approximations for the marginal likelihood of Bayesian networks with hidden variables, Machine Learning 29, 181-212.

CMU Meetings Corpus website: 2006.
URL: http://penance.is.cs.cmu.edu/meeting_room

Cognitive Assistant that Learns and Organizes (CALO) website: 2006.
URL: http://caloproject.sri.com/

Cohen, I. and Berdugo, B.: 2002, Speech enhancement based on a microphone array and log-spectral amplitude estimation, 22nd Convention of Electrical and Electronics Engineers in Israel.

Collet, M., Charlet, D. and Bimbot, F.: 2005, A correlation metric for speaker tracking using anchor models, Proc. IEEE International Conference on Acoustics, Speech and Signal Processing, Philadelphia, USA.

Computers in the Human Interaction Loop (CHIL) website: 2006.
URL: http://chil.server.de

Cox, H., Zeskind, R. and Kooij, I.: 1986, Practical supergain, IEEE Transactions on Acoustics, Speech and Signal Processing 34(3), 393-397.

Cox, H., Zeskind, R. and Owen, M.: 1987, Robust adaptive beamforming, IEEE Transactions on Acoustics, Speech and Signal Processing 35(10), 1365-1376.

DARPA Effective, Affordable, Reusable Speech-to-Text (EARS): 2004.
URL: http://www.darpa.mil/ipto/programs/ears

Delacourt, P., Kryze, D. and Wellekens, C. J.: 1999a, Detection of speaker changes in an audio document, Eurospeech-1999, Budapest, Hungary.

Delacourt, P., Kryze, D. and Wellekens, C. J.: 1999b, Speaker-based segmentation for audio data indexing, ESCA Workshop on Accessing Information in Audio Data.

Delacourt, P. and Wellekens, C. J.: 1999, Audio data indexing: Use of second-order statistics for speaker-based segmentation, IEEE International Conference on Multimedia, Computing and Systems, Florence, Italy.

Delacourt, P. and Wellekens, C. J.: 2000, DISTBIC: A speaker-based segmentation for audio data indexing, Speech Communication: Special Issue in Accessing Information in Spoken Audio 32, 111-126.

Deshayes, J. and Picard, D.: 1986, Off-line statistical analysis of change-point models using non-parametric and likelihood methods, Springer-Verlag.

Digalakis, V., Monaco, P. and Murveit, H.: 1996, Genones: generalized mixture tying in continuous hidden Markov model-based speech recognizers, IEEE Transactions on Speech and Audio Processing 4(4), 281-289.

Doclo, S. and Moonen, M.: 2002, GSVD-based optimal filtering for single and multimicrophone speech enhancement, IEEE Trans. Signal Processing 50, 2230-2244.

Duda, R. and Hart, P.: 1973, Pattern Classification and Scene Analysis, John Wiley & Sons.

Dunn, R. B., Reynolds, D. and Quatieri, T. F.: 2000, Approaches to speaker detection and tracking in conversational speech, Digital Signal Processing 10, 93-112.

Eckart, C.: 1952, Optimal rectifier systems for the detection of steady signals, Technical Report Rep SIO 12692, SIO Ref 52-11, 1952, Univ. California, Scripps Inst. Oceanography, Marine Physical Lab.

Ellis, D. and Liu, J. C.: 2004, Speaker turn detection based on between-channels differences, Proc. IEEE International Conference on Acoustics, Speech and Signal Processing.

Reed, F., Feintuch, P. and Bershad, N.: 1981, Time delay estimation using the LMS adaptive filter - static behavior, IEEE Transactions on Acoustics, Speech and Signal Processing.

Fiérrez-Aguilar, J., Ortega-García, J. and González-Rodríguez, J.: 2003, Fusion strategies in multimodal biometric verification, IEEE International Conference on Multimedia and Expo.

Fischer, S. and Kammeyer, K.-D.: 1997, Broadband beamforming with adaptive postfiltering for speech acquisition in noisy environments, Proc. IEEE International Conference on Acoustics, Speech and Signal Processing.

Fiscus, J. G., Ajot, J., Michet, M. and Garofolo, J. S.: 2006, The rich transcription 2006 spring meeting recognition evaluation, NIST 2006 Spring Rich Transcription Evaluation Workshop, Washington DC, USA.

Fiscus, J. G., Garofolo, J., Ajot, J. and Michet, M.: 2006, RT-06S speaker diarization results and speech activity detection results, NIST 2006 Spring Rich Transcription Evaluation Workshop, Washington DC, USA.

Fiscus, J. G., Radde, N., Garofolo, J. S., Le, A., Ajot, J. and Laprun, C. D.: 2005, The rich transcription 2005 spring meeting recognition evaluation, NIST 2005 Spring Rich Transcription Evaluation Workshop, Edinburgh, UK.

Flanagan, J., Johnson, J., Kahn, R. and Elko, G.: 1994, Computer-steered microphone arrays for sound transduction in large rooms, Journal of the Acoustical Society of America 78, 1508-1518.

Fredouille, C., Moraru, D., Meignier, S., Besacier, L. and Bonastre, J.-F.: 2004, The NIST 2004 spring rich transcription evaluation: Two-axis merging strategy in the context of multiple distant microphone based meeting speaker segmentation, NIST 2004 Spring Rich Transcription Evaluation Workshop, Montreal, Canada.

Gallardo-Antolin, A., Anguera, X. and Wooters, C.: 2006, Multi-stream speaker diarization systems for the meetings domain, Proc. International Conference on Speech and Language Processing, Pittsburgh, USA.

Gangadharaiah, R., Narayanaswamy, B. and Balakrishnan, N.: 2004, A novel method for two-speaker segmentation, Proc. International Conference on Speech and Language Processing, Jeju, S. Korea.

Garofolo, J. S., Laprun, C. D. and Fiscus, J. G.: 2004, The rich transcription 2004 spring meeting recognition evaluation, NIST 2004 Spring Rich Transcription Evaluation Workshop, Montreal, Canada.

Gauvain, J.-L., Lamel, L. and Adda, G.: 1998, Partitioning and transcription of broadcast news data, Proc. International Conference on Speech and Language Processing, Vol. 4, Sydney, Australia, pp. 1335-1338.

Gish, H. and Schmidt, M.: 1994, Text-independent speaker identification, IEEE Signal Processing Magazine, pp. 18-32.

Gish, H., Siu, M.-H. and Rohlicek, R.: 1991, Segregation of speakers for speech recognition and speaker identification, Proc. IEEE International Conference on Acoustics, Speech and Signal Processing, Vol. 2, Toronto, Canada, pp. 873-876.

Griffiths, L. and Jim, C.: 1982, An alternative approach to linearly constrained adaptive beamforming, IEEE Trans. on Antennas and Propagation.

Hain, T., Johnson, S., Turek, A., Woodland, P. and Young, S. J.: 1998, Segment generation and clustering in the HTK broadcast news transcription system, DARPA Broadcast News Transcription and Understanding Workshop, pp. 133-137.

Heck, L. and Sankar, A.: 1997, Acoustic clustering and adaptation for robust speech recognition, Eurospeech-97, Rhodes, Greece.

Hoshuyama, O., Sugiyama, A. and Hirano, A.: 1999, A robust adaptive beamformer for microphone arrays with a blocking matrix using coefficient-constrained adaptive filters, IEEE Trans. on Signal Processing.

Humaine emotion research website: 2006.
URL: http://emotion-research.net/

Hung, J., Wang, H. and Lee, L.: 2000, Automatic metric based speech segmentation for broadcast news via principal component analysis, Proc. International Conference on Speech and Language Processing, Beijing, China.

ICSI Meeting Recorder Project: Channel skew in ICSI-recorded meetings: 2006.
URL: http://www.icsi.berkeley.edu/~dpwe/research/mtgrcdr/chanskew.html

ICSI Meetings Recorder corpus: 2006.
URL: http://www.icsi.berkeley.edu/Speech/mr

Ifeachor, E. and Jervis, B.: 1996, Digital signal processing: a practical approach, Addison-Wesley.

Ikbal, S., Misra, H., Sivadas, S., Hermansky, H. and Bourlard, H.: 2004, Entropy based combination of tandem representations for noise robust ASR, Proc. International Conference on Speech and Language Processing, South Korea.

Technical improvements of the E-HMM based speaker diarization system for meetings records: 2006, The rich transcription 2006 spring meeting recognition evaluation, NIST 2006 Spring Rich Transcription Evaluation Workshop, Washington DC, USA.

Interactive Multimodal Information Management (IM2) website: 2006.
URL: http://www.im2.ch

Istrate, D., Fredouille, C., Meignier, S., Besacier, L. and Bonastre, J.-F.: 2005, NIST RT05S evaluation: Pre-processing techniques and speaker diarization on multiple microphone meetings, NIST 2005 Spring Rich Transcription Evaluation Workshop, Edinburgh, UK.

Janin, A., Ang, J., Bhagat, S., Dhillon, R., Edwards, J., Macias-Guarasa, J., Morgan, N., Peskin, B., Shriberg, E., Stolcke, A., Wooters, C. and Wrede, B.: 2004, The ICSI meeting project: Resources and research, ICASSP, Montreal.

Janin, A., Baron, D., Edwards, J., Ellis, D., Gelbart, D., Morgan, N., Peskin, B., Pfau, T., Shriberg, E., Stolcke, A. and Wooters, C.: 2003, The ICSI meeting corpus, ICASSP, Hong Kong.

Janin, A., Stolcke, A., Anguera, X., Boakye, K., Cetin, O., Frankel, J. and Zheng, J.: 2006, The ICSI-SRI spring 2006 meeting recognition system, Proceedings of the Rich Transcription 2006 Spring Meeting Recognition Evaluation, Washington, USA.

Jin, H., Kubala, F. and Schwartz, R.: 1997, Automatic speaker clustering, DARPA Speech Recognition workshop, Chantilly, USA.

Jin, Q., Laskowski, K., Schultz, T. and Waibel, A.: 2004, Speaker segmentation and clustering in meetings, NIST 2004 Spring Rich Transcription Evaluation Workshop, Montreal, Canada.

Johnson, D. and Dudgeon, D.: 1993, Array signal processing, Prentice Hall.

Johnson, S.: 1999, Who spoke when? - automatic segmentation and clustering for determining speaker turns, Eurospeech-99, Budapest, Hungary.

Johnson, S. and Woodland, P.: 1998, Speaker clustering using direct maximization of the MLLR-adapted likelihood, Proc. International Conference on Speech and Language Processing, Vol. 5, pp. 1775-1779.

Juang, B. and Rabiner, L.: 1985, A probabilistic distance measure for hidden Markov models, AT&T Technical Journal 64, AT&T.

Kaneda, Y.: 1991, Directivity characteristics of adaptive microphone-array for noise reduction (AMNOR), Journal of the Acoustical Society of Japan 12(4), 179-187.

Kaneda, Y. and Ohga, J.: 1986, Adaptive microphone-array system for noise reduction, IEEE Trans. on Acoustics, Speech, and Signal Processing.

Kass, R. E. and Raftery, A. E.: 1995, Bayes factors, Journal of the American Statistical Association 90, 773-795.

Kataoka, A. and Ichirose, Y.: 1990, A microphone array configuration for AMNOR (adaptive microphone-array system for noise reduction), Journal of the Acoustical Society of Japan 11(6), 317-325.

Kemp, T., Schmidt, M., Westphal, M. and Waibel, A.: 2000, Strategies for automatic segmentation of audio data, Proc. IEEE International Conference on Acoustics, Speech and Signal Processing, Istanbul, Turkey, pp. 1423-1426.

Kim, H.-G., Ertelt, D. and Sikora, T.: 2005, Hybrid speaker-based segmentation system using model-level clustering, Proc. IEEE International Conference on Acoustics, Speech and Signal Processing, Philadelphia, USA.

Knapp, C. H. and Carter, G. C.: 1976, The generalized correlation method for estimation of time delay, IEEE Transactions on Acoustics, Speech and Signal Processing ASSP-24(4), 320-327.

Kohonen, T.: 1990, The self-organizing map, Proceedings of the IEEE 78(9), 1464-1480.

Krim, H. and Viberg, M.: 1996, Two decades of array signal processing research, IEEE Signal Processing Magazine pp. 67-94.

Kristjansson, T., Deligne, S. and Olsen, P.: 2005, Voicing features for robust speech detection, Proc. International Conference on Speech and Language Processing, Lisbon, Portugal.

Kubala, F., Jin, H., Matsoukas, S., Nguyen, L., Schwartz, R. and Makhoul, J.: 1997, The 1996 BBN Byblos HUB-4 transcription system, Speech Recognition Workshop, pp. 90-93.

Lapidot, I.: 2003, SOM as likelihood estimator for speaker clustering, Eurospeech, Geneva, Switzerland.

Lapidot, I., Gunterman, H. and Cohen, A.: 2002, Unsupervised speaker recognition based on competition between self-organizing-maps, IEEE Transactions on Neural Networks 13(4), 877-887.

Lathoud, G. and McCowan, I. A.: 2003, Location based speaker segmentation, Proc. IEEE International Conference on Acoustics, Speech and Signal Processing.

Lathoud, G., McCowan, I. and Odobez, J.: 2004, Unsupervised location-based segmentation of multi-party speech, ICASSP-NIST Meeting Recognition Workshop.

Lathoud, G., Odobez, J.-M. and McCowan, I.: 2004, Short-term spatio-temporal clustering of sporadic and concurrent events, Technical Report IDIAP-RR 04-14, IDIAP.

Lee, K.-F.: 1998, Large vocabulary speaker-independent continuous speech recognition: the SPHINX system, PhD thesis, Carnegie Mellon University, Pittsburgh, PA, USA.

Leeuwen, D. A. V. and Huijbregts, M.: 2006, The AMI speaker diarization system for NIST RT06s meeting data, NIST 2006 Spring Rich Transcription Evaluation Workshop, Washington DC, USA.

Li, Q., Zheng, J., Tsai, A. and Zhou, Q.: 2002, Robust endpoint detection and energy normalization for real-time speech and speaker recognition, IEEE Transactions on Speech and Audio Processing 10(3).

Li, X.: 2005, Combination and Generation of Parallel Feature Streams for Improved Speech Recognition, PhD thesis, ECE Department, CMU.

Liu, D. and Kubala, F.: 1999, Fast speaker change detection for broadcast news transcription and indexing, Eurospeech-99, Vol. 3, Budapest, Hungary, pp. 1031-1034.

Lopez, J. F. and Ellis, D. P. W.: 2000a, Using acoustic condition clustering to improve acoustic change detection on broadcast news, Proc. International Conference on Speech and Language Processing, Beijing, China.

Lopez, J. F. and Ellis, D. P. W.: 2000b, Using acoustic condition clustering to improve acoustic change detection on broadcast news, Proc. International Conference on Speech and Language Processing, Beijing, China.

Lu, L., Li, S. Z. and Zhang, H.-J.: 2001, Content-based audio segmentation using support vector machines, ACM Multimedia Conference, pp. 203-211.

Lu, L. and Zhang, H.-J.: 2002a, Real-time unsupervised speaker change detection, ICPR'02, Vol. 2, Quebec City, Canada.

Lu, L. and Zhang, H.-J.: 2002b, Speaker change detection and tracking in real-time news broadcasting analysis, ACM International Conference on Multimedia, pp. 602-610.

Lu, L., Zhang, H.-J. and Jiang, H.: 2002, Content analysis for audio classification and segmentation, IEEE Transactions on Speech and Audio Processing 10(7), 504-516.

MacKay, D. J. C.: 1997, Ensemble learning for hidden Markov models.
URL: http://www.inference.phy.cam.ac.uk/mackay/abstracts/ensemblePaper.html

Malegaonkar, A., Ariyaeeinia, A., Sivakumaran, P. and Fortuna, J.: 2006, Unsupervised speaker change detection using probabilistic pattern matching, IEEE Signal Processing Letters 13(8), 509-512.

Marro, C., Mahieux, Y. and Simmer, K.: 1998, Analysis of noise reduction and dereverberation techniques based on microphone arrays with postfiltering, IEEE Trans. on Speech and Audio Processing.

McCowan, I.: 2001, Robust Speech Recognition using microphone arrays, PhD thesis, Queensland University of Technology, Australia.

McCowan, I. A., Pelecanos, J. and Sridharan, S.: 2001, Robust speaker recognition using microphone arrays, IEEE Speaker Odyssey recognition workshop.

McCowan, I., Gatica-Perez, D., Bengio, S., Lathoud, G., Barnard, M. and Zhang, D.: 2005, Automatic analysis of multimodal group actions in meetings, IEEE Trans. on Pattern Analysis and Machine Intelligence 27, 305-317.

McCowan, I., Marro, C. and Mauuary, L.: 2000, Robust speech recognition using near-field superdirective beamforming with post-filtering, Proc. IEEE International Conference on Acoustics, Speech and Signal Processing, Vol. 3, pp. 1723-1726.

McCowan, I., Moore, D. and Sridharan, S.: 2000, Speech enhancement using near-field superdirectivity with an adaptive sidelobe canceler and post-filter, Australian International Conference on Speech Science and Technology, pp. 268-273.

Meignier, S., Bonastre, J.-F. and Igournet, S.: 2001, E-HMM approach for learning and adapting sound models for speaker indexing, A Speaker Odyssey, Chania, Crete, pp. 175-180.

Meignier, S., Moraru, D., Fredouille, C., Besacier, L. and Bonastre, J.-F.: 2004, Benefits of prior acoustic segmentation for automatic speaker segmentation, Proc. IEEE International Conference on Acoustics, Speech and Signal Processing, Montreal, Canada.

Meinedo, H. and Neto, J.: 2003, Audio segmentation, classification and clustering in a broadcast news task, Proc. IEEE International Conference on Acoustics, Speech and Signal Processing, Hong-Kong, China.

Metze, F., Fugen, C., Pan, Y., Schultz, T. and Yu, H.: 2004, The ISL RT-04S meetings transcription system, Proc. IEEE International Conference on Acoustics, Speech and Signal Processing, Montreal, Canada.

Mirghafori, N., Stolcke, A., Wooters, C., Pirinen, T., Bulyko, I., Gelbart, D., Graciarena, M., Otterson, S., Peskin, B. and Ostendorf, M.: 2004, From switchboard to meetings: Development of the 2004 ICSI-SRI-UW meeting recognition system, Proc. International Conference on Speech and Language Processing, Jeju Island, Korea.

Mirghafori, N. and Wooters, C.: 2006, Nuts and flakes: A study of data characteristics in speaker diarization, Proc. IEEE International Conference on Acoustics, Speech and Signal Processing, Toulouse, France.

Misra, H., Bourlard, H. and Tyagi, V.: 2003, New entropy based combination rules in HMM/ANN multi-stream ASR, Proc. IEEE International Conference on Acoustics, Speech and Signal Processing, Hong Kong.

Moh, Y., Nguyen, P. and Junqua, J.-C.: 2003, Towards domain independent speaker clustering, Proc. IEEE International Conference on Acoustics, Speech and Signal Processing, Hong Kong.

Moraru, D., Ben, M. and Gravier, G.: 2005, Experiments on speaker tracking and segmentation in radio broadcast news, Proc. International Conference on Speech and Language Processing, Lisbon, Portugal.

Moraru, D., Besacier, L., Meignier, S., Fredouille, C. and Bonastre, J.-F.: 2004, Speaker diarization in the ELISA consortium over the last 4 years, NIST 2004 Spring Rich Transcription Evaluation Workshop, Montreal, Canada.

Moraru, D., Meignier, S., Besacier, L., Bonastre, J.-F. and Magrin-Chagnolleau, I.: 2002, The ELISA consortium approaches in speaker segmentation during the NIST 2002 speaker recognition evaluation, NIST 2002 Spring Rich Transcription Evaluation Workshop.

Moraru, D., Meignier, S., Besacier, L., Bonastre, J.-F. and Magrin-Chagnolleau, I.: 2004, The ELISA consortium approaches in speaker segmentation during the NIST 2002 speaker recognition evaluation, Proc. IEEE International Conference on Acoustics, Speech and Signal Processing, Montreal, Canada.

Moraru, D., Meignier, S., Fredouille, C., Besacier, L. and Bonastre, J.-F.: 2004, The ELISA consortium approaches in broadcast news speaker segmentation during the NIST 2003 rich transcription evaluation, Proc. IEEE International Conference on Acoustics, Speech and Signal Processing, Montreal, Canada.

Mori, K. and Nakagawa, S.: 2001, Speaker change detection and speaker clustering using VQ distortion for broadcast news speech recognition, Proc. IEEE International Conference on Acoustics, Speech and Signal Processing, Vol. 1, Salt Lake City, USA, pp. 413-416.

Multimodal Meeting Manager (M4) website: 2006.
URL: http://www.m4project.org

Nakagawa, S. and Suzuki, H.: 1993, A new speech recognition method based on VQ-distortion and HMM, Proc. IEEE International Conference on Acoustics, Speech and Signal Processing, Vol. 2, Minneapolis, USA, pp. 676-679.

National Institute for Standards and Technology: 2006.
URL: http://www.nist.gov/speech

Nguyen, P.: 2003, SWAMP: An isometric frontend for speaker clustering, NIST 2003 Rich Transcription Workshop, Boston, USA.

Nishida, M. and Kawahara, T.: 2003, Unsupervised speaker indexing using speaker model selection based on Bayesian information criterion, Proc. IEEE International Conference on Acoustics, Speech and Signal Processing, Hong Kong.

NIST Fall Rich Transcription Evaluation website: 2006.
URL: http://www.nist.gov/speech/tests/rt/rt2004/fall

NIST Fall Rich Transcription on meetings 2006 Evaluation Plan: 2006.
URL: http://www.nist.gov/speech/tests/rt/rt2006/spring/docs/rt06s-meeting-eval-plan-V2.pdf

NIST MD-eval-v21 DER evaluation script: 2006.
URL: http://www.nist.gov/speech/tests/rt/rt2006/spring/code/md-eval-v21.pl

NIST Pilot Meeting Corpus website: 2006.
URL: http://www.nist.gov/speech/test_beds/mr_proj/meeting_corpus_1

NIST Rich Transcription evaluations website: 2006.
URL: http://www.nist.gov/speech/tests/rt

NIST Speech Recognition Evaluation: 2006.
URL: http://www.nist.gov/speech/tests/spk/index.htm

NIST Speech tools and APIs: 2006.
URL: http://www.nist.gov/speech/tools/index.htm

NIST Spring Rich Transcription Evaluation in Meetings website: 2006.
URL: http://www.nist.gov/speech/tests/rt/rt2005/spring

Omar, M. K., Chaudhari, U. and Ramaswamy, G.: 2005, Blind change detection for audio segmentation, Proc. IEEE International Conference on Acoustics, Speech and Signal Processing, Philadelphia, USA.

Ouellet, P., Boulianne, G. and Kenny, P.: 2005, Flavors of Gaussian warping, Proc. International Conference on Speech and Language Processing, Lisbon, Portugal.

Pardo, J. M., Anguera, X. and Wooters, C.: 2006a, Speaker diarization for multi-microphone meetings using only between-channel differences, MLMI 2006.

Pardo, J. M., Anguera, X. and Wooters, C.: 2006b, Speaker diarization for multiple distant microphone meetings: Mixing acoustic features and inter-channel time differences, Proc. International Conference on Speech and Language Processing.

Pattern analysis, Statistical modeling and Computational learning (Pascal) website: 2006.
URL: http://www.pascal-network.org/

Pelecanos, J. and Sridharan, S.: 2001, Feature warping for robust speaker verification, ISCA Speaker Recognition Workshop Odyssey, Crete, Greece.

Perez-Freire, L. and Garcia-Mateo, C.: 2004, A multimedia approach for audio segmentation in TV broadcast news, Proc. IEEE International Conference on Acoustics, Speech and Signal Processing, Montreal, Canada, pp. 369-372.

Pwint, M. and Sattar, F.: 2005, A segmentation method for noisy speech using genetic algorithm, Proc. IEEE International Conference on Acoustics, Speech and Signal Processing, Philadelphia, USA.

Rentzeperis, E., Stergiou, A., Boukis, C., Pnevmatikakis, A. and Polymenakos, L. C.: 2006, The 2006 Athens Information Technology speech activity detection and speaker diarization systems, NIST 2006 Spring Rich Transcription Evaluation Workshop, Washington DC, USA.

Reynolds, D. A., Singer, E., Carlson, B. A., O'Leary, G. C., McLaughlin, J. J. and Zissman, M. A.: 1998, Blind clustering of speech utterances based on speaker and language characteristics, Proc. International Conference on Speech and Language Processing, Sydney, Australia.

Reynolds, D. and Torres-Carrasquillo, P.: 2004, The MIT Lincoln Laboratories RT-04F diarization systems: Applications to broadcast audio and telephone conversations, Fall 2004 Rich Transcription Workshop (RT04), Palisades, NY.

Roch, M. and Cheng, Y.: 2004, Speaker segmentation using the MAP-adapted Bayesian information criterion, Odyssey-04, Toledo, Spain, pp. 349-354.

Rombouts, G. and Moonen, M.: 2003, QRD-based unconstrained optimal filtering for acoustic noise reduction, IEEE Trans. Signal Processing 83(9), 1889-1904.

Rosca, J., Balan, R. and Beaugeant, C.: 2003, Multi-channel psychoacoustically motivated speech enhancement, Proc. IEEE International Conference on Acoustics, Speech and Signal Processing.

Ross, A., Jain, A. K. and Qian, J. Z.: 2001, Information fusion in biometrics, 3rd International Conference on Audio and Video-Based Person Authentication.

Roth, P.: 1971, Effective measurements using digital signal analysis, IEEE Spectrum 8, 62-70.

Rougui, J., Rziza, M., Aboutajdine, D., Gelgon, M. and Martinez, J.: 2006, Fast incremental clustering of gaussian mixture speaker models for scaling up retrieval in on-line broadcast, Proc. IEEE International Conference on Acoustics, Speech and Signal Processing, Toulouse, France.

Sanchez-Bote, J., Gonzalez-Rodriguez, J. and Ortega-Garcia, J.: 2003, A real-time auditory-based microphone array assessed with e-rasti evaluation proposal, Proc. IEEE International Conference on Acoustics, Speech and Signal Processing.

Sankar, A., Beaufays, F. and Digalakis, V.: 1995, Training data clustering for improved speech recognition, Eurospeech-95, Madrid, Spain.

Sankar, A., Weng, F., Stolcke, Z. R. A. and Grande, R. R.: 1998, Development of SRI's 1997 broadcast news transcription system, DARPA Broadcast News Transcription and Understanding Workshop, Lansdowne, USA.

Schmidt, R.: 1986, Multiple emitter location and signal parameter estimation, IEEE Transactions on Antennas and Propagation.

Schwarz, G.: 1971, A sequential student test, The Annals of Statistics 42(3), 1003-1009.

Schwarz, G.: 1978, Estimating the dimension of a model, The Annals of Statistics 6, 461-464.

Shaobing Chen, S. and Gopalakrishnan, P.: 1998, Speaker, environment and channel change detection and clustering via the bayesian information criterion, Proceedings DARPA Broadcast News Transcription and Understanding Workshop, Virginia, USA.

Shinozaki, T. and Ostendorf, M.: 2007, Cross-validation EM training for robust parameter estimation, Proc. IEEE International Conference on Acoustics, Speech and Signal Processing. Submitted.

Cheng, S.-S. and Wang, H.-M.: 2003, A sequential metric-based audio segmentation method via the bayesian information criterion, Eurospeech'03, Geneva, Switzerland.

Cheng, S.-S. and Wang, H.-M.: 2004, METRIC-SEQDAC: A hybrid approach for audio segmentation, Proc. International Conference on Speech and Language Processing, Jeju, South Korea.

Siegler, M. A., Jain, U., Raj, B. and Stern, R. M.: 1997, Automatic segmentation, classification and clustering of broadcast news audio, DARPA Speech Recognition Workshop, Chantilly, pp. 97-99.

Similar Network of Excellence website: 2006.
URL: http://www.similar.cc/cms/default.asp?id=0

Sinha, R., Tranter, S. E., Gales, M. J. F. and Woodland, P. C.: 2005, The cambridge university march 2005 speaker diarisation system, European Conference on Speech Communication and Technology (Interspeech), Lisbon, Portugal, pp. 2437-2440.

Siu, M.-H., Yu, G. and Gish, H.: 1992, An unsupervised, sequential learning algorithm for the segmentation of speech waveforms with multiple speakers, Proc. IEEE International Conference on Acoustics, Speech and Signal Processing, Vol. 2, San Francisco, USA, pp. 189-192.

Sivakumaran, P., Fortuna, J. and Ariyaeeinia, A.: 2001, On the use of the bayesian information criterion in multiple speaker detection, Eurospeech'01, Scandinavia.

Solomonov, A., Mielke, A., Schmidt, M. and Gish, H.: 1998, Clustering speakers by their voices, Proc. IEEE International Conference on Acoustics, Speech and Signal Processing, Vol. 2, Seattle, USA, pp. 757-760.

Speech in noisy environments: 2006.
URL: http://www.speech.sri.com/projects/spine/

Spring 2005 (RT-05S) Rich Transcription Meeting Recognition Evaluation Plan: n.d.
URL: http://www.nist.gov/speech/tests/rt/rt2005/spring/rt05s-meeting-eval-plan-V1.pdf

Spring 2006 (RT-06S) Rich Transcription Meeting Recognition Evaluation Plan: n.d.
URL: http://www.nist.gov/speech/tests/rt/rt2006/spring/docs/rt06s-meeting-eval-plan-V2.pdf

Stolcke, A., Anguera, X., Boakye, K., Cetin, O., Grezl, F., Janin, A., Mandal, A., Peskin, B., Wooters, C. and Zheng, J.: 2005, Further progress in meeting recognition: The icsi-sri spring 2005 speech-to-text evaluation system, RT05s Meetings Recognition Evaluation, Edinburgh, Great Britain.

Strassel, S. and Glenn, M.: 2004, Shared linguistic resources for human language technology in the meeting domain, ICASSP-DARPA Meetings Diarization Workshop, Montreal, Canada.

Sturim, D., Reynolds, D., Singer, E. and Campbell, J. P.: 2001, Speaker indexing in large audio databases using anchor models, Proc. IEEE International Conference on Acoustics, Speech and Signal Processing, Salt Lake City, USA.

Tager, W.: 1998a,
Etudes en traitement d'antenne pour la prise de son, PhD thesis, Universite de Rennes.

Tager, W.: 1998b,
Near field superdirectivity (nfsd), Proc. IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 2045-2048.

Tranter, S.: 2005, Two-way cluster voting to improve speaker diarization performance, Proc. IEEE International Conference on Acoustics, Speech and Signal Processing, Montreal, Canada.

Tranter, S. and Reynolds, D.: 2004, Speaker diarization for broadcast news, ODYSSEY'04, Toledo, Spain.

Trees, H. V.: 1968, Detection Estimation and Modulation Theory, Vol. 1, Wiley.

Tritschler, A. and Gopinath, R.: 1999, Improved speaker segmentation and segments clustering using the bayesian information criterion, Eurospeech'99, pp. 679-682.

Tsai, W.-H., Cheng, S.-S., Chao, Y.-H. and Wang, H.-M.: 2005, Clustering speech utterances by speaker using eigenvoice-motivated vector space models, Proc. IEEE International Conference on Acoustics, Speech and Signal Processing, Philadelphia, USA.

Tsai, W.-H., Cheng, S.-S. and Wang, H.-M.: 2004, Speaker clustering of speech utterances using a voice characteristic reference space, Proc. International Conference on Speech and Language Processing, Jeju Island, Korea.

Tsai, W.-H. and Wang, H.-M.: 2006, On maximizing the within-cluster homogeneity of speaker voice characteristics for speech utterance clustering, Proc. IEEE International Conference on Acoustics, Speech and Signal Processing, Toulouse, France.

Valente, F.: 2006, Infinite models for speaker clustering, Proc. International Conference on Speech and Language Processing, Pittsburgh, USA.

Valente, F. and Wellekens, C.: 2004, Variational bayesian speaker clustering, Speaker Odyssey, Toledo, Spain.

Valente, F. and Wellekens, C.: 2005, Variational bayesian adaptation for speaker clustering, Proc. IEEE International Conference on Acoustics, Speech and Signal Processing, Lisbon, Portugal.

Valin, J., Rouat, J. and Michaud, F.: 2004, Microphone array post-filter for separation of simultaneous non-stationary sources, Proc. IEEE International Conference on Acoustics, Speech and Signal Processing.

van Leeuwen, D.: 2005, The TNO speaker diarization system for NIST RT05s for meeting data, NIST 2005 Spring Rich Transcription Evaluation Workshop, Edinburgh, UK.

Vandecatseye, A. and Martens, J.-P.: 2003, A fast, accurate and stream-based speaker segmentation and clustering algorithm, Eurospeech'03, Geneva, Switzerland, pp. 941-944.

Vandecatseye, A., Martens, J.-P. et al.: 2004, The cost278 pan-european broadcast news database, LREC'04, Lisbon, Portugal.

Veen, B. V. and Buckley, K.: 1988, Beamforming: A versatile approach to spatial filtering, IEEE Transactions on Acoustics, Speech and Signal Processing.

Verlinde, P., Chollet, G. and Acheroy, M.: 2000, Multi-modal identity verification using expert fusion, Information Fusion 1(1), 17-33.

Vescovi, M., Cettolo, M. and Rizzi, R.: 2003, A DP algorithm for speaker change detection, Eurospeech'03.

Video analysis and content extraction for defense intelligence (ARDA-VACE II): 2006.
URL: http://www.informedia.cs.cmu.edu/arda/vaceII.html

Wactlar, H., Hauptmann, A. and Witbrock, M.: 1996, News on-demand experiments in speech recognition, ARPA STL Workshop.

Wegmann, S., Scattone, F., Carp, I., Gillick, L., Roth, R. and Yamron, J.: 1998, Dragon Systems' 1997 broadcast news transcription system, DARPA Broadcast News Transcription and Understanding Workshop, Lansdowne, USA.

Wiener, N.: 1949, Extrapolation, Interpolation, and Smoothing of Stationary Time Series, Wiley.

Wilcox, L., Chen, F., Kimber, D. and Balasubramanian, V.: 1994, Segmentation of speech using speaker identification, Proc. IEEE International Conference on Acoustics, Speech and Signal Processing, Vol. 1, Adelaide, Australia, pp. 161-164.

Willsky, A. S. and Jones, H. L.: 1976, A generalized likelihood ratio approach to the detection and estimation of jumps in linear systems, IEEE Transactions on Automatic Control AC-21(1), 108-112.

Woodland, P., Gales, M., Pye, D. and Young, S.: 1997, The development of the 1996 HTK broadcast news transcription system, Speech Recognition Workshop, pp. 73-78.

Wooters, C., Fung, J., Peskin, B. and Anguera, X.: 2004, Towards robust speaker segmentation: The ICSI-SRI fall 2004 diarization system, Fall 2004 Rich Transcription Workshop (RT04), Palisades, NY.

Wu, T., Lu, L., Chen, K. and Zhang, H.-J.: 2003a,
UBM-based incremental speaker adaptation, ICME'03, Vol. 2, pp. 721-724.

Wu, T., Lu, L., Chen, K. and Zhang, H.-J.: 2003b,
UBM-based real-time speaker segmentation for broadcasting news, Proc. IEEE International Conference on Acoustics, Speech and Signal Processing.

Wu, T., Lu, L., Chen, K. and Zhang, H.-J.: 2003c,
Universal background models for real-time speaker change detection, International Conference on Multimedia Modeling.

Yamaguchi, M., Yamashita, M. and Matsunaga, S.: 2005, Spectral cross-correlation features for audio indexing of broadcast news and meetings, Proc. International Conference on Speech and Language Processing.

Young, S., Kershaw, D., Odell, J., Ollason, D., Valtchev, V. and Woodland, P.: 2005, The HTK Book, Cambridge University Engineering Department.

Zdansky, J. and Nouza, J.: 2005, Detection of acoustic change-points in audio records via global BIC maximization and dynamic programming, Proc. International Conference on Speech and Language Processing, Lisbon, Portugal.

Zelinski, R.: 1988, A microphone array with adaptive post-filtering for noise reduction in reverberant rooms, Proc. IEEE International Conference on Acoustics, Speech and Signal Processing, Vol. 5, pp. 2578-2581.

Zhang, X., Hansen, J. and Rehar, K.: 2004, Speech enhancement based on a combined multi-channel array with constrained iterative and auditory masked processing, Proc. IEEE International Conference on Acoustics, Speech and Signal Processing.

Zhou, B. and Hansen, J. H.: 2000, Unsupervised audio stream segmentation and clustering via the bayesian information criterion, Proc. International Conference on Speech and Language Processing, Vol. 3, Beijing, China, pp. 714-717.

Zhu, X., Barras, C., Lamel, L. and Gauvain, J.-L.: 2006, Speaker diarization: from broadcast news to lectures, NIST 2006 Spring Rich Transcription Evaluation Workshop, Washington DC, USA.

Zhu, X., Barras, C., Meignier, S. and Gauvain, J.-L.: 2005, Combining speaker identification and BIC for speaker diarization, Proc. International Conference on Speech and Language Processing, Lisbon, Portugal.

Zochova, P. and Radova, V.: 2005, Modified DISTBIC algorithm for speaker change detection, Proc. International Conference on Speech and Language Processing, Lisbon, Portugal.


