Although bottom-up clustering is far more popular than top-down clustering, it is not clear which of the two achieves better results, or under which conditions. Hain et al. (1998) compared both techniques on broadcast news transcription. The bottom-up system uses a divergence-like distance measure, with a minimum cluster feature count as the stopping criterion. The top-down system uses the arithmetic harmonic sphericity distance, and likewise the cluster count as the stopping criterion.
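To make the bottom-up procedure concrete, the following is a minimal sketch and not the system of Hain et al. (1998). It assumes one-dimensional features, models each cluster with a single Gaussian, uses the symmetric Kullback-Leibler divergence as a stand-in for the divergence-like distance, and reads the minimum cluster feature count as a minimum number of feature frames per cluster. The names `bottom_up`, `symmetric_kl`, `min_count` and `VAR_FLOOR` are illustrative assumptions.

```python
from statistics import fmean, pvariance

VAR_FLOOR = 1e-3  # assumed floor that keeps variances strictly positive


def gauss_stats(frames):
    """Mean and floored variance of a list of 1-D feature frames."""
    return fmean(frames), max(pvariance(frames), VAR_FLOOR)


def symmetric_kl(a, b):
    """Symmetric KL divergence between 1-D Gaussians fitted to a and b."""
    (m1, v1), (m2, v2) = gauss_stats(a), gauss_stats(b)
    return 0.5 * (v1 / v2 + v2 / v1 - 2.0
                  + (m1 - m2) ** 2 * (1.0 / v1 + 1.0 / v2))


def bottom_up(segments, min_count):
    """Merge the closest pair of clusters until every cluster holds at
    least min_count frames (assumed stopping rule) or one cluster is left."""
    clusters = [list(s) for s in segments]
    while len(clusters) > 1 and min(len(c) for c in clusters) < min_count:
        pairs = ((i, j) for i in range(len(clusters))
                 for j in range(i + 1, len(clusters)))
        i, j = min(pairs,
                   key=lambda p: symmetric_kl(clusters[p[0]], clusters[p[1]]))
        clusters[i].extend(clusters.pop(j))  # j > i, so index i stays valid
    return clusters


segments = [[0.0, 0.1, 0.2], [0.1, 0.3], [5.0, 5.2, 5.1], [4.9, 5.3]]
clusters = bottom_up(segments, min_count=4)
```

In this sketch the globally closest pair is always merged, even when neither cluster is the undersized one. A top-down system would run in the opposite direction, starting from a single cluster and splitting it until its own stopping criterion is met.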
Since top-down and bottom-up techniques could complement each other, several authors have proposed combining multiple systems to obtain an improved speaker diarization.
In Tranter (2005), a cluster voting algorithm is presented that improves the diarization output by merging the outputs of two different speaker diarization systems. Tests are performed using two top-down and two bottom-up systems.
In Moraru, Meignier, Besacier, Bonastre and Magrin-Chagnolleau (2004) and Fredouille et al. (2004), two approaches for combining top-down and bottom-up outputs are presented and applied to broadcast news and meeting processing. The first technique, called hybridization, uses the output of one system to initialize the second. The second technique, called fusion, matches the segments common to both outputs and then resegments the data to assign the non-common segments.
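The matching step of the fusion technique can be illustrated with a small sketch. It is not the procedure of the cited papers: it assumes both systems emit one speaker label per frame, maps each label of the second system to the label of the first system it overlaps most, and marks the frames where the two outputs disagree as `None`, leaving them for a later resegmentation pass that is not shown. The function names `map_labels` and `common_frames` are hypothetical.

```python
from collections import Counter


def map_labels(ref, other):
    """Map each label in `other` to the `ref` label it co-occurs with most
    often, counted frame by frame (ties go to the first pair seen)."""
    overlap = Counter(zip(other, ref))
    best = {}
    for (o, r), n in overlap.items():
        if o not in best or n > best[o][1]:
            best[o] = (r, n)
    return {o: r for o, (r, _) in best.items()}


def common_frames(ref, other):
    """Keep the `ref` label where both systems agree after label mapping;
    mark disagreeing (non-common) frames as None."""
    if len(ref) != len(other):
        raise ValueError("both outputs must cover the same frames")
    mapping = map_labels(ref, other)
    return [r if mapping[o] == r else None for r, o in zip(ref, other)]


top_down = ["A", "A", "A", "B", "B", "B"]   # toy output of one system
bottom_up_out = ["x", "x", "y", "y", "y", "y"]  # toy output of the other
fused = common_frames(top_down, bottom_up_out)
```

Here the third frame is the only one on which the two toy outputs disagree, so it is the only frame left unassigned. The mapping is many-to-one by design, which keeps the sketch short but would not detect the case where one system splits a speaker that the other keeps whole.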