Pros and Cons of the NIST Evaluations

I strongly believe in the advantages of any evaluation in which independent research groups work towards solving a common problem. But as beneficial as I think such evaluations are, some issues could be improved.

Participating in an evaluation campaign is a wonderful opportunity for a research group to get in touch with people who work on the same research topic, and thereby to establish links and collaborations in other projects or in subsequent evaluations. For example, in RT05s and RT06s the AMI team participated in the Speech-to-text (STT) task with contributions from multiple labs affiliated with the AMI project. Another good example is the ELISA consortium, made up of four labs located in France that have shared expertise and built speaker diarization systems together for years.

It is also a good framework for sharing resources between research groups, which allows better systems to be created and more systems to reach the best possible performance. This was the case in RT06s, when Karlsruhe University shared the output of their beamforming system in order to allow other labs to obtain results and perform research with the MM3A microphone array.

Participating in the evaluation campaigns is also beneficial for researchers because it sets a deadline for the systems to be ready and allows a post-evaluation period in which ``almost-done'' research can be finished and presented at the evaluation workshop. This, however, can be detrimental to research groups involved in too many overlapping evaluations and other projects.

One drawback of the current rich transcription evaluations is the small number of participants in some of the tasks. An attempt has been made to address this by setting up smaller tasks such as Speech Activity Detection (SAD), in which many groups participated in RT06s.

Repeating the evaluations in successive years allows technology and new ideas brought in by one group to be used by another to solve, or make progress on, the problem at hand. Baseline tools and systems should be made available to research groups willing to participate, so that they can obtain competitive results without needing to build a whole system.

user 2008-12-08