Methodology of the Evaluations

Each of the NIST evaluations starts one year in advance, during the workshop organized to share results from the previous RT evaluation. There, all participants can comment on the different tasks and propose changes to the evaluation, or possible new evaluations. A schedule is then set with the deadlines leading up to the next evaluation.

During the months following the workshop, a series of conference calls normally takes place in which further details are settled, such as the available databases, the metrics to be used, or changes to the tasks. A deadline is also set for research groups to commit to participating in the evaluation, normally about one month before the evaluation starts.

In the months prior to the evaluation period, development and training data are distributed. For STT, limits are set on the data sources that can be used for training, so that systems are differentiated by their algorithms rather than by the amount of data they are trained on.

The evaluation data is handed to all sites at the same time, and they normally have about three weeks to process it. In RT05s and RT06s the conference room results were due a week earlier than the lecture room results, to allow labs with fewer resources to process both data sets. By participating in the evaluation, all sites pledge not to do any development using the evaluation data, so that the results of their systems are a realistic indication of performance on unseen data.

Once the results have been turned in to NIST, scores are normally delivered to the participating sites within a week. The score computed for each entry is the Diarization Error Rate (DER), as explained in the experiments chapter. For the speech activity detection task the same score is used, but any speaker segment is considered as speech, whether one or multiple speakers are talking.
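As a reminder of the standard NIST formulation (the full definition and scoring details are given in the experiments chapter), the DER can be sketched as the fraction of scored speaker time that is incorrectly attributed:

\[
\mathrm{DER} = \frac{T_{\mathrm{miss}} + T_{\mathrm{FA}} + T_{\mathrm{spkr}}}{T_{\mathrm{scored}}}
\]

where \(T_{\mathrm{miss}}\) is the scored time in which speech is missed, \(T_{\mathrm{FA}}\) the time in which non-speech is labeled as speech, \(T_{\mathrm{spkr}}\) the time attributed to the wrong speaker, and \(T_{\mathrm{scored}}\) the total scored speaker time. For speech activity detection the speaker-error term does not apply, leaving only the missed speech and false alarm terms.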

After the results are made public, each participant prepares a paper describing the systems they used, in order to share the knowledge acquired during the evaluation. These papers are presented at a workshop where all participants can meet and start planning the following year's evaluation. For both RT05s and RT06s the workshop coincided with the MLMI workshop, and the evaluation participants' papers were published in Springer's Lecture Notes in Computer Science together with the workshop papers.
