This section describes the different parts of the ``evallies`` package that are necessary to set up your system for both
tasks.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
Note that the input datasets, user simulation, and evaluation blocks are fixed in order to guarantee the reproducibility of the experiments.
Participants are free to edit their own system but are not allowed to modify the parameters of the user simulation or database.
...
Two datasets are available:
* __diarization train__: this dataset includes audio files together with information about the TV shows
and a manual segmentation available for supervised training. This dataset is available for the initial training of the system
and later for system adaptation throughout the system life cycle.
Audio files from this dataset can be accessed at any time, on demand.
* __diarization lifelong__: This dataset is available in a sequential manner and used for evaluation of the system.
Each audio file is provided to the system without any other information. For each file, the system has to return a hypothesis
that will be evaluated. All information about the files of this dataset is available to the User Simulation so that it can answer the questions.
...
The UEM format is a format describing the time ranges in the source audio files the system should be working on. It is used to give the boundaries of the shows but also to exclude the zones with overlapping speakers. It's a space-separated columns format, with four columns:
The MDTM format describes the reference or a hypothesis for the speaker identity in a file. It is a space-separated format with eight columns:
* File name without the extension
* Channel number (always 1)
* Start time of the speaker range
* Duration of the speaker range (beware, not end time)
* Event type (always "speaker")
* Event subtype (always "na")
* Gender ("adult_male" or "adult_female" in references, "unknown" for hypotheses; not evaluated in any case)
* Speaker id
In the references, the speaker id is the speaker name in the form "Firstname\_LASTNAME"; in the hypothesis, it is a unique identifier per speaker that contains no spaces.
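
The eight-column layout above can be illustrated with a minimal Python sketch. It is not part of the ``evallies`` API: the names `MdtmSegment`, `parse_mdtm_line` and `format_mdtm_line` are hypothetical, and the three-decimal time formatting is an assumption made for readability only.

```python
from dataclasses import dataclass


@dataclass
class MdtmSegment:
    """One MDTM line (hypothetical helper, not an evallies class)."""
    file_id: str      # file name without the extension
    channel: int      # always 1
    start: float      # start time of the speaker range
    duration: float   # duration of the range, NOT its end time
    event_type: str   # always "speaker"
    subtype: str      # always "na"
    gender: str       # "unknown" for hypotheses
    speaker: str      # identifier without spaces

    @property
    def end(self) -> float:
        # The end time is derived, since MDTM stores start + duration.
        return self.start + self.duration


def parse_mdtm_line(line: str) -> MdtmSegment:
    """Split a space-separated MDTM line into its eight columns."""
    fields = line.split()
    if len(fields) != 8:
        raise ValueError(f"expected 8 columns, got {len(fields)}")
    return MdtmSegment(
        file_id=fields[0],
        channel=int(fields[1]),
        start=float(fields[2]),
        duration=float(fields[3]),
        event_type=fields[4],
        subtype=fields[5],
        gender=fields[6],
        speaker=fields[7],
    )


def format_mdtm_line(seg: MdtmSegment) -> str:
    """Serialize a segment back to one MDTM line (3 decimals is an assumption)."""
    return (
        f"{seg.file_id} {seg.channel} {seg.start:.3f} {seg.duration:.3f} "
        f"{seg.event_type} {seg.subtype} {seg.gender} {seg.speaker}"
    )
```

Keeping the duration as the stored field and deriving the end time mirrors the format itself, which helps avoid the common mistake of writing an end time into the fourth column.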