Commit dbf7c196 authored by Anthony Larcher's avatar Anthony Larcher

update readme

parent 0ccfd185
The LIUM baseline diarization system is based on previous publications.
### Installation and first run
Install conda, download ``allies.yaml`` available in this repository and create the **allies** environment with the following commands:

```shell
conda env create -f allies.yaml
conda activate allies
pip install evallies
```
Download the archive ``allies.tar.gz``, untar it and move into the directory:

```shell
tar -xvf allies.tar.gz
cd ./allies
```
Download the data (wav, mdtm and uem files) available on the ALLIES ftp repository.
Download command...
Run the baseline system.
To run the x-vector system, download MUSAN.
# Integrate your system
This section describes the parts of the ``evallies`` package that are necessary to set up your system.
Note that the input datasets, user simulation and evaluation blocks are fixed and guarantee reproducibility of the experiments.
Participants are free to edit their own system but are not allowed to modify the parameters of the user simulation or database.
## Input datasets
Two datasets are available:
* __diarization train__: this dataset includes audio files together with information about the TV shows
and a manual segmentation available for supervised training. This dataset is available for the initial training of the system
and later for system adaptation all along the system life-cycle.
Audio files from this dataset can be accessed anytime on demand.
* __diarization lifelong__: this dataset is made available in a sequential manner and is used to evaluate the system.
Each audio file is provided to the system without any other information. For each file, the system has to return a hypothesis
that will be evaluated. All information for the files of this dataset is available to the User Simulation to answer the questions.
Iterate over the dataset by using the following code:

```python
for idx, (show, file_info, uem, ref, filename) in enumerate(lifelong_data):
    # execute your code here
    ...
```
## ALLIES API description (How to interact with the User Simulation)
This section first describes the objects included in the ``evallies`` package that are required to interact with the user simulation,
then the methods used for that interaction.
### REFERENCE: The hypothesis format to interact with user simulation
After processing a file, your system exchanges with the user simulation through a **Reference** object.
### UEM format description
The UEM format describes the time ranges of the source audio files the system should work on. It is used to give the boundaries of the shows but also to exclude the zones with overlapping speakers. It is a space-separated format with four columns:
* File name without the extension
* Channel number (always 1)
* Start time of zone to diarize
* End time of zone to diarize
Example extract:
```
TV8_LaPlaceDuVillage_2011-03-14_172834 1 407.621 471.166
TV8_LaPlaceDuVillage_2011-03-14_172834 1 471.666 478.396
TV8_LaPlaceDuVillage_2011-03-14_172834 1 476.920 493.571
TV8_LaPlaceDuVillage_2011-03-14_172834 1 492.927 495.556
```
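As a minimal illustration of the four-column layout above, UEM lines can be parsed into typed entries. This is a sketch, not part of ``evallies``; the ``UemEntry`` and ``parse_uem`` names are made up for the example:

```python
from typing import List, NamedTuple


class UemEntry(NamedTuple):
    """One UEM line: a time range the system should diarize."""
    file_id: str
    channel: int   # always 1 in this campaign
    start: float   # start time of the zone, in seconds
    end: float     # end time of the zone, in seconds


def parse_uem(lines: List[str]) -> List[UemEntry]:
    """Parse space-separated UEM lines into typed entries."""
    entries = []
    for line in lines:
        fields = line.split()
        if len(fields) != 4:
            continue  # skip blank or malformed lines
        entries.append(UemEntry(fields[0], int(fields[1]),
                                float(fields[2]), float(fields[3])))
    return entries
```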
### MDTM format description
The MDTM format describes the reference or a hypothesis for the speaker identity in a file. It is a space-separated format with eight columns:
* File name without the extension
* Channel number (always 1)
* Start time of the speaker range
* Duration of the speaker range (beware, not end time)
* Event type (always "speaker")
* Event subtype (always "na")
* Gender ("adult_male" or "adult_female"; "unknown" for hypotheses, not evaluated in any case)
* Speaker id
In the references, the speaker id is the speaker name in the form "Firstname\_LASTNAME"; in the hypotheses it is a unique, space-free identifier per speaker.
Example extract:
```
TV8_LaPlaceDuVillage_2011-03-14_172834 1 407.621 15.040 speaker na adult_male Michel_THABUIS
TV8_LaPlaceDuVillage_2011-03-14_172834 1 422.661 18.148 speaker na adult_male Philippe_DEPARIS
TV8_LaPlaceDuVillage_2011-03-14_172834 1 440.809 30.357 speaker na adult_male Michel_THABUIS
TV8_LaPlaceDuVillage_2011-03-14_172834 1 471.666 6.730 speaker na adult_male Philippe_DEPARIS
```
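Since the fourth MDTM column is a duration rather than an end time, it is easy to misread; a small parser sketch (not part of ``evallies``, ``parse_mdtm`` is a hypothetical helper) makes the conversion explicit:

```python
from typing import Dict, List


def parse_mdtm(lines: List[str]) -> List[Dict]:
    """Parse space-separated MDTM lines into speaker segments."""
    segments = []
    for line in lines:
        fields = line.split()
        if len(fields) != 8:
            continue  # skip blank or malformed lines
        start = float(fields[2])
        duration = float(fields[3])  # beware: duration, not end time
        segments.append({
            "file_id": fields[0],
            "channel": int(fields[1]),
            "start": start,
            "end": start + duration,  # derive the end time explicitly
            "gender": fields[6],
            "speaker": fields[7],
        })
    return segments
```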
### Interacting with the human in the loop.
Each file from the lifelong learning dataset comes with a flag named __supervision__, stored in the __file_info__ variable,
which specifies the mode of human assisted learning for this file. The mode can be:
* __active__ the system is allowed to ask questions to the human in the loop;
* __interactive__ once the system produces a first hypothesis, the human in the loop provides corrections
to the system to improve the hypothesis;
* __none__ Human assisted learning is OFF for this file. The system can still adapt the model in an unsupervised manner.
While processing an audio file, the system can perform unsupervised learning
and goes through the Human Assisted Learning process if the supervision mode is either
__active__ or __interactive__.
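The three modes above can be summarized in a small helper. This function only restates the rules listed here; it is a sketch, not part of the ``evallies`` API:

```python
def learning_capabilities(supervision: str) -> dict:
    """Map the per-file supervision flag to what the system may do."""
    if supervision == "active":
        # The system may ask questions to the human in the loop.
        return {"ask_questions": True, "receive_corrections": False}
    if supervision == "interactive":
        # The human corrects the first hypothesis produced by the system.
        return {"ask_questions": False, "receive_corrections": True}
    # "none": human assisted learning is OFF for this file;
    # only unsupervised model adaptation remains possible.
    return {"ask_questions": False, "receive_corrections": False}
```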
The code below shows how to interact with the user simulation:

```python
# Create a fake request that is used to initiate interactive learning.
# In the active learning case, this request is overwritten by your system.
# A request is the question the system asks to the human in the loop.
request = {"request_type": "toto", "time_1": 0.0, "time_2": 0.0}

# Package the request to be sent to the user simulation together with
# the ID of the file of interest and the current hypothesis.
message_to_user = {
    "file_id": file_id,                # ID of the file the question is related to
    "hypothesis": current_hypothesis,  # the current hypothesis in ALLIES format
    "system_request": request,         # the question for the human in the loop
}

# Send the request to the user simulation and receive the answer.
human_assisted_learning, user_answer = user.validate(message_to_user)
```
The user simulation returns two objects:
* __human_assisted_learning__, a boolean;
* __user_answer__, the answer of the user simulation.
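Assuming the boolean indicates whether the human assisted learning session continues (check the ``evallies`` documentation for the exact contract), the exchange above can be wrapped in a loop. This is a sketch; ``build_request`` and ``apply_answer`` are hypothetical callbacks belonging to your own system:

```python
def interaction_loop(user, file_id, hypothesis, build_request, apply_answer):
    """Sketch: query the user simulation until it ends the session."""
    # Fake request used to initiate interactive learning.
    request = {"request_type": "toto", "time_1": 0.0, "time_2": 0.0}
    keep_going = True
    while keep_going:
        message_to_user = {
            "file_id": file_id,
            "hypothesis": hypothesis,
            "system_request": request,
        }
        keep_going, user_answer = user.validate(message_to_user)
        if keep_going:
            # Use the answer to refine the hypothesis, then ask again.
            hypothesis = apply_answer(hypothesis, user_answer)
            request = build_request(hypothesis)
    return hypothesis
```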