diar.py 49.3 KB
Newer Older
Sylvain Meignier's avatar
Origin  
Sylvain Meignier committed
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
# -*- coding: utf-8 -*-
#
# This file is part of S4D.
#
# SD4 is a python package for speaker diarization based on SIDEKIT.
# S4D home page: http://www-lium.univ-lemans.fr/s4d/
# SIDEKIT home page: http://www-lium.univ-lemans.fr/sidekit/
#
# S4D is free software: you can redistribute it and/or modify
# it under the terms of the GNU Lesser General Public License as
# published by the Free Software Foundation, either version 3 of the License,
# or (at your option) any later version.
#
# S4D is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
# GNU Lesser General Public License for more details.
#
# You should have received a copy of the GNU Lesser General Public License
# along with SIDEKIT.  If not, see <http://www.gnu.org/licenses/>.

"""
Sylvain Meignier's avatar
stable    
Sylvain Meignier committed
23
Diar is a class describing an audio/video segmentation file. A diarization
Sylvain Meignier's avatar
Origin  
Sylvain Meignier committed
24
25
26
27
28
29
30
31
32
33
34
35
36
37
contains a list of segments. Where each row is segment composed of n values
identified by a attribut names.
The diarization file is the most important file in the toolkit. All programs
are driven by a diarization file and most of them generate a diarization
file (trainer generate gmm).

A diarization stores a list of ''segments'' composed of attributes.

A diarization could draw data from several shows. It is very useful in a batch
mode context (training of GMM, computing log likelihood ratio, cross-show
diarization, etc.).

Example
-------
Sylvain Meignier's avatar
stable    
Sylvain Meignier committed
38
>>> diarization[0] //get the first segment
Sylvain Meignier's avatar
Origin  
Sylvain Meignier committed
39
['20041006_0700_0800_CLASSIQUE', 'Emmanuel_Cugny', 'speaker', 164, 1170]
Sylvain Meignier's avatar
stable    
Sylvain Meignier committed
40
>>> diarization[0]['show']
Sylvain Meignier's avatar
Origin  
Sylvain Meignier committed
41
'20041006_0700_0800_CLASSIQUE'
Sylvain Meignier's avatar
stable    
Sylvain Meignier committed
42
>>> diarization[0]['cluster']
Sylvain Meignier's avatar
Origin  
Sylvain Meignier committed
43
'Emmanuel_Cugny'
Sylvain Meignier's avatar
stable    
Sylvain Meignier committed
44
>>> diarization[0]['start']
Sylvain Meignier's avatar
Origin  
Sylvain Meignier committed
45
164
Sylvain Meignier's avatar
stable    
Sylvain Meignier committed
46
>>> diarization[0]['stop']
Sylvain Meignier's avatar
Origin  
Sylvain Meignier committed
47
48
49
1170

where:
Sylvain Meignier's avatar
Sylvain Meignier committed
50
51
  * attribut 0 named ''show'': ''20041006_0700_0800_CLASSIQUE'' = the show speaker
  * attribut 1 named ''cluster'' : ''Emmanuel_Cugny'' the speaker speaker
52
  * attribut 2 named ''type'' : ''speaker'' contains the cluster type (speaker or head)
Sylvain Meignier's avatar
Origin  
Sylvain Meignier committed
53
54
55
56
57
58
59
60
  * attribut 3 named ''start'': ''164'' is the index of the first feature of the segment
  * attribut 4 named ''stop'': ''1170'' is the index of the last feature of the segment

How to
------
* Read a diarization:
    ::

Anthony Larcher's avatar
Anthony Larcher committed
61
        from s4d.diarization import Diar
Sylvain Meignier's avatar
stable    
Sylvain Meignier committed
62
63
64
65
        diarization = Diar.read_seg('foo.seg') //LIUM Spk Diarization format
        diarization = Diar.read_mdtm('foo.seg') //MDTM format
        diarization = Diar.read_rttm('foo.seg') //RTTM format
        diarization = Diar.read_uem('foo.seg') //UEM format
Sylvain Meignier's avatar
Origin  
Sylvain Meignier committed
66
67
68
69

* Get a segment:
    ::

Sylvain Meignier's avatar
stable    
Sylvain Meignier committed
70
        seg = diarization[0]
Sylvain Meignier's avatar
Origin  
Sylvain Meignier committed
71
72
73
74

* Get a attribut of a segment
    ::

75
        seg['cluster']
Sylvain Meignier's avatar
Origin  
Sylvain Meignier committed
76
77
78
79

* Write a diarization:
    ::

Anthony Larcher's avatar
Anthony Larcher committed
80
        diarization = Diar.write_seg('foo.seg', diarization) //LIUM format
Sylvain Meignier's avatar
Origin  
Sylvain Meignier committed
81
82
83
84

* Add or remove an attribut:
    ::

Sylvain Meignier's avatar
Sylvain Meignier committed
85
        diarization.add_attribut(speaker='gender', default=None) // add an attribut named 'gender, the default value is None
Sylvain Meignier's avatar
stable    
Sylvain Meignier committed
86
        diarization.del_attribut('gender') // remove the attribut
Sylvain Meignier's avatar
Origin  
Sylvain Meignier committed
87
88
89
90

* Sort a diarization:
    ::

Sylvain Meignier's avatar
stable    
Sylvain Meignier committed
91
        diarization.sort()
Sylvain Meignier's avatar
Origin  
Sylvain Meignier committed
92
93
94
95

* Create a new segment:
    ::

Sylvain Meignier's avatar
Sylvain Meignier committed
96
        diarization.append(show='foo', cluster='speaker', start=0, stop=100)
Sylvain Meignier's avatar
Origin  
Sylvain Meignier committed
97
98
99
100
101
102
103
104
105
106
107

Add all the segments of diar2 into diar1:
    ::

        diar1.append_diar(diar2)

Modules
-------

"""

Anthony Larcher's avatar
Anthony Larcher committed
108
109


Sylvain Meignier's avatar
Origin  
Sylvain Meignier committed
110
111
112
import copy
import logging
import os
Anthony Larcher's avatar
Anthony Larcher committed
113
import re
Sylvain Meignier's avatar
Origin  
Sylvain Meignier committed
114
import sys
Anthony Larcher's avatar
Anthony Larcher committed
115
116
117
118


from sidekit.sidekit_wrappers import *
from sidekit.bosaris.idmap import IdMap
Sylvain Meignier's avatar
Origin  
Sylvain Meignier committed
119
from six import string_types
Anthony Larcher's avatar
Anthony Larcher committed
120
from .utils import str2str_normalize
Sylvain Meignier's avatar
Origin  
Sylvain Meignier committed
121

Sulfyderz's avatar
Sulfyderz committed
122
123
124
125
try:
    from sortedcontainers import SortedDict as dict
except ImportError:
    pass
Sylvain Meignier's avatar
Origin  
Sylvain Meignier committed
126
127
128
129
130
131


class Diar():
    """
    The diarization class.

Sylvain Meignier's avatar
new    
Sylvain Meignier committed
132
    :attr _attributes: a AttributeNames object storing the attribut definitions
133
    :attr cluster_types: a list object
Sylvain Meignier's avatar
Origin  
Sylvain Meignier committed
134
135
136
    :attr segments: a list of Segment object
    """
    def __init__(self):
Sylvain Meignier's avatar
new    
Sylvain Meignier committed
137
138
        self._attributes = AttributeNames()
        self._attributes.initialize({'show': 0, 'cluster': 1, 'cluster_type': 2,
Sylvain Meignier's avatar
Origin  
Sylvain Meignier committed
139
                                  'start': 3, 'stop': 4},
Sylvain Meignier's avatar
Sylvain Meignier committed
140
                                    ['empty', 'empty', 'speaker', 0, 0])
141
        self.cluster_types = ['speaker', 'head']
Sylvain Meignier's avatar
Origin  
Sylvain Meignier committed
142
143
144
145
        self.segments = list()

    def copy_structure(self):
        """
Sylvain Meignier's avatar
new    
Sylvain Meignier committed
146
147
        Copy the internal structure of the diarization, ie the attribut names
        and the cluster types. The data is not copy.
Sylvain Meignier's avatar
Origin  
Sylvain Meignier committed
148
149
        :return: a Diar object
        """
Sylvain Meignier's avatar
new    
Sylvain Meignier committed
150
151
152
153
        tmp_diarization = Diar()
        tmp_diarization._attributes = copy.deepcopy(self._attributes)
        tmp_diarization.cluster_types = copy.deepcopy(self.cluster_types)
        return tmp_diarization
Sylvain Meignier's avatar
Origin  
Sylvain Meignier committed
154

Sylvain Meignier's avatar
new    
Sylvain Meignier committed
155
    def del_all(self, attribute, value):
Sylvain Meignier's avatar
Origin  
Sylvain Meignier committed
156
        """
Sylvain Meignier's avatar
new    
Sylvain Meignier committed
157
        Delete all segments satisfing the boolean expression [attribute = value]
Sylvain Meignier's avatar
Sylvain Meignier committed
158
        :param attribute: speaker of the attribute to delete
Sylvain Meignier's avatar
Origin  
Sylvain Meignier committed
159
160
161
162
        :param value:
        :return:
        """
        lst = list()
Sylvain Meignier's avatar
new    
Sylvain Meignier committed
163
164
165
        for segment in self.segments:
            if segment[attribute] != value:
                lst.append(segment)
Sylvain Meignier's avatar
Origin  
Sylvain Meignier committed
166
167
        self.segments = lst

Sylvain Meignier's avatar
???    
Sylvain Meignier committed
168
    def overlap(self, add_intersection=False):
Sylvain Meignier's avatar
new    
Sylvain Meignier committed
169
170
171
172
        """
        remove overlap zone
        :return: a new diarization without overlap
        """
Sylvain Meignier's avatar
???    
Sylvain Meignier committed
173
174

        def add(_show, _features_index, _cluster_list, _uem):
Sylvain Meignier's avatar
Sylvain Meignier committed
175
            diar_tmp = self.copy_structure()
Sylvain Meignier's avatar
???    
Sylvain Meignier committed
176
177
            for lcluster in _cluster_list:
                lst = sorted(_uem.intersection(set(_features_index[lcluster])))
Sylvain Meignier's avatar
Sylvain Meignier committed
178
179
                if len(lst) > 0:
                    c = lst[0];
Sylvain Meignier's avatar
???    
Sylvain Meignier committed
180
                    diar_tmp.append(show=_show, start=c, stop=c+1, cluster=lcluster)
Sylvain Meignier's avatar
Sylvain Meignier committed
181
182
183
184
185
186
187
188
                    l = 0
                    for j in range(1, len(lst)):
                        p = c
                        c = lst[j]
                        if c == p + 1:
                            l += 1
                        else:
                            diar_tmp[-1]['stop'] += l
Sylvain Meignier's avatar
???    
Sylvain Meignier committed
189
                            diar_tmp.append(show=_show, start=c, stop=c+1, cluster=lcluster)
Sylvain Meignier's avatar
Sylvain Meignier committed
190
191
192
                            l = 0
                    if l > 0:
                        diar_tmp[-1]['stop'] += l
Sylvain Meignier's avatar
???    
Sylvain Meignier committed
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
            return diar_tmp

        diarization_out = self.copy_structure()
        #shows = self.unique('show')
        shows = self.make_index(['show'])
        for show in shows:
            logging.info('rm overlap show: '+show)
            diar_show = shows[show]
            length = diar_show.last_feature_index()
            cluster_list = diar_show.unique('cluster')
            mat = numpy.zeros(length)
            features_index = diar_show.features_by_cluster()
            for i, cluster in enumerate(cluster_list):
                mat[features_index[cluster]] += 1
            uem = set(numpy.where(mat == 1)[0].tolist())
            diarization_out.segments += add(show, features_index, cluster_list, uem)
            if add_intersection:
                uem = set(numpy.where(mat != 1)[0].tolist())
                diarization_out.segments += add(show, features_index, cluster_list, uem)

Sylvain Meignier's avatar
new    
Sylvain Meignier committed
213
        return diarization_out
Sylvain Meignier's avatar
Sylvain Meignier committed
214

Sylvain Meignier's avatar
new    
Sylvain Meignier committed
215
    def filter(self, attribute, operator, value):
Sylvain Meignier's avatar
Origin  
Sylvain Meignier committed
216
        """
Sylvain Meignier's avatar
new    
Sylvain Meignier committed
217
218
        build a new diarization whose segments satisfy the boolean expression
        [attribute operator value]
Sylvain Meignier's avatar
Sylvain Meignier committed
219
        :param attribute: a attribute speaker (str)
Sylvain Meignier's avatar
new    
Sylvain Meignier committed
220
        :param operator: a comperator opertor (> < >= <= in == !=)
Sylvain Meignier's avatar
Origin  
Sylvain Meignier committed
221
222
223
        :param value: the value (int, float, str, list...)
        :return: a Diar object
        """
Sylvain Meignier's avatar
new    
Sylvain Meignier committed
224
225
226
227
        tmp_diarization = self.copy_structure()
        tmp_diarization.segments = list()
        if attribute == "length" or attribute == "duration":
            str = "seg.duration() {:s} {}".format(operator, value)
Sylvain Meignier's avatar
Origin  
Sylvain Meignier committed
228
        elif isinstance(value, string_types):
Sylvain Meignier's avatar
new    
Sylvain Meignier committed
229
            str = "seg['{:s}'] {:s} '{:s}'".format(attribute, operator, value)
Sylvain Meignier's avatar
Origin  
Sylvain Meignier committed
230
        else:
Sylvain Meignier's avatar
new    
Sylvain Meignier committed
231
            str = "seg['{:s}'] {:s} {} ".format(attribute, operator, value)
Sylvain Meignier's avatar
Sylvain Meignier committed
232
        # print(ch)
Sylvain Meignier's avatar
new    
Sylvain Meignier committed
233
        logging.debug(str)
Sylvain Meignier's avatar
Origin  
Sylvain Meignier committed
234
        for seg in self.segments:
Sylvain Meignier's avatar
Sylvain Meignier committed
235
            # print(ch, seg.length())
Sylvain Meignier's avatar
new    
Sylvain Meignier committed
236
237
238
            if eval(str):
                tmp_diarization.segments.append(copy.deepcopy(seg))
        return tmp_diarization
Sylvain Meignier's avatar
Origin  
Sylvain Meignier committed
239

Sylvain Meignier's avatar
new    
Sylvain Meignier committed
240
    def rename(self, attribute, old_values, new_value):
Sylvain Meignier's avatar
Origin  
Sylvain Meignier committed
241
        """
Sylvain Meignier's avatar
new    
Sylvain Meignier committed
242
        Rename all values in list old_values into the new value new_value
Sylvain Meignier's avatar
Sylvain Meignier committed
243
        :param attribute: speaker of the attribute
Sylvain Meignier's avatar
new    
Sylvain Meignier committed
244
        :param old_values:  list of old values
Sylvain Meignier's avatar
Origin  
Sylvain Meignier committed
245
246
247
        :param new_value: new value

        """
Sylvain Meignier's avatar
new    
Sylvain Meignier committed
248
249
250
        for segment in self.segments:
            if segment[attribute] in old_values or len(old_values) == 0 :
                segment[attribute] = new_value
Sylvain Meignier's avatar
Origin  
Sylvain Meignier committed
251

Sylvain Meignier's avatar
new    
Sylvain Meignier committed
252
    def _iofi(self, index, attributes, segment):
Sylvain Meignier's avatar
Origin  
Sylvain Meignier committed
253
        """
Sylvain Meignier's avatar
new    
Sylvain Meignier committed
254
        recursive fonction to add a segment into the n level keys dictionary
Sylvain Meignier's avatar
Origin  
Sylvain Meignier committed
255
256

        :param index: dict object of level n
Sylvain Meignier's avatar
new    
Sylvain Meignier committed
257
258
259
260
        :param attributes: list of attribut attributes
        :param segment: a segment
        :return: a dictornary of level n that contains sub diarization. Segments
        are not copy.
Sylvain Meignier's avatar
Origin  
Sylvain Meignier committed
261
        """
Sylvain Meignier's avatar
Sylvain Meignier committed
262
        # removes and gets the last attribut speaker
Sylvain Meignier's avatar
new    
Sylvain Meignier committed
263
        attribut = attributes.pop()
Sylvain Meignier's avatar
Sylvain Meignier committed
264
        # takes the values of this attribut speaker
Sylvain Meignier's avatar
new    
Sylvain Meignier committed
265
266
267
        value = segment[attribut]
        # if there is no more attribut attributes
        if len(attributes) <= 0:
Sylvain Meignier's avatar
Origin  
Sylvain Meignier committed
268
            if value in index:
Sylvain Meignier's avatar
new    
Sylvain Meignier committed
269
270
                # add the segment to the list
                index[value].append_seg(segment)
Sylvain Meignier's avatar
Origin  
Sylvain Meignier committed
271
            else:
Sylvain Meignier's avatar
new    
Sylvain Meignier committed
272
                # create a list and add the segment
Sylvain Meignier's avatar
Sylvain Meignier committed
273
                ldiar = self.copy_structure()
Sylvain Meignier's avatar
new    
Sylvain Meignier committed
274
                ldiar.append_seg(segment)
Sylvain Meignier's avatar
Sylvain Meignier committed
275
                index[value] = ldiar
Sylvain Meignier's avatar
Origin  
Sylvain Meignier committed
276
277
            return index
        else:
Sylvain Meignier's avatar
new    
Sylvain Meignier committed
278
279
            # recursion to the level n-1 until attributes is empty
            self._iofi(index[value], attributes, segment)
Sylvain Meignier's avatar
Origin  
Sylvain Meignier committed
280

Sylvain Meignier's avatar
new    
Sylvain Meignier committed
281
    def make_index(self, attributes):
Sylvain Meignier's avatar
Origin  
Sylvain Meignier committed
282
283
284
285
286
287
288
289
        """
        Build a n level key dictionary (dictionary of dictionaries of
        dictionaries...) based on Index.
        Index is an implementation of perl's autovivification feature.
        The values contains a list of row.

        example :

290
            d = make_index(['show', 'gender', 'cluster'])
Sylvain Meignier's avatar
Origin  
Sylvain Meignier committed
291

Sylvain Meignier's avatar
Sylvain Meignier committed
292
            print(d['show1']['M']['speaker'])
Sylvain Meignier's avatar
Origin  
Sylvain Meignier committed
293

Sylvain Meignier's avatar
new    
Sylvain Meignier committed
294
295
        :param attributes: a list of attribut _attributes corresponding to the key indexes
        :return: a dictionary of sub diarization. Segments are not copy.
Sylvain Meignier's avatar
Origin  
Sylvain Meignier committed
296
297
        """
        index = Index()
Sylvain Meignier's avatar
new    
Sylvain Meignier committed
298
299
        for segment in self.segments:
            self._iofi(index, attributes[::-1], segment)
Sylvain Meignier's avatar
Origin  
Sylvain Meignier committed
300
301
        return index

Sylvain Meignier's avatar
new    
Sylvain Meignier committed
302
    def unique(self, attibute):
Sylvain Meignier's avatar
Origin  
Sylvain Meignier committed
303
        """
Sylvain Meignier's avatar
new    
Sylvain Meignier committed
304
        :param attibute: the attibute of the attribut
Sylvain Meignier's avatar
Origin  
Sylvain Meignier committed
305
306
        :return: a list object of unique value of the attribut
        """
Sylvain Meignier's avatar
new    
Sylvain Meignier committed
307
308
        dic = dict()
        lst = list()
Sylvain Meignier's avatar
Origin  
Sylvain Meignier committed
309
        for seg in self.segments:
Sylvain Meignier's avatar
new    
Sylvain Meignier committed
310
311
312
313
            dic[seg[attibute]] = 0
        for value in dic.keys():
            lst.append(value)
        return lst
Sylvain Meignier's avatar
Origin  
Sylvain Meignier committed
314

Sylvain Meignier's avatar
new    
Sylvain Meignier committed
315
    def sort(self, attributes=['show', 'start'], reverse=False):
Sylvain Meignier's avatar
Origin  
Sylvain Meignier committed
316
317
        """
        Sort the segments
Sylvain Meignier's avatar
new    
Sylvain Meignier committed
318
        :param attributes: a list of attribut names
Sylvain Meignier's avatar
Origin  
Sylvain Meignier committed
319
320
321
        :param reverse: if true, make a reverse sort

        """
Sylvain Meignier's avatar
new    
Sylvain Meignier committed
322
323
324
325
326
        attributes.reverse()
        for attribute in attributes:
            if attribute not in self._attributes:
                raise Exception("This attribut don't exits : " + attribute)
            self.segments = sorted(self.segments, key=lambda x: x[self._attributes[attribute]],
Sylvain Meignier's avatar
Origin  
Sylvain Meignier committed
327
328
329
330
331
332
333
334
335
                                   reverse=reverse)

    def clear(self):
        """
        remove all the segments
        :return:
        """
        self.segments = list()

Sylvain Meignier's avatar
new    
Sylvain Meignier committed
336
    def add_attribut(self, new_attribut, default=''):
Sylvain Meignier's avatar
Origin  
Sylvain Meignier committed
337
338
        """
        Add a attribut
Sylvain Meignier's avatar
Sylvain Meignier committed
339
        :param new_attribut: the speaker of the new attribut
Sylvain Meignier's avatar
Origin  
Sylvain Meignier committed
340
341
342
        :param default: the default value of the attribut

        """
Sylvain Meignier's avatar
new    
Sylvain Meignier committed
343
        self._attributes.add(new_attribut, default)
Sylvain Meignier's avatar
Origin  
Sylvain Meignier committed
344
345
346
        for seg in self.segments:
            seg.append(default)

Sylvain Meignier's avatar
new    
Sylvain Meignier committed
347
    def del_attribut(self, attribut):
Sylvain Meignier's avatar
Origin  
Sylvain Meignier committed
348
349
        """
        Delete a attribut
Sylvain Meignier's avatar
Sylvain Meignier committed
350
        :param attribut: the speaker of the attribut to detele
Sylvain Meignier's avatar
Origin  
Sylvain Meignier committed
351
352

        """
Sylvain Meignier's avatar
new    
Sylvain Meignier committed
353
354
        if attribut not in self._attributes:
            raise Exception("This attribut don't exits : " + attribut)
Sylvain Meignier's avatar
Origin  
Sylvain Meignier committed
355
        else:
Sylvain Meignier's avatar
new    
Sylvain Meignier committed
356
            i = self._attributes[attribut]
Sylvain Meignier's avatar
Origin  
Sylvain Meignier committed
357
358
            for seg in self.segments:
                del seg[i]
Sylvain Meignier's avatar
new    
Sylvain Meignier committed
359
            self._attributes.delete(attribut)
Sylvain Meignier's avatar
Origin  
Sylvain Meignier committed
360
361
362

    def _new_row(self, **kwargs):
        """
Sylvain Meignier's avatar
new    
Sylvain Meignier committed
363
        Create a new segment initialized with kwargs
Sylvain Meignier's avatar
Origin  
Sylvain Meignier committed
364
365
366
        :param kwargs: the values
        :return:
        """
Sylvain Meignier's avatar
new    
Sylvain Meignier committed
367
        seg = Segment(self._attributes.defaults, self._attributes)
Sylvain Meignier's avatar
Origin  
Sylvain Meignier committed
368
        for key, value in kwargs.items():
Sylvain Meignier's avatar
new    
Sylvain Meignier committed
369
            seg[self._attributes[key]] = value
Sylvain Meignier's avatar
Origin  
Sylvain Meignier committed
370
371
372
373
        return seg

    def append(self, **kwargs):
        """
Sylvain Meignier's avatar
new    
Sylvain Meignier committed
374
375
        Transforme a list of values into a segment and append the segmnt into
        the existing segment list.
Sylvain Meignier's avatar
Origin  
Sylvain Meignier committed
376
377
378
379
380
        :param kwargs: the values
        :return:
        """
        self.segments.append(self._new_row(**kwargs))

Sylvain Meignier's avatar
new    
Sylvain Meignier committed
381
    def append_seg(self, segment):
Sylvain Meignier's avatar
Origin  
Sylvain Meignier committed
382
        """
Sylvain Meignier's avatar
new    
Sylvain Meignier committed
383
        Append a Segment object into the existing segment list.
Sylvain Meignier's avatar
Origin  
Sylvain Meignier committed
384

Sylvain Meignier's avatar
new    
Sylvain Meignier committed
385
        :param segment: a Segment object
Sylvain Meignier's avatar
Origin  
Sylvain Meignier committed
386
387

        """
Sylvain Meignier's avatar
new    
Sylvain Meignier committed
388
        self.segments.append(segment)
Sylvain Meignier's avatar
Origin  
Sylvain Meignier committed
389

Sylvain Meignier's avatar
new    
Sylvain Meignier committed
390
    def append_list(self, segment_lst):
Sylvain Meignier's avatar
Origin  
Sylvain Meignier committed
391
        """
Sylvain Meignier's avatar
new    
Sylvain Meignier committed
392
393
        Append a list of segments into the existing segment segment_lst.
        :param segment_lst: a list of segments
Sylvain Meignier's avatar
Origin  
Sylvain Meignier committed
394
395

        """
Sylvain Meignier's avatar
new    
Sylvain Meignier committed
396
397
398
        self.segments += segment_lst

    def append_diar(self, out_diarization):
Sylvain Meignier's avatar
Origin  
Sylvain Meignier committed
399
        """
Sylvain Meignier's avatar
new    
Sylvain Meignier committed
400
401
        Append a diarization.
        :param out_diarization: a diarization object
Sylvain Meignier's avatar
Origin  
Sylvain Meignier committed
402
403

        """
404
405
406
407
408
409
410
411
412
413
414
415
416
417
        lMatchAttr=list()
        for i in self._attributes.names:
            if i in out_diarization._attributes.names:
                lMatchAttr.append(i)
        assert len(lMatchAttr)!=0,"No attribute matches"
        
        lSegments=list()
        for i in out_diarization:
            seg=Segment(self._attributes.defaults,self._attributes)
            for y in lMatchAttr:
                seg._set_attr(y,i[y])
            lSegments.append(seg)
        
        self.segments += lSegments
Sylvain Meignier's avatar
Origin  
Sylvain Meignier committed
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473

    def insert(self, i, **kwargs):
        """
        Insert values into the list at offset index
        :param i: This is the Index where the object obj need to be inserted.
        :param kwargs: the values

        """
        self.segments.insert(i, self._new_row(**kwargs))

    def __iter__(self):
        """
        This method is called when an iterator is required for a container.
        :return: an iterator
        """
        return self.segments.__iter__()

    def __reversed__(self):
        """
        Called (if present) by the reversed() built-in to implement reverse iteration.
        :return: a Diar object
        """
        return self.segments.__reversed__()

    def __delitem__(self, index):
        """
        Called to implement deletion of self[index]
        :param index: a int

        """
        del self.segments[index]

    def __getitem__(self, index):
        """
        Called to implement evaluation of self[index]
        :param index: a int
        :return: a Segment object
        """
        return self.segments[index]

    def __setitem__(self, index, value):
        """
        Called to implement evaluation of self[index] = value

        :param index: a int
        :param value:  a Segment object

        """
        self.segments[index] = value

    def __len__(self):
        """
        :return: the number of segments
        """
        return len(self.segments)

Sylvain Meignier's avatar
new    
Sylvain Meignier committed
474
    def __eq__(self, diarization): # real signature unknown
Sulfyderz's avatar
Sulfyderz committed
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
        idx_self = self.make_index(['show', 'cluster', 'start'])
        idx = diarization.make_index(['show', 'cluster', 'start'])

        for show in idx_self:
            for cluster in idx_self[show]:
                for start in idx_self[show][cluster]:
                    if show not in idx:
                        return False
                    elif cluster not in idx[show]:
                        return False
                    elif start not in idx[show][cluster]:
                        return False
                    else :
                        l1 = idx[show][cluster][start].segments
                        l2 = idx_self[show][cluster][start].segments
                        return all(x in l2 for x in l1)
Sylvain Meignier's avatar
Sylvain Meignier committed
491
492
        return True

Sylvain Meignier's avatar
new    
Sylvain Meignier committed
493
    def __ne__(self, diarization): # real signature unknown
Sylvain Meignier's avatar
?    
Sylvain Meignier committed
494
        return not self.__eq__(diarization)
Sylvain Meignier's avatar
Sylvain Meignier committed
495

Sylvain Meignier's avatar
Origin  
Sylvain Meignier committed
496
497
    def __repr__(self):
        """
Sylvain Meignier's avatar
new    
Sylvain Meignier committed
498
        :return: a string version of the diarization
Sylvain Meignier's avatar
Origin  
Sylvain Meignier committed
499
        """
Sylvain Meignier's avatar
new    
Sylvain Meignier committed
500
501
502
        string = '  attribut definition  : ['
        index = 0
        lst = self._attributes.sorted()
Sylvain Meignier's avatar
Origin  
Sylvain Meignier committed
503
        #print(lst)
Sylvain Meignier's avatar
new    
Sylvain Meignier committed
504
505
        for attribute in lst:
            string += "'" + attribute[0] + "', "
Anthony Larcher's avatar
Anthony Larcher committed
506
        string = re.sub(', $', '', string) + ']\n'
Sylvain Meignier's avatar
new    
Sylvain Meignier committed
507
        for segment in self.segments:
Sylvain Meignier's avatar
Origin  
Sylvain Meignier committed
508
            line = ''
Sylvain Meignier's avatar
new    
Sylvain Meignier committed
509
510
            for attribute in segment:
                line += attribute.__repr__() + ', '
Anthony Larcher's avatar
Anthony Larcher committed
511
            string += '  row ' + str(index) + ': [' + re.sub(', $', '',
Sylvain Meignier's avatar
Origin  
Sylvain Meignier committed
512
                                                           line) + ']\n'
Sylvain Meignier's avatar
new    
Sylvain Meignier committed
513
514
            index += 1
        return '[\n' + string + ']'
Sylvain Meignier's avatar
Origin  
Sylvain Meignier committed
515

Sylvain Meignier's avatar
new    
Sylvain Meignier committed
516
517
518
519
    def __add__(self, diarization):
        diarization_copy = copy.deepcopy(self)
        diarization_copy.segments += diarization.segments
        return diarization_copy
Sylvain Meignier's avatar
Sylvain Meignier committed
520

Sylvain Meignier's avatar
new    
Sylvain Meignier committed
521
522
    def __iadd__(self, diarization):
        self.segments += diarization.segments
Sylvain Meignier's avatar
Sylvain Meignier committed
523
        return self
Sylvain Meignier's avatar
Sylvain Meignier committed
524

Sylvain Meignier's avatar
new    
Sylvain Meignier committed
525
526
    def id_map(self, id_attribut='cluster', show_attribut='show',
               prefix_id_attrubut=None, suffix_show_attribut=None):
Sylvain Meignier's avatar
Origin  
Sylvain Meignier committed
527
528
        """
        Generate a IdMap object for the StatServer
Sylvain Meignier's avatar
new    
Sylvain Meignier committed
529
530
531
532
533
534
        :param id_attribut: speaker id_attribut attribut
        :param show_attribut: show_attribut attribut
        :param prefix_id_attrubut: prefix string of id_attribut
        :param suffix_show_attribut: suffix string of id_attribut

        :param out_diarization: a diarization object
Sylvain Meignier's avatar
Origin  
Sylvain Meignier committed
535
536
537
        :return: a IdMap object
        """
        id_map = IdMap()
Sylvain Meignier's avatar
new    
Sylvain Meignier committed
538
539
540
541
        id_map.leftids = numpy.empty(len(self.segments), dtype="|O")
        id_map.rightids = numpy.empty(len(self.segments), dtype="|O")
        id_map.start = numpy.empty(len(self.segments), dtype="|O")
        id_map.stop = numpy.empty(len(self.segments), dtype="|O")
Sylvain Meignier's avatar
Origin  
Sylvain Meignier committed
542
543

        i = 0
Sylvain Meignier's avatar
new    
Sylvain Meignier committed
544
545
546
        for segment in self.segments:
            if prefix_id_attrubut is not None:
                id_map.leftids[i] = segment[prefix_id_attrubut] + '/' + segment[id_attribut]
Sylvain Meignier's avatar
Sylvain Meignier committed
547
            else:
Sylvain Meignier's avatar
new    
Sylvain Meignier committed
548
549
550
                id_map.leftids[i] = segment[id_attribut]
            if suffix_show_attribut is not None:
                id_map.rightids[i] = segment[show_attribut] + '/' + segment[suffix_show_attribut]
Sylvain Meignier's avatar
??    
Sylvain Meignier committed
551
            else:
Sylvain Meignier's avatar
new    
Sylvain Meignier committed
552
553
554
                id_map.rightids[i] = segment[show_attribut]
            id_map.start[i] = segment['start']
            id_map.stop[i] = segment['stop']
Sylvain Meignier's avatar
Origin  
Sylvain Meignier committed
555
556
557
558
            i += 1

        return id_map

Sylvain Meignier's avatar
new    
Sylvain Meignier committed
559
    def features_by_cluster(self, show=None, maximum_length=None):
Sylvain Meignier's avatar
Origin  
Sylvain Meignier committed
560
561
        """
        Generate the indexes of a show
Sylvain Meignier's avatar
Sylvain Meignier committed
562
        :param show: the speaker of the show
Sylvain Meignier's avatar
new    
Sylvain Meignier committed
563
        :param maximum_length: maximum length of the show
564
        :return: a dict object (keys are the cluster_list)
Sylvain Meignier's avatar
Origin  
Sylvain Meignier committed
565
566
567
568
        """
        if show == None:
            l = self.unique('show')
            if len(l) > 1:
Sylvain Meignier's avatar
stable    
Sylvain Meignier committed
569
                raise Exception('diarization address sevreal shows, set show parameter')
Sylvain Meignier's avatar
Origin  
Sylvain Meignier committed
570
571
            else:
                show = l[0]
Sylvain Meignier's avatar
new    
Sylvain Meignier committed
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
        dic = dict()
        for segment in self.segments:
            if show == segment['show']:
                cluster = segment['cluster']
                start = segment['start']
                stop = segment['stop']
                if maximum_length is not None:
                    start = min(segment['start'], maximum_length)
                    stop = min(segment['stop'], maximum_length)

                if cluster not in dic:
                    dic[cluster] = []
                dic[cluster] += [i for i in range(start, stop)]
        return dic

    def features(self, show=None, maximum_length=None):
        """
        Generate the index features of a show
Sylvain Meignier's avatar
Sylvain Meignier committed
590
        :param show: a string corresponding to the speaker of the show
Sylvain Meignier's avatar
new    
Sylvain Meignier committed
591
        :param maximum_length: maximum length of the show
Sylvain Meignier's avatar
Origin  
Sylvain Meignier committed
592
593
        :return: a list object of indexes
        """
Sylvain Meignier's avatar
new    
Sylvain Meignier committed
594
595
596
        if show is None:
            lst = self.unique('show')
            if len(lst) > 1:
Sylvain Meignier's avatar
stable    
Sylvain Meignier committed
597
                raise Exception('diarization address sevreal shows, set show parameter')
Sylvain Meignier's avatar
Origin  
Sylvain Meignier committed
598
            else:
Sylvain Meignier's avatar
new    
Sylvain Meignier committed
599
600
601
602
603
604
605
606
607
608
609
                show = lst[0]
        lst = list()
        for segment in self.segments:
            if show == segment['show']:
                start = segment['start']
                stop = segment['stop']
                if maximum_length is not None:
                    start = min(segment['start'], maximum_length)
                    stop = min(segment['stop'], maximum_length)
                lst += [i for i in range(start, stop)]
        return lst
Sylvain Meignier's avatar
???    
Sylvain Meignier committed
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
    def to_list(self, show=None, uem_start=None, uem_stop=None):
        if show is None:
            lst = self.unique('show')
            if len(lst) > 1:
                raise Exception('diarization address sevreal shows, set show parameter')
            else:
                show = lst[0]
        if uem_start is None:
            uem_start = 0
        if uem_stop is None:
            uem_stop = self.last_feature_index()
        lst = [''] * uem_stop
        for segment in self.segments:
            if show == segment['show']:
                cluster = segment['cluster']
                start = max(segment['start'], uem_start)
                stop = min(segment['stop'], uem_stop)
                for i in range(start, stop):
                    if lst[i] == '':
                        lst[i] = cluster
                    else:
                        lst[i] += ' '+cluster
        return lst
Sylvain Meignier's avatar
Origin  
Sylvain Meignier committed
633

634
    def pack(self, epsilon=0, coveringOverlap=False):
Sylvain Meignier's avatar
Origin  
Sylvain Meignier committed
635
636
637
        """
        merge segments with a gap less than epsilon
        :param epsilon: a int value
638
639
        :param coveringOverlap: a boolean value
        """
Sylvain Meignier's avatar
Sylvain Meignier committed
640
        
641
        #index = self.make_index(['show'])
Sylvain Meignier's avatar
Sylvain Meignier committed
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
        #lst = list()
        #for show in index:
        #    diar = index[show]
        #    diar.sort(['start'])
        #    i = 0
        #    while i < len(diar.segments) - 1:
        #        if diar.segments[i]['cluster'] == diar.segments[i + 1]['cluster']:
        #            l = Segment.gap(diar.segments[i], diar.segments[i + 1]).duration()
        #            if l <= epsilon:
        #                diar.segments[i]['stop'] = max(diar.segments[i]['stop'],
        #                                               diar.segments[i + 1]['stop'])
        #                del diar.segments[i + 1]
        #            else:
        #                i += 1
        #        else:
        #            i += 1
        #    lst += diar.segments
        #self.segments = lst
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
        if coveringOverlap:
            index = self.make_index(['show', 'cluster'])
            lst = list()
            for show in index:
                for cluster in index[show]:
                    index[show][cluster].sort(['start'])
                    diar = index[show][cluster]
                    i = 0
                    while i < len(diar.segments) - 1:
                        l = Segment.gap(diar.segments[i], diar.segments[i + 1]).duration()
                        if l <= epsilon:
                            diar.segments[i]['stop'] = max(diar.segments[i]['stop'],
                                                           diar.segments[i + 1]['stop'])
                            del diar.segments[i + 1]
                        else:
                            i += 1
                    lst += diar.segments
            self.segments = lst
        else:
            self.sort(['show', 'start'])
            i = 0
            while i < len(self.segments) - 1:
                if self.segments[i]['show'] == self.segments[i + 1]['show'] and \
                        self.segments[i]['cluster'] == self.segments[i + 1]['cluster'] and \
684
                        Segment.gap(self.segments[i], self.segments[i + 1]).duration() <= epsilon:
685
686
687
688
                    self.segments[i]['stop'] = self.segments[i + 1]['stop']
                    del self.segments[i + 1]
                else:
                    i += 1
Sylvain Meignier's avatar
Origin  
Sylvain Meignier committed
689
690
691

    def pad(self, epsilon=0):
        """
Sylvain Meignier's avatar
merge    
Sylvain Meignier committed
692
        Add epsilon frames to the start and stop of each segment
Sylvain Meignier's avatar
Origin  
Sylvain Meignier committed
693
694
695
696
697
698
        :param epsilon: the int value to remove
        :return:
        """
        self.sort(['start'])
        i = 0
        if len(self.segments) > 1:
699
            self.segments[i]['stop'] = max(self.segments[i]['start'], min(max(self.segments[i + 1]['start'] - (epsilon // 2), 0), self.segments[i]['stop'] + epsilon))
Sylvain Meignier's avatar
Origin  
Sylvain Meignier committed
700
701
        i += 1
        while i < len(self.segments)-1:
Sylvain Meignier's avatar
merge    
Sylvain Meignier committed
702
            self.segments[i]['start'] = max(self.segments[i - 1]['stop'], self.segments[i]['start'] - epsilon, 0)
703
            self.segments[i]['stop'] = max(self.segments[i]['start'],min(max(self.segments[i + 1]['start'] - (epsilon // 2), 0), self.segments[i]['stop'] + epsilon))
Sylvain Meignier's avatar
Origin  
Sylvain Meignier committed
704
705
706
707
708
709
710
711
712
713
714
            i += 1

    def collar(self, epsilon=0, warning=False):
        """
        Apply a collar on each segment. A collar is the no-score zone around
        reference speaker segment boundaries.  (Speaker Diarization output is
        not evaluated within +/- collar seconds of a reference speaker segment
        boundary.)
        :param epsilon: the int value to add
        """
        self.sort(['start'])
Sylvain Meignier's avatar
??    
Sylvain Meignier committed
715
        rm = False
Sylvain Meignier's avatar
new    
Sylvain Meignier committed
716
717
718
719
720
721
722
        for segment in self.segments:
            segment['stop'] -= epsilon
            segment['start'] += epsilon
            if segment['start'] < 0:
                segment['start'] = 0
            if segment['start'] > segment['stop']:
                segment['start'] = segment['stop']
Sylvain Meignier's avatar
??    
Sylvain Meignier committed
723
                rm = True
Sylvain Meignier's avatar
Origin  
Sylvain Meignier committed
724
                if warning:
Sylvain Meignier's avatar
new    
Sylvain Meignier committed
725
                    logging.warning('no more segment: '+str(segment['start']-epsilon))
Sylvain Meignier's avatar
??    
Sylvain Meignier committed
726
727
        if rm:
            self.segments = [seg for seg in self.segments if seg.duration() > 0]
Sylvain Meignier's avatar
Sylvain Meignier committed
728

Sylvain Meignier's avatar
Sylvain Meignier committed
729
    def duration(self):
Sylvain Meignier's avatar
Origin  
Sylvain Meignier committed
730
        """
Sylvain Meignier's avatar
new    
Sylvain Meignier committed
731
        :return: the sum of the segment duration
Sylvain Meignier's avatar
Origin  
Sylvain Meignier committed
732
733
        """
        l = 0
Sylvain Meignier's avatar
new    
Sylvain Meignier committed
734
735
        for segment in self.segments:
            l += segment.duration()
Sylvain Meignier's avatar
Origin  
Sylvain Meignier committed
736
        return l
Sylvain Meignier's avatar
new    
Sylvain Meignier committed
737

Sylvain Meignier's avatar
Sylvain Meignier committed
738
739
    def last_feature_index(self):
        last = 0
Sylvain Meignier's avatar
new    
Sylvain Meignier committed
740
741
742
        for segment in self.segments:
            if segment['stop'] > last:
                last = segment['stop']
Sylvain Meignier's avatar
Sylvain Meignier committed
743
        return last
Sylvain Meignier's avatar
new    
Sylvain Meignier committed
744

Sylvain Meignier's avatar
???    
Sylvain Meignier committed
745
746
747
748
749
750
751
752
    def first_feature_index(self):
        if len(self.segments) <= 0:
            return 0
        first = self.segments[0]['start']
        for segment in self.segments:
            if segment['start'] < first:
                first = segment['start']
        return first
753
    
Sylvain Meignier's avatar
Origin  
Sylvain Meignier committed
754
    @classmethod
755
    def read_seg(cls, filename, normalize_cluster=False, encoding="utf8"):
Sylvain Meignier's avatar
Origin  
Sylvain Meignier committed
756
757
758
        """
        Read a segmentation file
        :param filename: the str input filename
Sylvain Meignier's avatar
Sylvain Meignier committed
759
        :param normalize_cluster: normalize the cluster speaker by removing upper
Sylvain Meignier's avatar
new    
Sylvain Meignier committed
760
761
        case and accents
        :return: a diarization object
Sylvain Meignier's avatar
Origin  
Sylvain Meignier committed
762
        """
763
        fic = open(filename, 'r', encoding=encoding)
Sylvain Meignier's avatar
new    
Sylvain Meignier committed
764
765
766
767
768
769
770
        diarization = Diar()
        if not diarization._attributes.exist('gender'):
            diarization.add_attribut(new_attribut='gender', default='U')
        if not diarization._attributes.exist('env'):
            diarization.add_attribut(new_attribut='env', default='U')
        if not diarization._attributes.exist('channel'):
            diarization.add_attribut(new_attribut='channel', default='U')
Sylvain Meignier's avatar
Origin  
Sylvain Meignier committed
771
        try:
Anthony Larcher's avatar
Anthony Larcher committed
772
773
            for line in fic: 
                line = re.sub('\s+',' ',line)
Sylvain Meignier's avatar
Origin  
Sylvain Meignier committed
774
                line = line.strip()
Sylvain Meignier's avatar
Sylvain Meignier committed
775
                # logging.debug(line)
Sylvain Meignier's avatar
Origin  
Sylvain Meignier committed
776
777
778
                if line.startswith('#') or line.startswith(';;'):
                    continue
                # split line into fields
Sylvain Meignier's avatar
new    
Sylvain Meignier committed
779
780
781
                show, tmp, start, length, gender, channel, environment, name = line.split()
                if normalize_cluster:
                    name = str2str_normalize(name)
Sylvain Meignier's avatar
Sylvain Meignier committed
782
                # print(show, tmp, start, length, gender, channel, env, speaker)
Sylvain Meignier's avatar
new    
Sylvain Meignier committed
783
784
                diarization.append(show=show, cluster=name, start=int(start),
                             stop=int(length) + int(start), env=environment,
Sylvain Meignier's avatar
Origin  
Sylvain Meignier committed
785
786
787
788
                             channel=channel,
                             gender=gender)
        except Exception as e:
            logging.error(sys.exc_info()[0])
Sylvain Meignier's avatar
Sylvain Meignier committed
789
            # logging.error(line)
Sylvain Meignier's avatar
Origin  
Sylvain Meignier committed
790
        fic.close()
Sylvain Meignier's avatar
new    
Sylvain Meignier committed
791
792
        return diarization

Sylvain Meignier's avatar
new    
Sylvain Meignier committed
793
    @classmethod
794
    def read_ctm(cls, filename, normalize_cluster=False, encoding="utf8"):
Sylvain Meignier's avatar
new    
Sylvain Meignier committed
795
796
797
        """
        Read a segmentation file
        :param filename: the str input filename
Sylvain Meignier's avatar
new    
Sylvain Meignier committed
798
799
800
        :param normalize_cluster: normalize the cluster by removing upper case
        and accents
        :return: a diarization object
Sylvain Meignier's avatar
new    
Sylvain Meignier committed
801
        """
802
        fic = open(filename, 'r', encoding=encoding)
Sylvain Meignier's avatar
new    
Sylvain Meignier committed
803
        diarization = Diar()
Sylvain Meignier's avatar
new    
Sylvain Meignier committed
804
805
        try:
            for line in fic:
Sylvain Meignier's avatar
new    
Sylvain Meignier committed
806
                line = re.sub('\s+',' ',line)
Sylvain Meignier's avatar
new    
Sylvain Meignier committed
807
                line = line.strip()
Sylvain Meignier's avatar
Sylvain Meignier committed
808
                # logging.debug(line)
Sylvain Meignier's avatar
new    
Sylvain Meignier committed
809
810
811
812
                if line.startswith('#') or line.startswith(';;'):
                    continue
                # split line into fields
                show, tmp, start, length, word = line.split()
Sylvain Meignier's avatar
new    
Sylvain Meignier committed
813
814
                if normalize_cluster:
                    word = str2str_normalize(word)
Sylvain Meignier's avatar
Sylvain Meignier committed
815
                # print(show, tmp, start, length, gender, channel, env, speaker)
Sylvain Meignier's avatar
new    
Sylvain Meignier committed
816
                diarization.append(show=show, cluster=word, start=int(start),
Sylvain Meignier's avatar
new    
Sylvain Meignier committed
817
818
819
                             stop=int(length) + int(start))
        except Exception as e:
            logging.error(sys.exc_info()[0])
Sylvain Meignier's avatar
Sylvain Meignier committed
820
            # logging.error(line)
Sylvain Meignier's avatar
new    
Sylvain Meignier committed
821
        fic.close()
Sylvain Meignier's avatar
new    
Sylvain Meignier committed
822
        return diarization
Sylvain Meignier's avatar
Origin  
Sylvain Meignier committed
823
824

    @classmethod
825
826
827
828
829
830
831
832
833
834
835
836
837
838
839
840
841
842
843
844
845
846
847
848
849
850
851
852
853
854
855
856
857
858
859
860
861
862
863
864
865
866
867
868
869
870
871
872
873
    def read_stm(cls,filename, normalize_cluster=False, encoding="ISO-8859-1"):
        """
        Read a segmentation file
        :param filename: the str input filename
        :param normalize_cluster: normalize the cluster by removing upper case
        and accents
        :return: a diarization object
        """
        fic = open(filename, 'r', encoding=encoding)
        diarization = Diar()
        if not diarization._attributes.exist('gender'):
            diarization.add_attribut(new_attribut='gender', default='U')
        try:
            for line in fic:
                line = re.sub('\s+',' ',line)
                line = line.strip()
                # logging.debug(line)
                if line.startswith('#') or line.startswith(';;'):
                    continue
                # split line into fields
                split = line.split()
                show = split[0]
                loc = split[2]
                if normalize_cluster:
                    loc = str2str_normalize(loc)
                start = int(float(split[3])*100)
                stop = int(float(split[4])*100)
                addon = split[5].replace(">", "").replace("<", "").replace(","," ")
                lineBis = re.sub('\s+',' ',addon)
                lineBis = lineBis.strip()
                gender = lineBis.split()[2]
                if normalize_cluster:
                    word = str2str_normalize(word)
                # print(show, tmp, start, length, gender, channel, env, speaker)
                if gender == "female":
                    diarization.append(show=show, cluster=loc, start=start,
                             stop=stop,gender="F")
                elif gender == "male":
                    diarization.append(show=show, cluster=loc, start=start,
                             stop=stop,gender="M")
                else:
                    diarization.append(show=show, cluster=loc, start=start,
                             stop=stop)
        except Exception as e:
            logging.error(sys.exc_info()[0])
            logging.error(line)
        fic.close()
        return diarization

Sylvain Meignier's avatar
Origin  
Sylvain Meignier committed
874
    @classmethod
875
    def read_mdtm(cls, filename, normalize_cluster=False, encoding="utf8"):
Sylvain Meignier's avatar
Origin  
Sylvain Meignier committed
876
877
878
        """
        Read a MDTM file
        :param filename: the str input filename
Sylvain Meignier's avatar
new    
Sylvain Meignier committed
879
880
881
        :param normalize_cluster: normalize the cluster by removing upper case
        and accents
        :return: a diarization object
Sylvain Meignier's avatar
Origin  
Sylvain Meignier committed
882
883
        """

884
        fic = open(filename, 'r', encoding=encoding)
Sylvain Meignier's avatar
new    
Sylvain Meignier committed
885
886
887
        diarization = Diar()
        if not diarization._attributes.exist('gender'):
            diarization.add_attribut(new_attribut='gender', default='U')
Sylvain Meignier's avatar
merge    
Sylvain Meignier committed
888
889
890
891
892
893
894
895
896
897
898
899
900
901
        for line in fic:
            line = line.strip()
            line = re.sub('\s+',' ',line)
            logging.debug(line)
            if line.startswith('#') or line.startswith(';;'):
                continue
            # split line into fields
            show, tmp, start_str, length, t, score, gender, cluster = line.split()
            start = int(round(float(start_str)*100, 0))
            stop = start+int(round(float(length)*100, 0))
            if normalize_cluster:
                cluster = str2str_normalize(cluster)
            # print(show, tmp, start, length, gender, channel, env, speaker)
            diarization.append(show=show, cluster=cluster, start=start,
Sylvain Meignier's avatar
Origin  
Sylvain Meignier committed
902
903
                             stop=stop, gender=gender)
        fic.close()
Sylvain Meignier's avatar
new    
Sylvain Meignier committed
904
        return diarization
Sylvain Meignier's avatar
Origin  
Sylvain Meignier committed
905
906

    @classmethod
907
    def read_uem(cls, filename, encoding="utf8"):
Sylvain Meignier's avatar
Origin  
Sylvain Meignier committed
908
909
910
        """
        Read a UEM file
        :param filename: the str input filename
Sylvain Meignier's avatar
new    
Sylvain Meignier committed
911
        :return: a diarization object
Sylvain Meignier's avatar
Origin  
Sylvain Meignier committed
912
        """
913
        fic = open(filename, 'r', encoding=encoding)
Sylvain Meignier's avatar
new    
Sylvain Meignier committed
914
915
916
        diarization = Diar()
        if not diarization._attributes.exist('gender'):
            diarization.add_attribut(new_attribut='gender', default='U')
Sylvain Meignier's avatar
Origin  
Sylvain Meignier committed
917
918
919
        try:
            name = "uem"
            for line in fic:
Sylvain Meignier's avatar
new    
Sylvain Meignier committed
920
                line = re.sub('\s+',' ',line)
Sylvain Meignier's avatar
Origin  
Sylvain Meignier committed
921
                line = line.strip()
Sylvain Meignier's avatar
Sylvain Meignier committed
922
                # logging.debug(line)
Sylvain Meignier's avatar
Origin  
Sylvain Meignier committed
923
924
925
                if line.startswith('#') or line.startswith(';;'):
                    continue
                # split line into fields
Sylvain Meignier's avatar
Sylvain Meignier committed
926
                show, tmp, start_str, stop_str = line.split()
Sylvain Meignier's avatar
Origin  
Sylvain Meignier committed
927
                start = int(round(float(start_str)*100, 0))
Sylvain Meignier's avatar
Sylvain Meignier committed
928
                stop = int(round(float(stop_str)*100, 0))
Sylvain Meignier's avatar
Sylvain Meignier committed
929
                # stop = start+int(round(float(length)*100, 0))
Sylvain Meignier's avatar
new    
Sylvain Meignier committed
930
                diarization.append(show=show, cluster=name, start=start, stop=stop)
Sylvain Meignier's avatar
Origin  
Sylvain Meignier committed
931
932
933
934
        except Exception as e:
            logging.error(sys.exc_info()[0])
            logging.error(line)
        fic.close()
Sylvain Meignier's avatar
new    
Sylvain Meignier committed
935
        return diarization
Sylvain Meignier's avatar
Origin  
Sylvain Meignier committed
936
937

    @classmethod
938
    def read_rttm(cls, filename, normalize_cluster=False, encoding="utf8"):
Sylvain Meignier's avatar
Origin  
Sylvain Meignier committed
939
940
941
        """
        Read rttm file
        :param filename: str input filename
Sylvain Meignier's avatar
new    
Sylvain Meignier committed
942
943
        :param normalize_cluster: normalize the cluster by removing upper case and accents
        :return: a diarization object
Sylvain Meignier's avatar
Origin  
Sylvain Meignier committed
944
        """
945
        fic = open(filename, 'r', encoding=encoding)
Sylvain Meignier's avatar
new    
Sylvain Meignier committed
946
947
948
        diarization = Diar()
        if not diarization._attributes.exist('gender'):
            diarization.add_attribut(new_attribut='gender', default='U')
Sylvain Meignier's avatar
Origin  
Sylvain Meignier committed
949
950
        try:
            for line in fic:
Sylvain Meignier's avatar
new    
Sylvain Meignier committed
951
                line = re.sub('\s+',' ',line)
Sylvain Meignier's avatar
Origin  
Sylvain Meignier committed
952
953
954
955
                line = line.strip()
                if line.startswith('#') or line.startswith(';;'):
                    continue
                # split line into fields
Anthony Larcher's avatar
Anthony Larcher committed
956
                spk, show, tmp0, start_str, length, tmp1, tmp2, cluster, tmp3 = line.split()
Sylvain Meignier's avatar
Origin  
Sylvain Meignier committed
957
958
959
                if spk == "SPEAKER":
                    start = int(round(float(start_str)*100, 0))
                    stop = start+int(round(float(length)*100, 0))
Sylvain Meignier's avatar
new    
Sylvain Meignier committed
960
961
962
                    if normalize_cluster:
                        cluster = str2str_normalize(cluster)
                    diarization.append(show=show, cluster=cluster, start=start, stop=stop)
Sylvain Meignier's avatar
Origin  
Sylvain Meignier committed
963
964
965
966
        except Exception as e:
            logging.error(sys.exc_info()[0])
            logging.error(line)
        fic.close()
Sylvain Meignier's avatar
new    
Sylvain Meignier committed
967
        return diarization
Sylvain Meignier's avatar
Origin  
Sylvain Meignier committed
968
969
970
971

    @classmethod
    def to_string_seg(cls, diar):
        """
Sylvain Meignier's avatar
stable    
Sylvain Meignier committed
972
        transform a diarization into a string
Sylvain Meignier's avatar
new    
Sylvain Meignier committed
973
974
        :param diar: a diarization
        :return: a string
Sylvain Meignier's avatar
Origin  
Sylvain Meignier committed
975
976
        """
        lst = []
Sylvain Meignier's avatar
new    
Sylvain Meignier committed
977
        for segment in diar:
Sylvain Meignier's avatar
Origin  
Sylvain Meignier committed
978
            gender = 'U'
Sylvain Meignier's avatar
new    
Sylvain Meignier committed
979
980
            if diar._attributes.exist('gender'):
                gender = segment['gender']
Sylvain Meignier's avatar
Origin  
Sylvain Meignier committed
981
            env = 'U'
Sylvain Meignier's avatar
new    
Sylvain Meignier committed
982
983
            if diar._attributes.exist('env'):
                env = segment['env']
Sylvain Meignier's avatar
Origin  
Sylvain Meignier committed
984
            channel = 'U'
Sylvain Meignier's avatar
new    
Sylvain Meignier committed
985
986
            if diar._attributes.exist('channel'):
                channel = segment['channel']
Sylvain Meignier's avatar
Origin  
Sylvain Meignier committed
987
            lst.append('{:s} 1 {:d} {:d} {:s} {:s} {:s} {:s}\n'.format(
Sylvain Meignier's avatar
new    
Sylvain Meignier committed
988
989
                segment['show'], segment['start'], segment['stop'] - segment['start'], gender,
                channel, env, segment['cluster']))
Sylvain Meignier's avatar
Origin  
Sylvain Meignier committed
990
        return lst
Sylvain Meignier's avatar
new    
Sylvain Meignier committed
991

Sylvain Meignier's avatar
Origin  
Sylvain Meignier committed
992
    @classmethod
Sylvain Meignier's avatar
new    
Sylvain Meignier committed
993
    def intersection(cls, diarization1, diarization2):
Sylvain Meignier's avatar
Origin  
Sylvain Meignier committed
994
        """
Sylvain Meignier's avatar
new    
Sylvain Meignier committed
995
996
997
998
        Compute the intersection between two diarization
        :param diarization1: first diarization
        :param diarization2: second diarization
        :return: a diarization object
Sylvain Meignier's avatar
Origin  
Sylvain Meignier committed
999
        """
Sylvain Meignier's avatar
new    
Sylvain Meignier committed
1000
1001
1002
1003
        diarization = Diar()
        for segment1 in diarization1:
            for segment2 in diarization2:
                inter = Segment.intersection(segment1, segment2)
Sylvain Meignier's avatar
Origin  
Sylvain Meignier committed
1004
                if inter is not None :
Sylvain Meignier's avatar
new    
Sylvain Meignier committed
1005
1006
                    diarization.append_seg(inter)
        return diarization
Sylvain Meignier's avatar
Origin  
Sylvain Meignier committed
1007
1008

    @classmethod
Sylvain Meignier's avatar
new    
Sylvain Meignier committed
1009
    def write_seg(cls, filename, diarization):
Sylvain Meignier's avatar
Origin  
Sylvain Meignier committed
1010
        """
Sylvain Meignier's avatar
new    
Sylvain Meignier committed
1011
        Write diarization to a segmentation file
Sylvain Meignier's avatar
Origin  
Sylvain Meignier committed
1012
        :param filename: the str output filename
Sylvain Meignier's avatar
stable    
Sylvain Meignier committed
1013
        :param diarization: the diarization to write
Sylvain Meignier's avatar
Origin  
Sylvain Meignier committed
1014
1015

        """
Sylvain Meignier's avatar
new    
Sylvain Meignier committed
1016
        diarization.sort(['show', 'start'])
Sylvain Meignier's avatar
Origin  
Sylvain Meignier committed
1017
        fic = open(filename, 'w', encoding="utf8")
Sylvain Meignier's avatar
new    
Sylvain Meignier committed
1018
        for line in Diar.to_string_seg(diarization):
Sylvain Meignier's avatar
Origin  
Sylvain Meignier committed
1019
1020
1021
1022
            fic.write(line)
        fic.close()

    @classmethod
Sylvain Meignier's avatar
new    
Sylvain Meignier committed
1023
    def write_lbl(cls, diarization, label_dir='', label_file_extension='.lbl'):
Sylvain Meignier's avatar
Origin  
Sylvain Meignier committed
1024
        """
Sylvain Meignier's avatar
new    
Sylvain Meignier committed
1025
1026
1027
1028
        Write diarization to label file
        :param diarization: the diarization to write
        :param label_dir: the string directory of the ouput filename
        :param label_file_extension: the string extension of the output filename
Sylvain Meignier's avatar
Origin  
Sylvain Meignier committed
1029
1030

        """
Sylvain Meignier's avatar
new    
Sylvain Meignier committed
1031
        diarization.sort(['show', 'start'])
Sylvain Meignier's avatar
Origin  
Sylvain Meignier committed
1032
1033
        old_show = ''
        fic = None
Sylvain Meignier's avatar
new    
Sylvain Meignier committed
1034
1035
1036
        for segment in diarization:
            if old_show != segment['show']:
                if fic is not None:
Sylvain Meignier's avatar
Origin  
Sylvain Meignier committed
1037
                    fic.close()
Sylvain Meignier's avatar
new    
Sylvain Meignier committed
1038
                filename = os.path.join(label_dir, segment['show']
Sylvain Meignier's avatar
Origin  
Sylvain Meignier committed
1039
1040
                                        + label_file_extension)
                fic = open(filename, 'w')
Sylvain Meignier's avatar
new    
Sylvain Meignier committed
1041
                old_show = segment['show']
Sylvain Meignier's avatar
Origin  
Sylvain Meignier committed
1042
            fic.write('{:d} {:d} {:s}\n'.format(
Sylvain Meignier's avatar
new    
Sylvain Meignier committed
1043
                segment['start'], segment['stop'], segment['cluster']))
Sylvain Meignier's avatar
Origin  
Sylvain Meignier committed
1044
1045
1046
1047
1048
1049
1050
        fic.close()

class Segment(list):
    """
    Class to store the segment informations.


Sylvain Meignier's avatar
new    
Sylvain Meignier committed
1051
    :attr _attributes: is the list of attribut names
Sylvain Meignier's avatar
Origin  
Sylvain Meignier committed
1052
1053
    :attr data: the data associated to each attribut
    """
Sylvain Meignier's avatar
new    
Sylvain Meignier committed
1054
    def __init__(self, data, attributes):
Sylvain Meignier's avatar
Origin  
Sylvain Meignier committed
1055
1056
1057
        """
        Called after the instance has been created (by __new__()), but before it is returned to the caller.
        :param data: copy the row data
Sylvain Meignier's avatar
new    
Sylvain Meignier committed
1058
        :param attributes: the names of the attributs
Sylvain Meignier's avatar
Origin  
Sylvain Meignier committed
1059
1060
1061

        """
        list.__init__(self)
Sylvain Meignier's avatar
new    
Sylvain Meignier committed
1062
        self._attributes = attributes
Sylvain Meignier's avatar
Origin  
Sylvain Meignier committed
1063
1064
1065
1066
1067
1068
        for item in data:
            self.append(item)

    def _get_attr(self, attr_name):
        """
        Called to implement evaluation of self[attr_name].
Sylvain Meignier's avatar
new    
Sylvain Meignier committed
1069
        :param attr_name: a string
Sylvain Meignier's avatar
Origin  
Sylvain Meignier committed
1070
1071
        :return: the value
        """
Sylvain Meignier's avatar
new    
Sylvain Meignier committed
1072
        return self[self._attributes[attr_name]]
Sylvain Meignier's avatar
Origin  
Sylvain Meignier committed
1073
1074
1075
1076
1077
1078
1079
1080
1081

    def _set_attr(self, attr_name, value):
        """
        Called to implement assignment to self[attr_name].

        :param attr_name: a str
        :param value: the value to set

        """
Sylvain Meignier's avatar
new    
Sylvain Meignier committed
1082
        self[self._attributes[attr_name]] = value
Sylvain Meignier's avatar
Origin  
Sylvain Meignier committed
1083
1084
1085
1086
1087
1088
1089
1090
1091
1092
1093
1094
1095
1096
1097
1098
1099
1100
1101
1102
1103
1104
1105
1106
1107

    def __getitem__(self, index):
        """
        Called to implement evaluation of self[index].
        :param index: a int
        :return: the value
        """

        if isinstance(index, str):
            return self._get_attr(index)
        else:
            return list.__getitem__(self, index)

    def __setitem__(self, index, value):
        """
        Called to implement assignment to self[index].
        :param index: a int
        :param value: the value to set
        :return: the item
        """
        if isinstance(index, str):
            return self._set_attr(index, value)
        else:
            return list.__setitem__(self, index, value)

Sylvain Meignier's avatar
new    
Sylvain Meignier committed
1108
1109
1110
    def __eq__(self, segment): # real signature unknown
        if segment is not None:
            l = len(segment)
Sylvain Meignier's avatar
Sylvain Meignier committed
1111
            if l != len(self):
Sylvain Meignier's avatar
Sylvain Meignier committed
1112
                return False
Sylvain Meignier's avatar
Sylvain Meignier committed
1113
1114

            for i in range(l):
Sylvain Meignier's avatar
new    
Sylvain Meignier committed
1115
                if self[i] != segment[i]:
Sylvain Meignier's avatar
Sylvain Meignier committed
1116
1117
1118
1119
                    return False
            return True
        else:
            return False
Sylvain Meignier's avatar
Sylvain Meignier committed
1120

Sylvain Meignier's avatar
new    
Sylvain Meignier committed
1121
1122
    def __ne__(self, segment): # real signature unknown
        return not self.__eq__(segment)
Sylvain Meignier's avatar
Sylvain Meignier committed
1123
1124

    def duration(self):
Sylvain Meignier's avatar
Origin  
Sylvain Meignier committed
1125
        """
Sylvain Meignier's avatar
new    
Sylvain Meignier committed
1126
        :return: the duration of the segment
Sylvain Meignier's avatar
Origin  
Sylvain Meignier committed
1127
        """
Sylvain Meignier's avatar
aaa    
Sylvain Meignier committed
1128

Sylvain Meignier's avatar
Origin  
Sylvain Meignier committed
1129
1130
1131
1132
1133
        return self['stop'] - self['start']

    def seg_features(self, features):
        """
        Given a FeatureServer, returns a list of feature index corresponding to
Sylvain Meignier's avatar
new    
Sylvain Meignier committed
1134
        the segment.
Sylvain Meignier's avatar
Origin  
Sylvain Meignier committed
1135
1136
1137
1138
1139
1140
1141

        :param features: a FeatureServer
        :return: a list of int
        """
        return features[self['start']:self['stop'], :]

    @classmethod
Sylvain Meignier's avatar
new    
Sylvain Meignier committed
1142
    def gap(cls, segment1, segment2):
Sylvain Meignier's avatar
Origin  
Sylvain Meignier committed
1143
1144
        """
        Returns the inter segment gap between 2 segments.
Sylvain Meignier's avatar
new    
Sylvain Meignier committed
1145
1146
        :param segment1: a Segment object
        :param segment2: a Segment object
Sylvain Meignier's avatar
Origin  
Sylvain Meignier committed
1147
1148
1149
1150
        :return: a Segment object

        Examples
        --------
Sylvain Meignier's avatar
stable    
Sylvain Meignier committed
1151
1152
1153
1154
1155
        >>> from s4d.diarization import Diar, Segment
        >>> diarization=Diar()
        >>> diarization.append(show='empty', start=0, stop=100, cluster='spk1')
        >>> diarization.append(show='empty', start=50, stop=150, cluster='spk2')
        >>> s = Segment.intersection(diarization[0], diarization[1])
Sylvain Meignier's avatar
Origin  
Sylvain Meignier committed
1156
1157
        >>> s
        ['empty', 'spk1', 'speaker', 100, 50]
Sylvain Meignier's avatar
Sylvain Meignier committed
1158
        >>> s.duration()
Sylvain Meignier's avatar
Origin  
Sylvain Meignier committed
1159
        - 50
Sylvain Meignier's avatar
stable    
Sylvain Meignier committed
1160
1161
        >>> diarization.append(show='empty', start=200, stop=250, cluster='spk1')
        >>> Segment.gap(diarization[0], diarization[2])
Sylvain Meignier's avatar
Origin  
Sylvain Meignier committed
1162
1163
1164
1165
        ['empty', 'spk1', 'speaker', 100, 200]


        """
Sylvain Meignier's avatar
new    
Sylvain Meignier committed
1166
        if segment1['show'] != segment2['show']:
Sylvain Meignier's avatar
Origin  
Sylvain Meignier committed
1167
            raise Exception('not the same show')
Sylvain Meignier's avatar
new    
Sylvain Meignier committed
1168
1169
1170
1171
        segment = Segment(segment1, segment1._attributes)
        segment['start'] = segment1['stop']
        segment['stop'] = segment2['start']
        return segment
Sylvain Meignier's avatar
???    
Sylvain Meignier committed
1172
1173
1174
1175
    @classmethod
    def split(cls, segment1, segment2):
        inter = cls.intersection(segment1, segment2)
        diff = cls.diff(segment1, segment2)
Sylvain Meignier's avatar
Origin  
Sylvain Meignier committed
1176
1177

    @classmethod
Sylvain Meignier's avatar