Description
Is there a way to use DiarizationErrorRate() from pyannote.metrics to calculate errors against specifically identified speakers, rather than relying on either the built-in optimal (Hungarian) mapping or the greedy mapping?
Most speaker diarization systems distinguish speakers with generic labels such as SPEAKER_00, but if we run a speaker identification system on top, it would be great to see how the DER is affected.
Example
Reference RTTM file:
SPEAKER AUDIOFILE1 1 1 5 ANN
SPEAKER AUDIOFILE1 1 6 3 BOB
SPEAKER AUDIOFILE1 1 8 2 ANN
Hypothesis RTTM file 1 from a speaker diarization system:
SPEAKER AUDIOFILE1 1 1 5 SPEAKER_00
SPEAKER AUDIOFILE1 1 6 3 SPEAKER_01
SPEAKER AUDIOFILE1 1 8 2 SPEAKER_00
DER for hypothesis RTTM file 1 is 20%, comprising 10% miss and 10% false alarm.
Hypothesis RTTM file 2 from speaker diarization followed by speaker identification:
SPEAKER AUDIOFILE1 1 1 5 BOB
SPEAKER AUDIOFILE1 1 6 3 ANN
SPEAKER AUDIOFILE1 1 8 2 BOB
DER for hypothesis RTTM file 2 is still 20%, comprising 10% miss and 10% false alarm. Because the optimal mapping simply pairs hypothesis BOB with reference ANN and vice versa, the DER does not factor in the speaker identification error from the wrongly identified speakers.
Apologies if I am asking something obvious. I feel there must be an easy answer out there, but I have not found it. I am aware of the speaker-attributed word error rate (SAWER) and its variants, but am not aware of any speaker-attributed DER metric.
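To make the request concrete, here is a minimal pure-Python sketch of what a "speaker-attributed" error could look like: labels are compared exactly as given, with no Hungarian or greedy mapping step. It is not pyannote.metrics code. The function names, the (start, end, label) tuple layout, and the reading of the two numeric RTTM fields as onset and duration are all assumptions made for illustration. pyannote.metrics also appears to provide an IdentificationErrorRate class in pyannote.metrics.identification that may already cover this use case, but that should be checked against its documentation.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical layout: (start, end, speaker label), times in seconds.
Segment = Tuple[float, float, str]


def labels_at(segments: List[Segment], start: float, end: float) -> Set[str]:
    """Return the labels whose segments fully cover [start, end)."""
    return {label for (s, e, label) in segments if s <= start and end <= e}


def identification_error(
    reference: List[Segment], hypothesis: List[Segment]
) -> Dict[str, float]:
    """Compare labels directly (no speaker mapping) and accumulate durations."""
    # Cut the timeline at every segment boundary so that the set of
    # active speakers is constant inside each elementary interval.
    bounds = sorted({t for s, e, _ in reference + hypothesis for t in (s, e)})
    out = {"total": 0.0, "correct": 0.0, "confusion": 0.0,
           "miss": 0.0, "false alarm": 0.0}
    for start, end in zip(bounds, bounds[1:]):
        duration = end - start
        ref = labels_at(reference, start, end)
        hyp = labels_at(hypothesis, start, end)
        n_correct = len(ref & hyp)  # same name in both, no remapping
        out["total"] += duration * len(ref)
        out["correct"] += duration * n_correct
        out["miss"] += duration * max(0, len(ref) - len(hyp))
        out["false alarm"] += duration * max(0, len(hyp) - len(ref))
        out["confusion"] += duration * (min(len(ref), len(hyp)) - n_correct)
    errors = out["confusion"] + out["miss"] + out["false alarm"]
    out["error rate"] = errors / out["total"] if out["total"] else 0.0
    return out


# Segments from the example, assuming the RTTM fields are onset and duration.
reference = [(1.0, 6.0, "ANN"), (6.0, 9.0, "BOB"), (8.0, 10.0, "ANN")]
hypothesis_2 = [(1.0, 6.0, "BOB"), (6.0, 9.0, "ANN"), (8.0, 10.0, "BOB")]

result = identification_error(reference, hypothesis_2)
print(result)
```

Under those assumptions, the swapped ANN/BOB labels of hypothesis file 2 show up as speaker confusion instead of being absorbed by the mapping, which is the behaviour I am looking for. Only the one second where both speakers overlap is counted as correct.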