Calculate DER for specified speaker identification mapping #69

swm35 · 2024-09-11T15:27:51Z

Description

Is there a way to use the pyannote.metrics DiarizationErrorRate() to calculate errors based on specifically identified speakers rather than using either the in-built Hungarian optimal mapping or greedy mapping?

Most speaker diarization systems distinguish speakers with generic labels such as SPEAKER_00, but if we have a speaker identification system on top it would be great to see how the DER is affected.

Example

Reference RTTM file:
SPEAKER AUDIOFILE1 1 1 5 ANN
SPEAKER AUDIOFILE1 1 6 3 BOB
SPEAKER AUDIOFILE1 1 8 2 ANN

Hypothesis RTTM file 1 from a speaker diarization system:
SPEAKER AUDIOFILE1 1 1 5 SPEAKER_00
SPEAKER AUDIOFILE1 1 6 3 SPEAKER_01
SPEAKER AUDIOFILE1 1 8 2 SPEAKER_00

DER for hypothesis RTTM file 1 is 20%, comprising 10% miss and 10% false alarm.

Hypothesis RTTM file 2 from speaker diarization followed by speaker identification:
SPEAKER AUDIOFILE1 1 1 5 BOB
SPEAKER AUDIOFILE1 1 6 3 ANN
SPEAKER AUDIOFILE1 1 8 2 BOB

DER for hypothesis RTTM file 2 is still 20%, comprising 10% miss and 10% false alarm. It does not factor in the speaker identification error from the wrongly identified speakers.

Apologies if I am asking something obvious. I feel there must be an easy answer out there but I have not found it. I am aware of speaker-attributed word error rates (SAWER) and its variants, but am not aware of any speaker-attributed DER metrics.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Calculate DER for specified speaker identification mapping #69

Calculate DER for specified speaker identification mapping #69

swm35 commented Sep 11, 2024

Calculate DER for specified speaker identification mapping #69

Calculate DER for specified speaker identification mapping #69

Comments

swm35 commented Sep 11, 2024

Description

Example