
Finetuned large-v3 inference problem #1099

Open
sinisha opened this issue Oct 28, 2024 · 14 comments

Comments


sinisha commented Oct 28, 2024

I am using whisperx for inference (which is built upon faster-whisper).

I have fine-tuned the large-v3 model on 1k hours of domain-specific data. When I run standard inference the results are OK. The fine-tuned model was converted using ctranslate2, but the results obtained with whisperx are almost entirely hallucinations and repetitions (maybe the first couple of phonemes at the beginning are correct). I used the same ctranslate2 command to convert the original large-v3 model, and whisperx inference with that model is correct. The model was fine-tuned with Transformers 4.45.2; I have tried a couple of different Transformers versions at inference time and the results are similar. Has anyone encountered a similar problem?

The problem with hallucinations and repetitions is not sporadic; it happens with every input audio. Hence I am convinced it is some problem in token generation.
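For reference, the conversion step described above typically looks like the following (the thread does not show the exact command used; the paths and quantization setting here are illustrative placeholders):

```shell
# Convert a fine-tuned Hugging Face Whisper checkpoint to CTranslate2 format.
# --model can be a local directory containing the fine-tuned weights;
# the output directory is what faster-whisper / whisperx then load.
ct2-transformers-converter \
    --model /path/to/finetuned-whisper-large-v3 \
    --output_dir /path/to/finetuned-whisper-large-v3-ct2 \
    --copy_files tokenizer.json preprocessor_config.json \
    --quantization float16
```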

@MahmoudAshraf97 (Collaborator)

Do these problems occur only with faster-whisper, or does Transformers have this issue as well?


sinisha commented Oct 28, 2024

> Do these problems occur only with faster-whisper, or does Transformers have this issue as well?

There is no problem with standard inference; the output has no repetitions.


bchinnari commented Oct 29, 2024

Hi @sinisha ... I think I have a similar problem. I have also fine-tuned a (small) model, converted it to CT2 format, and I am trying to use it as part of whisper_streaming (https://github.com/ufal/whisper_streaming, which also uses faster-whisper as the backend).

I found that if the model doesn't output the end-timestamp token, faster-whisper runs into problems. Could you try the following code?

from faster_whisper import WhisperModel

# Paths are placeholders
model_path = "/path/to/your/ct2/model"
model = WhisperModel(model_path, device="cpu", compute_type="int8")
wav = "/path/to/audio.wav"

# Call 1: with word-level timestamps
segments1, info = model.transcribe(wav, task="transcribe", beam_size=5, word_timestamps=True)
for segment in segments1:
    print(segment)

# Call 2: without word-level timestamps
segments2, info = model.transcribe(wav, task="transcribe", beam_size=5)
for segment in segments2:
    print(segment)

Essentially, the only difference between the two transcribe calls is word_timestamps.
If the two calls produce a different number of segments, we may conclude that "not outputting the end-timestamp token" could be the problem.
I think faster-whisper has problems when we need word/segment timestamps but the model doesn't emit that end-timestamp token.
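The check described above can be wrapped in a small helper (a sketch; `count_mismatch` is a hypothetical name, and the two arguments would be the segment iterables returned by the two transcribe() calls):

```python
# Hypothetical helper: compare the number of segments returned by the two
# transcribe() calls (with and without word_timestamps=True). A mismatch
# would suggest the model is not emitting the end-timestamp token reliably.
def count_mismatch(segments_with_ts, segments_without_ts):
    n_with = len(list(segments_with_ts))
    n_without = len(list(segments_without_ts))
    return n_with != n_without, n_with, n_without

# Example with dummy lists standing in for real transcribe() output:
mismatch, a, b = count_mismatch(["seg1", "seg2", "seg3"], ["seg1"])
print(mismatch, a, b)  # True 3 1
```

Note that transcribe() returns a lazy generator, so materializing it with list() is what actually runs the decoding.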


sinisha commented Oct 29, 2024

Hi @bchinnari, thanks for the suggestion. Adding word_timestamps=True really improves everything a lot.
I am still confused about this: I've fine-tuned models before and used them with whisperx without problems.

@bchinnari

OK, that's wonderful. Just re-checking with you: adding the True flag improves everything for you?
It makes everything worse for me; I am observing the opposite of what you are saying :)


sinisha commented Oct 29, 2024

> OK, that's wonderful. Just re-checking with you: adding the True flag improves everything for you? It makes everything worse for me; I am observing the opposite of what you are saying :)

Oh, wait. Setting word_timestamps=True returns more segments, whereas without word_timestamps there is always one segment. And the results only looked OK for one specific run; they differ from run to run.

@bchinnari

Overall, does the True flag make it worse?


sinisha commented Oct 29, 2024

> Overall, does the True flag make it worse?

I can't answer this, since without the True flag I got only one short segment.

@bchinnari

I did not understand.
If you run on multiple audio files, I believe adding the flag makes it worse overall.
Could you test on multiple audio files and let me know? That would also help me make progress on my own work.


sinisha commented Oct 31, 2024

> I did not understand. If you run on multiple audio files, I believe adding the flag makes it worse overall. Could you test on multiple audio files and let me know? That would also help me make progress on my own work.

It gets worse overall, but for some files it suddenly improves a bit.


asr-lord commented Nov 2, 2024

I have the same issue with my own fine-tuned model when I use it with faster-whisper: #987


bchinnari commented Nov 4, 2024

@asr-lord, are you using faster-whisper directly for transcription, or are you using some other repo (like whisperX or whisper_streaming) that uses faster-whisper in the backend?

@bchinnari

@asr-lord, I saw your issue mentioned in #987.
As I said in my previous comments, I believe word_timestamps=True is causing the issues. Try your example with word_timestamps=False and let us know.


asr-lord commented Nov 7, 2024

> @asr-lord, are you using faster-whisper directly for transcription, or are you using some other repo (like whisperX or whisper_streaming) that uses faster-whisper in the backend?

I'm using faster-whisper v1.0.3 directly.

> @asr-lord, I saw your issue mentioned in #987. As I said in my previous comments, I believe word_timestamps=True is causing the issues. Try your example with word_timestamps=False and let us know.

Since I need word timestamps, this can't be the solution for me...
