Finetuned large-v3 inference problem #1099
Comments
Do these problems occur only with faster-whisper, or does transformers have this issue as well?
There is no problem with standard inference; the output has no repetitions.
Hi @sinisha ... I think I have a similar problem. I also fine-tuned a (small) model, converted it to CT2 format, and I am trying to use it as part of whisper_streaming (https://github.com/ufal/whisper_streaming; this repo also uses faster-whisper as the backend). I found that if the model doesn't output the end-timestamp token, faster-whisper runs into problems. Could you try the following code?
Essentially, between line 1 and line 2 there is only one difference: word_timestamps.
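Something along these lines (a sketch only; the model path and audio file below are placeholders):

```python
from faster_whisper import WhisperModel

# placeholder path to the CT2-converted fine-tuned model
model = WhisperModel("path/to/ct2_model", device="cuda", compute_type="float16")

# line 1: default decoding, word_timestamps=False
segments, info = model.transcribe("audio.wav", word_timestamps=False)

# line 2: identical call, but with word-level timestamps enabled
segments, info = model.transcribe("audio.wav", word_timestamps=True)

for segment in segments:
    print(segment.start, segment.end, segment.text)
```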
Hi @bchinnari, thanks for the suggestion. Adding word_timestamps=True helps.
OK, that's wonderful. Just re-checking with you: adding the True flag improves everything for you?
Oh, wait. Setting word_timestamps=True actually makes it worse for some files.
Overall, the True flag makes it worse?
I can't answer this, since without the 'True' flag I got only one short segment.
I did not understand. |
It gets worse, but suddenly for some files it improves a bit.
I have the same issue with my own fine-tuned model when I use it with faster-whisper: #987
@asr-lord, are you using faster-whisper directly for transcription, or are you using some other repo (like whisperX or whisper_streaming) that uses faster-whisper in the backend?
I'm using faster-whisper v1.0.3 directly.
If I need word timestamps, this could not be the solution for me...
I am using whisperx for inference (which is built upon faster-whisper).
I have fine-tuned the large-v3 model on 1k hours of domain-specific data. When I run standard inference, the results are OK. The fine-tuned model is converted using CTranslate2, but the results obtained with whisperx are almost all hallucinations and repetitions (maybe the first couple of phonemes at the beginning are correct). I used the same CTranslate2 command to convert the original large-v3 model, and its whisperx inference is correct. The model was fine-tuned using Transformers 4.45.2. I have tried a couple of different Transformers versions at inference time, and the results are similar. Has anyone encountered a similar problem?
The problem with hallucinations and repetitions is not sporadic; it happens with every input audio. Hence I am convinced it is a problem in token generation.
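For reference, the conversion and inference steps were roughly as follows (a sketch only; paths are placeholders and the exact options may have differed):

```python
import whisperx  # whisperx wraps faster-whisper for batched inference
from ctranslate2.converters import TransformersConverter

# Convert the fine-tuned Transformers checkpoint to CTranslate2 format
# (the same conversion was applied to the original large-v3 checkpoint).
converter = TransformersConverter(
    "path/to/whisper-large-v3-finetuned",  # placeholder checkpoint path
    copy_files=["tokenizer.json", "preprocessor_config.json"],
)
converter.convert("path/to/whisper-large-v3-finetuned-ct2", quantization="float16")

# Run whisperx inference on the converted model.
model = whisperx.load_model(
    "path/to/whisper-large-v3-finetuned-ct2", device="cuda", compute_type="float16"
)
audio = whisperx.load_audio("audio.wav")
result = model.transcribe(audio, batch_size=16)
```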