IndexError: list index out of range in add_word_timestamps function #1118

Open
formater opened this issue Nov 6, 2024 · 7 comments

Comments

@formater

formater commented Nov 6, 2024

Hi,
I found a rare failure: with a specific WAV file, language, and prompt, transcribing with word_timestamps=True raises a list index out of range error in the add_word_timestamps function:

  File "/usr/local/src/transcriber/lib/python3.11/site-packages/faster_whisper/transcribe.py", line 1574, in add_word_timestamps
    median_duration, max_duration = median_max_durations[segment_idx]
                                    ~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^
IndexError: list index out of range

It seems the median_max_durations list has fewer elements than the segments list.

I'm using the large-v3-turbo model with these transcribe settings:

segments, _ = asr_model.transcribe(audio_to_analize, language="fr", condition_on_previous_text=False, initial_prompt="Free", task='transcribe', word_timestamps=True, suppress_tokens=[-1, 12], beam_size=5) 
segments = list(segments)  # The transcription will actually run here.

As far as I can see, median_max_durations is populated from alignments, so something may be going wrong there. If I change the language or prompt, or use another sound file, the issue does not occur.

Thank you

@MahmoudAshraf97
Collaborator

I'm aware that this error exists, but I had no luck reproducing it. Can you write the exact steps to reproduce it and upload the audio file?

@formater
Author

formater commented Nov 6, 2024

Yes. Here is sample Python code that triggers the issue:

import torch
from faster_whisper import WhisperModel

asr_model = WhisperModel("large-v3-turbo", device="cuda", compute_type="int8", download_root="./models")
segments, _ = asr_model.transcribe('test.wav',  language='fr', condition_on_previous_text=False, initial_prompt='Free', task='transcribe', word_timestamps=True, suppress_tokens=[-1, 12], beam_size=5)
segments = list(segments)  # The transcription will actually run here.

And the audio sample is attached.
test.zip

@MahmoudAshraf97
Collaborator

I was not able to reproduce it on my machine or on Colab.

@formater
Author

formater commented Nov 6, 2024

Maybe the Python version, Debian, PyTorch, or something else is slightly different between our setups. Is there anything I can do on my side to get more debug logs and see what the issue is?
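One way to compare setups is to print the interpreter, platform, and package versions on both machines, and to turn on the library's debug logging. The sketch below is only an illustration: the helper name collect_environment and the package list are assumptions, not part of faster_whisper. The logger name "faster_whisper" is the one the project README documents for configuring the logging level.

```python
import logging
import platform
import sys
from importlib import metadata

# Hypothetical list of packages worth comparing; adjust to your setup.
PACKAGES = ("faster-whisper", "ctranslate2", "torch", "numpy", "tokenizers")


def collect_environment(packages=PACKAGES):
    """Return a dict describing the interpreter, OS, and installed versions."""
    info = {
        "python": sys.version.split()[0],
        "platform": platform.platform(),
    }
    for name in packages:
        try:
            info[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            info[name] = "not installed"
    return info


# Enable debug output from the library before calling transcribe().
logging.basicConfig()
logging.getLogger("faster_whisper").setLevel(logging.DEBUG)

for key, value in collect_environment().items():
    print(f"{key}: {value}")
```

Running this on both the failing machine and a working one (for example a Colab runtime) and diffing the output could narrow down which component differs.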

@MahmoudAshraf97
Collaborator

Are you using the master branch?
median_max_durations is initialized as an empty list, and since you are using sequential transcription, it should hold a single value. The only way to get this error is for the list to still be empty, which means the for loop at line 1565 never executed. That happens when alignments is an empty list, so you need to figure out why that is happening.

alignments = self.find_alignment(
    tokenizer, text_tokens, encoder_output, num_frames
)
median_max_durations = []
for alignment in alignments:
    word_durations = np.array(
        [word["end"] - word["start"] for word in alignment]
    )
    word_durations = word_durations[word_durations.nonzero()]
    median_duration = (
        np.median(word_durations) if len(word_durations) > 0 else 0.0
    )
    median_duration = min(0.7, float(median_duration))
    max_duration = median_duration * 2
    # hack: truncate long words at sentence boundaries.
    # a better segmentation algorithm based on VAD should be able to replace this.
    if len(word_durations) > 0:
        sentence_end_marks = ".。!！?？"
        # ensure words at sentence boundaries
        # are not longer than twice the median word duration.
        for i in range(1, len(alignment)):
            if alignment[i]["end"] - alignment[i]["start"] > max_duration:
                if alignment[i]["word"] in sentence_end_marks:
                    alignment[i]["end"] = alignment[i]["start"] + max_duration
                elif alignment[i - 1]["word"] in sentence_end_marks:
                    alignment[i]["start"] = alignment[i]["end"] - max_duration
    merge_punctuations(alignment, prepend_punctuations, append_punctuations)
    median_max_durations.append((median_duration, max_duration))
for segment_idx, segment in enumerate(segments):
    word_index = 0
    time_offset = segment[0]["start"]
    median_duration, max_duration = median_max_durations[segment_idx]
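
To make the failure mode concrete, here is a minimal sketch that uses stand-in data rather than the real faster_whisper internals. It assumes, as described above, that alignments comes back empty while segments still has one entry. The helpers median_max and pair_durations, and the (0.0, 0.0) fallback, are hypothetical and are not the library's fix. The standard-library median is used in place of np.median to keep the sketch dependency-free.

```python
from statistics import median


def median_max(alignment):
    """Mirror the per-alignment duration statistics from the snippet above."""
    durations = [word["end"] - word["start"] for word in alignment]
    durations = [d for d in durations if d != 0]  # drop zero-length words
    med = min(0.7, float(median(durations))) if durations else 0.0
    return med, med * 2


def pair_durations(segments, alignments, default=(0.0, 0.0)):
    """Hypothetical guard: yield one duration pair per segment, using
    `default` when no matching alignment exists."""
    stats = [median_max(alignment) for alignment in alignments]
    for idx, segment in enumerate(segments):
        yield segment, (stats[idx] if idx < len(stats) else default)


segments = [[{"start": 0.0, "end": 1.5, "tokens": [1, 2, 3]}]]
alignments = []  # assumed: the alignment step returned nothing

# Unguarded indexing raises the same kind of error as in the report.
stats = [median_max(alignment) for alignment in alignments]
try:
    stats[0]
except IndexError as exc:
    print("reproduced:", exc)

# The guarded version falls back to the default pair instead.
for _, (median_duration, max_duration) in pair_durations(segments, alignments):
    print(median_duration, max_duration)
```

A guard like this only hides the symptom. Why alignments is empty for this particular audio, language, and prompt would still need to be tracked down.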

@krmao

krmao commented Nov 14, 2024

Same here, while testing whisper_streaming:

Traceback (most recent call last):
  File "C:\Users\kr.mao\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 187, in _run_module_as_main
    mod_name, mod_spec, code = _get_module_details(mod_name, _Error)
  File "C:\Users\kr.mao\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 110, in _get_module_details
    __import__(pkg_name)
  File "F:\Workspace\skills\python3\whisper_streaming\whisper_online_server.py", line 183, in <module>
    proc.process()
  File "F:\Workspace\skills\python3\whisper_streaming\whisper_online_server.py", line 162, in process
    o = online.process_iter()
  File "F:\Workspace\skills\python3\whisper_streaming\whisper_online.py", line 378, in process_iter
    res = self.asr.transcribe(self.audio_buffer, init_prompt=prompt)
  File "F:\Workspace\skills\python3\whisper_streaming\whisper_online.py", line 138, in transcribe
    return list(segments)
  File "F:\Workspace\skills\python3\whisper_streaming\venv\lib\site-packages\faster_whisper\transcribe.py", line 2016, in restore_speech_timestamps
    for segment in segments:
  File "F:\Workspace\skills\python3\whisper_streaming\venv\lib\site-packages\faster_whisper\transcribe.py", line 1256, in generate_segments
    self.add_word_timestamps(
  File "F:\Workspace\skills\python3\whisper_streaming\venv\lib\site-packages\faster_whisper\transcribe.py", line 1595, in add_word_timestamps
    median_duration, max_duration = median_max_durations[segment_idx]
IndexError: list index out of range

faster_whisper version.py

"""Version information."""

__version__ = "1.1.0rc0"
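
Until the root cause is found, one possible workaround is to retry without word timestamps when this specific error is raised. The sketch below is an illustration under stated assumptions, not an endorsed fix: transcribe_with_fallback is a hypothetical helper, and it relies only on model.transcribe(audio, word_timestamps=..., **kwargs) returning a lazy (segments, info) pair, as in the snippets earlier in this thread. Note that it catches any IndexError raised during transcription, not just the one from add_word_timestamps.

```python
def transcribe_with_fallback(model, audio, **kwargs):
    """Transcribe with word timestamps, retrying without them on IndexError.

    Returns (segments, used_word_timestamps).
    """
    # The helper controls this flag itself, so drop any caller-supplied value.
    kwargs.pop("word_timestamps", None)
    try:
        segments, _ = model.transcribe(audio, word_timestamps=True, **kwargs)
        # The generator is lazy, so the error surfaces while it is consumed.
        return list(segments), True
    except IndexError:
        segments, _ = model.transcribe(audio, word_timestamps=False, **kwargs)
        return list(segments), False
```

It could be called as transcribe_with_fallback(asr_model, 'test.wav', language='fr', beam_size=5). Callers that depend on word-level timing would still need to handle the case where the second return value is False.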

@MahmoudAshraf97
Collaborator

This problem is still non-reproducible with all the methods provided, and it will not be solved without a reproduction. Someone who has the problem needs to create a Colab notebook that reproduces it. If they can't reproduce it on Colab, they need to isolate where in their environment the problem is caused. Without that, there is nothing that can be done.

3 participants