IndexError: list index out of range in add_word_timestamps function #1118

formater · 2024-11-06T11:27:08Z

Hi,
I found a rare condition: with a specific WAV file, language, and prompt, transcribing with word_timestamps=True raises a list index out of range error in the add_word_timestamps function:

  File "/usr/local/src/transcriber/lib/python3.11/site-packages/faster_whisper/transcribe.py", line 1574, in add_word_timestamps
    median_duration, max_duration = median_max_durations[segment_idx]
                                    ~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^
IndexError: list index out of range

It seems the median_max_durations list has fewer elements than the segments list.

I'm using the large-v3-turbo model with these transcribe settings:

segments, _ = asr_model.transcribe(audio_to_analize, language="fr", condition_on_previous_text=False, initial_prompt="Free", task='transcribe', word_timestamps=True, suppress_tokens=[-1, 12], beam_size=5) 
segments = list(segments)  # The transcription will actually run here.

As far as I can see, median_max_durations is populated from alignments, so maybe something is wrong there? If I change the language or prompt, or use another sound file, there is no issue.

Thank you

MahmoudAshraf97 · 2024-11-06T11:34:17Z

I'm aware that this error exists, but I had no luck reproducing it. Can you write the exact steps to reproduce it and upload the audio file?

formater · 2024-11-06T11:49:50Z

Yes. Here is the sample Python code that triggers the issue:

import torch
from faster_whisper import WhisperModel

asr_model = WhisperModel("large-v3-turbo", device="cuda", compute_type="int8", download_root="./models")
segments, _ = asr_model.transcribe('test.wav',  language='fr', condition_on_previous_text=False, initial_prompt='Free', task='transcribe', word_timestamps=True, suppress_tokens=[-1, 12], beam_size=5)
segments = list(segments)  # The transcription will actually run here.

And the audio sample is attached.
test.zip

MahmoudAshraf97 · 2024-11-06T16:02:06Z

I was not able to reproduce it on my machine or on Colab.

formater · 2024-11-06T16:30:05Z

Maybe the Python version, Debian, PyTorch, or something else is slightly different between our setups. Is there anything I can do on my side to get more debug logs and see what the issue is?
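One way to compare setups is to print the versions of the interpreter and of the relevant packages on both machines, and to raise the library's log level. The following is a minimal sketch under stated assumptions: the package list is a guess at what is likely to matter here, the helper names `collect_versions` and `enable_debug_logging` are hypothetical, and the debug output is not guaranteed to say anything about alignments.

```python
import logging
import platform
from importlib import metadata


def collect_versions(packages=("faster-whisper", "ctranslate2", "torch", "numpy")):
    """Return a dict of interpreter, OS, and package versions for a bug report.

    The default package list is an assumption, not something the thread specifies.
    """
    report = {
        "python": platform.python_version(),
        "platform": platform.platform(),
    }
    for name in packages:
        try:
            report[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = "not installed"
    return report


def enable_debug_logging():
    """Turn on DEBUG output for the "faster_whisper" logger."""
    logging.basicConfig()
    logger = logging.getLogger("faster_whisper")
    logger.setLevel(logging.DEBUG)
    return logger


if __name__ == "__main__":
    enable_debug_logging()
    for key, value in collect_versions().items():
        print(f"{key}: {value}")
```

Running this on both the failing and the working machine and diffing the output would at least narrow down which component differs.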

MahmoudAshraf97 · 2024-11-06T17:38:10Z

Are you using the master branch?
median_max_durations is initialized as an empty list, and since you are using sequential transcription, it should end up with a single value. The only thing that can cause this error is the list still being empty, which means the for loop at line 1565 was never executed. That happens when alignments is an empty list, so you need to figure out why alignments is empty.

faster-whisper/faster_whisper/transcribe.py

Lines 1561 to 1595 in 203dddb

alignments = self.find_alignment(
    tokenizer, text_tokens, encoder_output, num_frames
)
median_max_durations = []
for alignment in alignments:
    word_durations = np.array(
        [word["end"] - word["start"] for word in alignment]
    )
    word_durations = word_durations[word_durations.nonzero()]
    median_duration = (
        np.median(word_durations) if len(word_durations) > 0 else 0.0
    )
    median_duration = min(0.7, float(median_duration))
    max_duration = median_duration * 2

    # hack: truncate long words at sentence boundaries.
    # a better segmentation algorithm based on VAD should be able to replace this.
    if len(word_durations) > 0:
        sentence_end_marks = ".。!！?？"
        # ensure words at sentence boundaries
        # are not longer than twice the median word duration.
        for i in range(1, len(alignment)):
            if alignment[i]["end"] - alignment[i]["start"] > max_duration:
                if alignment[i]["word"] in sentence_end_marks:
                    alignment[i]["end"] = alignment[i]["start"] + max_duration
                elif alignment[i - 1]["word"] in sentence_end_marks:
                    alignment[i]["start"] = alignment[i]["end"] - max_duration
    merge_punctuations(alignment, prepend_punctuations, append_punctuations)
    median_max_durations.append((median_duration, max_duration))

for segment_idx, segment in enumerate(segments):
    word_index = 0
    time_offset = segment[0]["start"]
    median_duration, max_duration = median_max_durations[segment_idx]
krmao · 2024-11-14T13:05:01Z

Same here, while testing whisper_streaming:

Traceback (most recent call last):
  File "C:\Users\kr.mao\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 187, in _run_module_as_main
    mod_name, mod_spec, code = _get_module_details(mod_name, _Error)
  File "C:\Users\kr.mao\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 110, in _get_module_details
    __import__(pkg_name)
  File "F:\Workspace\skills\python3\whisper_streaming\whisper_online_server.py", line 183, in <module>
    proc.process()
  File "F:\Workspace\skills\python3\whisper_streaming\whisper_online_server.py", line 162, in process
    o = online.process_iter()
  File "F:\Workspace\skills\python3\whisper_streaming\whisper_online.py", line 378, in process_iter
    res = self.asr.transcribe(self.audio_buffer, init_prompt=prompt)
  File "F:\Workspace\skills\python3\whisper_streaming\whisper_online.py", line 138, in transcribe
    return list(segments)
  File "F:\Workspace\skills\python3\whisper_streaming\venv\lib\site-packages\faster_whisper\transcribe.py", line 2016, in restore_speech_timestamps
    for segment in segments:
  File "F:\Workspace\skills\python3\whisper_streaming\venv\lib\site-packages\faster_whisper\transcribe.py", line 1256, in generate_segments
    self.add_word_timestamps(
  File "F:\Workspace\skills\python3\whisper_streaming\venv\lib\site-packages\faster_whisper\transcribe.py", line 1595, in add_word_timestamps
    median_duration, max_duration = median_max_durations[segment_idx]
IndexError: list index out of range

faster_whisper version.py

"""Version information."""

__version__ = "1.1.0rc0"

MahmoudAshraf97 · 2024-11-14T13:23:13Z

This problem is still non-reproducible with any of the methods provided, and it will not be solved without a reproduction. Someone who has the problem needs to create a Colab notebook that reproduces it. If they can't reproduce it on Colab, they need to isolate what in their environment causes it. Without that, there is nothing that can be done.
