Crash in subtitle generation - IndexError: list index out of range #282

Closed
nebehr opened this issue Jul 17, 2024 · 21 comments

Comments

@nebehr

nebehr commented Jul 17, 2024

The program (r192.3.4) crashes at the end of execution, before the subtitle file is written, on some videos with the tiny model, but usually exits correctly with other models on the same video. It may not be directly related to the model used, only to whether its output has some offending attribute.

I think this is different from the crashes at the end of processing that have also been reported in the original faster-whisper.

Traceback (most recent call last):
  File "D:\whisper-fast\_XXL\__main__.py", line 1668, in <module>
  File "D:\whisper-fast\_XXL\__main__.py", line 1652, in cli
  File "D:\whisper-fast\_XXL\__main__.py", line 320, in __call__
  File "D:\whisper-fast\_XXL\__main__.py", line 859, in write_result
  File "D:\whisper-fast\_XXL\__main__.py", line 802, in iterate_result_alt
  File "D:\whisper-fast\_XXL\__main__.py", line 785, in iterate_subtitles_alt
IndexError: list index out of range
[11796] Failed to execute script '__main__' due to unhandled exception!
@Purfview
Owner

Purfview commented Jul 17, 2024

Does that happen when used with "--highlight_words"?
Can you repeatedly reproduce it on some file?

Share whole command used.

@nebehr
Author

nebehr commented Jul 17, 2024

No, the only command-line arguments I use are --model, --language and the file name. And yes, it is consistently reproducible on the file I use.

faster-whisper-xxl.exe --model tiny -l is 101.avi

In fact, I see that some characters in the console output look like question marks (they copy here as 很 or ル); such characters obviously do not occur in the audio and cannot occur in the selected language. Perhaps they break something when the output is written to the file?
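One way to test this hypothesis is to dump the result with --output_format json and scan it for segments that contain letters outside the Latin script (which covers Icelandic letters such as þ, ð and æ), or that have empty text or an empty word list. This is a hypothetical diagnostic sketch, not part of Faster-Whisper-XXL, and it assumes the JSON follows the openai-whisper layout: a top-level "segments" list whose entries carry "start", "end", "text" and, optionally, "words".

```python
import json
import sys
import unicodedata


def non_latin_letters(text):
    """Return the distinct letters in text whose Unicode name does not start with LATIN."""
    return sorted({ch for ch in text
                   if ch.isalpha() and not unicodedata.name(ch, "").startswith("LATIN")})


def suspicious_segments(result):
    """Yield (index, start, end, reasons) for segments that look malformed.

    Assumes a whisper-style layout: {"segments": [{"start", "end", "text", "words"?}, ...]}.
    """
    for i, seg in enumerate(result.get("segments", [])):
        reasons = []
        text = seg.get("text", "")
        if not text.strip():
            reasons.append("empty text")
        words = seg.get("words")
        if words is not None and len(words) == 0:
            reasons.append("empty word list")
        odd = non_latin_letters(text)
        if odd:
            reasons.append("non-Latin letters: " + "".join(odd))
        if reasons:
            yield i, seg.get("start"), seg.get("end"), reasons


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical script name): python scan_json.py 101.json
    with open(sys.argv[1], encoding="utf-8") as f:
        data = json.load(f)
    for i, start, end, reasons in suspicious_segments(data):
        print(f"segment {i} [{start} --> {end}]: {'; '.join(reasons)}")
```

Segments flagged this way are only candidates; whether any of them is what actually trips the subtitle writer would still need to be confirmed against the crash.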

@Purfview
Owner

Purfview commented Jul 17, 2024

Can you share the json file produced with --output_format json?

@nebehr
Author

nebehr commented Jul 17, 2024

101.json

This time it crashed AFTER producing the output and the "Operation finished in:" ... line. Apparently this last crash is a case of SYSTRAN/faster-whisper#71 or something similar, but it seems to be unrelated to this issue.

@Purfview
Owner

Can you share the message of this new crash?

@nebehr
Author

nebehr commented Jul 17, 2024

There is no message in the console; it's just the standard Windows popup saying that the "program has stopped working". For each of these new crashes, Windows Event Viewer contains a pair of error messages like these:

Faulting application name: faster-whisper-xxl.exe, version: 192.3.4.0, time stamp: 0x6626da66
Faulting module name: KERNELBASE.dll, version: 10.0.17763.6054, time stamp: 0xc9a93043
Exception code: 0xe06d7363
Fault offset: 0x0000000000041b39

Faulting application name: faster-whisper-xxl.exe, version: 192.3.4.0, time stamp: 0x6626da66
Faulting module name: ucrtbase.dll, version: 10.0.17763.1490, time stamp: 0x48ac8393
Exception code: 0xc0000409
Fault offset: 0x000000000006e77e

Note that by the time it happens, everything is already done and the program is exiting, and at no point does it max out memory. For this reason, this new crash is not so bad, just inconvenient.

@Purfview
Owner

Purfview commented Jul 17, 2024

IndexError: list index out of range

I can reproduce it with the faster-whisper-xxl.exe 101.json command; I'll investigate it later.

This time it crashed AFTER producing the output and "Operation finished in:" ... line. Apparently this last crash is a case of SYSTRAN/faster-whisper#71 or something similar, but seems to be unrelated to this issue.

There is code that plays a "beep" sound after the "Operation finished in:" ... line.
Could you try --beep_off? Do you get this crash only on this file or on all files?

@nebehr
Author

nebehr commented Jul 17, 2024

By default, this second crash comes after the beep. With --beep_off it just happens silently. The crash is reproducible with many other files and with larger models. I have not found the pattern yet. I am running it with CUDA 12.5; I'm not sure if that is related.

@Purfview Purfview changed the title from "Crash before subtitle generation" to "Crash in subtitle generation - IndexError: list index out of range" on Jul 17, 2024
@qscwdv65

qscwdv65 commented Aug 1, 2024

I encountered a similar error message on Ubuntu 22.04 using Faster-Whisper-XXL_r192.3.1_linux.
This is the command I used and its output:


mis@ai-ai:~/下載/Faster-Whisper-XXL_r192.3.1_linux/Whisper-Faster-XXL$ sudo ./whisper-faster-xxl "2024-08-01 09-32-20.mkv" --language Chinese --initial_prompt "這是一段主要是繁體中文(台灣)的影片:" --model large-v2
[sudo] mis 的密碼:

Standalone Faster-Whisper-XXL r192.3.1 running on: CUDA

Starting work on: 2024-08-01 09-32-20.mkv

[00:00.520 --> 00:02.800] 但是其實呢
[00:03.560 --> 00:04.520] 然後呢
(skip......)
[01:32:05.960 --> 01:32:06.540] 好
[01:32:06.540 --> 01:32:06.940] 拜拜

Transcription speed: 36.67 audio seconds/s

Traceback (most recent call last):
  File "__main__.py", line 1633, in <module>
  File "__main__.py", line 1617, in cli
  File "__main__.py", line 310, in __call__
  File "__main__.py", line 849, in write_result
  File "__main__.py", line 792, in iterate_result_alt
  File "__main__.py", line 775, in iterate_subtitles_alt
IndexError: list index out of range
[34535] Failed to execute script '__main__' due to unhandled exception!


Additional information
I was able to successfully generate an SRT file without errors using the same command but with a different, shorter (2-minute) MP4 file.

@ClaireCJS

ClaireCJS commented Oct 31, 2024

I've been randomly getting these too.

I think one was reproducible, but a power failure made me lose track of it.

I'll keep an eye out.

Pasted post from another thread:

I'm wondering why I get these errors when I run whisper-faster-xxl.exe

Particularly since I don't have a d:\whisper-fast_XXL folder

They happen... for certain songs (1 out of 10-15), but not for others.

I can't tell the exact cause, and I also can't fathom why it would reference a folder that doesn't exist on my D: drive...

Transcription speed: 6.66 audio seconds/s

Traceback (most recent call last):
  File "D:\whisper-fast\_XXL\__main__.py", line 1668, in <module>
  File "D:\whisper-fast\_XXL\__main__.py", line 1652, in cli
  File "D:\whisper-fast\_XXL\__main__.py", line 320, in __call__
  File "D:\whisper-fast\_XXL\__main__.py", line 859, in write_result
  File "D:\whisper-fast\_XXL\__main__.py", line 802, in iterate_result_alt
  File "D:\whisper-fast\_XXL\__main__.py", line 785, in iterate_subtitles_alt
IndexError: list index out of range
[17684] Failed to execute script '__main__' due to unhandled exception!

@Purfview
Owner

Purfview commented Nov 3, 2024

Particularly since I don't have a D:\whisper-fast\_XXL\__main__.py folder

@ClaireCJS Those are internal paths inside exe, not on your PC.
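The point about internal paths can be demonstrated in plain Python: the file name printed in a traceback is the one recorded in the code object when the source was compiled, not a path looked up on the machine running the program. The sketch below compiles a two-line snippet under the build path seen in the tracebacks above (reused purely for illustration) and shows that the resulting IndexError is reported against that path even though it does not exist locally. It shows the general mechanism only; how the Faster-Whisper-XXL executable is actually built is not covered in this thread.

```python
import traceback

# Hypothetical build-machine path, copied from the tracebacks for illustration.
BUILD_PATH = r"D:\whisper-fast\_XXL\__main__.py"

# Line 2 of this snippet indexes an empty list.
source = "items = []\nitems[0]\n"

# compile() stores BUILD_PATH as the code object's file name; nothing is read from disk.
code = compile(source, BUILD_PATH, "exec")


def run_and_report():
    """Run the compiled snippet and return (filename, line, message) of the failing frame."""
    try:
        exec(code, {})
    except IndexError as exc:
        frame = traceback.extract_tb(exc.__traceback__)[-1]
        return frame.filename, frame.lineno, str(exc)


filename, lineno, message = run_and_report()
print(f'File "{filename}", line {lineno}')
print(f"IndexError: {message}")
```

Running it prints a "File ..." line pointing at D:\whisper-fast\_XXL\__main__.py, line 2, followed by "IndexError: list index out of range", on any machine.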

@ClaireCJS

ClaireCJS commented Nov 3, 2024

Particularly since I don't have a D:\whisper-fast\_XXL\__main__.py folder

@ClaireCJS Those are internal paths inside exe, not on your PC.

I know. It's just weird; I don't even have whisper on my D: drive... I understand it's not real, it's just... weird. It's failing, and knowing why would be nice. Sorry 😅

@Purfview
Owner

Purfview commented Nov 6, 2024

Fixed in v193.1

@Purfview Purfview closed this as completed Nov 6, 2024
@nebehr
Author

nebehr commented Nov 9, 2024

Unfortunately, it is still reproducible in v193.1, albeit with a slightly different stack trace; the error appears to be the same.

  File "D:\whisper-fast\_XXL\__main__.py", line 1765, in <module>
  File "D:\whisper-fast\_XXL\__main__.py", line 1732, in cli
  File "D:\whisper-fast\_XXL\__main__.py", line 750, in write_all
  File "D:\whisper-fast\_XXL\__main__.py", line 365, in __call__
  File "D:\whisper-fast\_XXL\__main__.py", line 689, in write_result
  File "D:\whisper-fast\_XXL\__main__.py", line 529, in iterate_result
IndexError: string index out of range
[4460] Failed to execute script '__main__' due to unhandled exception!

This happened on an attempt to use --output_format all; apparently it failed halfway through the vtt (otherwise it fails at the same point in the srt). The media file is rather big, though, and takes a long time to process, which isn't conducive to more detailed investigation. I will see if I can get more details.

@Purfview
Owner

Purfview commented Nov 9, 2024

Can you share json file?

@nebehr
Author

nebehr commented Nov 9, 2024

I was actually hoping to do that by asking for all formats, to save time on transcription, but apparently the "bad" one comes earlier in the queue. In what order are they written with --output_format all?

@Purfview
Owner

Purfview commented Nov 9, 2024

I think json is last; I'll put it first in the next release.

@Purfview
Owner

Purfview commented Nov 9, 2024

Unfortunately, it is still reproducible in v193.1

It's not, because it's not the same bug.
Try faster-whisper-xxl.exe 101.json -f all

@nebehr
Author

nebehr commented Nov 9, 2024

Indeed, this may be related to the length of the produced chunks. For some reason, the model I am using does not split the text into sentences properly, so I am using --max_line_width along with some other parameters. Conversion from JSON to SRT fails with --max_line_width values up to 128 (I wonder whether the boundary being a power of 2 plays a role here), but succeeds without the option or with higher values. The chunk where it fails (at [25:36.630 --> 26:04.010]) does appear to be the longest of the lot.
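The dependence on --max_line_width can be illustrated with a hypothetical greedy line-wrapping routine. This is not Faster-Whisper-XXL's code (its source does not appear in this thread); it only shows one common way such a routine emits an empty line, namely when a single word is already wider than the limit, and how a later access to line[0] then raises "IndexError: string index out of range". The names wrap_naive and wrap_words are made up for this example.

```python
def wrap_naive(words, max_width):
    """Hypothetical greedy wrapper that can emit an empty line."""
    lines, current = [], ""
    for word in words:
        if len((current + word).strip()) > max_width:
            # If the very first word is already wider than max_width,
            # current is still "" here, so an empty line is appended.
            lines.append(current.strip())
            current = word
        else:
            current += word
    lines.append(current.strip())
    return lines


def wrap_words(words, max_width):
    """Same packing, but only breaks a line when it already has content.

    A word longer than max_width ends up alone on its own line instead of
    producing an empty line before it.
    """
    lines, current = [], ""
    for word in words:
        if current and len((current + word).strip()) > max_width:
            lines.append(current.strip())
            current = word
        else:
            current += word
    if current.strip():
        lines.append(current.strip())
    return lines


long_word = [" averyveryverylongword"]
print(wrap_naive(long_word, 5))   # ['', 'averyveryverylongword']
print(wrap_words(long_word, 5))   # ['averyveryverylongword']

try:
    # Downstream code that looks at the first character of each line
    # fails on the empty line produced by wrap_naive.
    [line[0] for line in wrap_naive(long_word, 5)]
except IndexError as exc:
    print("IndexError:", exc)     # IndexError: string index out of range
```

The guard in wrap_words simply never closes a line that is still empty; whether the real failure at [25:36.630 --> 26:04.010] has the same cause is not established in this thread.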

x.zip

Do you want me to create a separate issue for this?

@Purfview
Owner

Purfview commented Nov 9, 2024

Share your command.

Do you want me to create a separate issue for this?

Nah.

@nebehr
Author

nebehr commented Nov 9, 2024

The command to reproduce it with the attached JSON file is faster-whisper-xxl.exe x.json --max_line_width 35 -f srt.

The command with which I originally encountered it in this release is faster-whisper-xxl.exe --model <CUSTOM_MODEL> -l is --max_line_width 35 --max_line_count 2 --sentence --max_comma_cent 50 <FILE_NAME>.

4 participants