

CC extraction error: 'utf-8' codec can't decode byte 0xe9 in position 35: invalid continuation byte. The letter "é" breaks CC extraction #172

Open
backcountrymountains opened this issue Dec 22, 2024 · 0 comments


I've been using comskip to remove commercials from OTA-recorded NFL football (American) because I don't have the attention span to keep watching the games with so many commercial breaks. Comskip works great!

I've found that looking at the type of closed captioning in the file is a great way to find commercials because the in-game announcer captions are all POPUP whereas commercials are ROLLON.
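A minimal sketch of that heuristic, assuming the captions have already been extracted with start/end frames and a mode label (`POPUP` or `ROLLON`, as described above). The `Caption` type, the `rollon_blocks` function and the `max_gap` threshold are hypothetical illustrations, not part of comskip. The sketch groups runs of ROLLON captions into candidate commercial ranges and closes a range whenever a POPUP (announcer) caption appears.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Caption:
    start: int  # start frame (hypothetical field)
    end: int    # end frame (hypothetical field)
    mode: str   # "POPUP" or "ROLLON"; any other mode is ignored


def rollon_blocks(captions: List[Caption], max_gap: int = 300) -> List[Tuple[int, int]]:
    """Merge consecutive ROLLON captions into (start, end) frame ranges.

    Two ROLLON captions join the same block if the gap between them is at
    most max_gap frames and no POPUP caption occurs in between.
    """
    blocks: List[Tuple[int, int]] = []
    current: Optional[Tuple[int, int]] = None
    for cap in sorted(captions, key=lambda c: c.start):
        if cap.mode == "ROLLON":
            if current is not None and cap.start - current[1] <= max_gap:
                # Close enough to the open block: extend it.
                current = (current[0], max(current[1], cap.end))
            else:
                # Gap too large (or no open block): start a new one.
                if current is not None:
                    blocks.append(current)
                current = (cap.start, cap.end)
        elif cap.mode == "POPUP" and current is not None:
            # Announcer caption: the likely commercial block has ended.
            blocks.append(current)
            current = None
    if current is not None:
        blocks.append(current)
    return blocks
```

The returned ranges are only candidates; in practice they would be cross-checked against the other cues comskip already uses, such as black frames and silence.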

Occasionally, though, I get an error during CC extraction (comskip is called from comcut):

```
11-18-2024 12:06:37 AM - DEBUG   : comskip/comcut output: 322) S: 19886 E: 19914 L:  38 and auto with the Personal Price Plan,
11-18-2024 12:06:37 AM - DEBUG   : comskip/comcut output: 323) S: 19923 E: 19956 L:  44 they decided to bundle practice with ballet.
11-18-2024 12:06:37 AM - ERROR   : error processing file: NFL Football 2024_11_17_14_25_00 - Kansas City Chiefs at Buffalo Bills.ts
11-18-2024 12:06:37 AM - ERROR   : error processing file: 'utf-8' codec can't decode byte 0xe9 in position 35: invalid continuation byte
11-18-2024 12:06:37 AM - DEBUG   : restore complete
```

Another example:

```
08-08-2024 09:23:51 PM - DEBUG   : comskip/comcut output: 490) S: 33083 E: 33134 L:  19  Every silky broth,
08-08-2024 09:23:51 PM - ERROR   : error processing file: Primetime in Paris: The Olympics 2024_08_06_19_00_00 - Track and Field, Diving: Primetime in Paris.ts
08-08-2024 09:23:51 PM - ERROR   : error processing file: 'utf-8' codec can't decode byte 0xe9 in position 51: invalid continuation byte
08-08-2024 09:23:51 PM - DEBUG   : restore complete
08-08-2024 11:16:11 PM - INFO    : processing file: /library/Primetime in Paris: The Olympics/8_6.mkv
08-08-2024 11:16:11 PM - DEBUG   : backup complete
```

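From googling this error, it seems the caption text contains bytes that aren't valid UTF-8. In UTF-8, the byte 0xe9 can only start a three-byte sequence, so when the byte after it isn't a continuation byte, the decoder fails with "invalid continuation byte". Many Stack Overflow answers suggest decoding the text as `latin-1` instead of `utf-8` to get past this error, since Latin-1 maps every byte, including 0xe9, to a character.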

According to a decoder website, 0xe9 in Latin-1 is é.
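The failure can be reproduced in a few lines of Python. The wording of the error matches Python's `UnicodeDecodeError` message, which suggests (an inference, not confirmed) that the decoding happens in Python code reading comskip's output. The string below is made up for illustration and is not actual caption data; "position" in the message is the byte offset of 0xe9 within the buffer being decoded.

```python
# Made-up caption text containing "é", encoded the way a Latin-1
# (ISO-8859-1) source stores it: é becomes the single byte 0xe9.
raw = "Caf\u00e9 au lait".encode("latin-1")
print(raw)  # b'Caf\xe9 au lait'

# UTF-8 reads 0xe9 as the start of a three-byte sequence, but the next
# byte (0x20, a space) is not a continuation byte, so decoding fails.
try:
    raw.decode("utf-8")
except UnicodeDecodeError as exc:
    print(exc)  # 'utf-8' codec can't decode byte 0xe9 in position 3: invalid continuation byte

# Latin-1 maps every byte 0x00-0xFF to a character, so it always succeeds.
print(raw.decode("latin-1"))  # Café au lait

# Alternatively, keep UTF-8 but substitute U+FFFD for undecodable bytes
# (here é is replaced by the replacement character).
print(raw.decode("utf-8", errors="replace"))
```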

I don't know how comskip works internally, but if there's a place in the code where the caption text is decoded as UTF-8, changing it to Latin-1 decoding would be really helpful, I think.
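One possible shape for such a change, assuming the output is read line by line: try UTF-8 first and fall back to Latin-1 only for lines that fail. Decoding everything as Latin-1 unconditionally would garble text that really is UTF-8 (a correctly encoded é is the two bytes 0xc3 0xa9, which Latin-1 would show as "Ã©"). This is a hypothetical sketch; `decode_line` and `iter_output` are illustrative names, not functions from comskip or comcut.

```python
import io
from typing import BinaryIO, Iterator


def decode_line(raw: bytes) -> str:
    """Decode one line of output, preferring UTF-8 (hypothetical helper).

    Falls back to Latin-1, which maps every byte 0x00-0xFF to a character
    and therefore never raises UnicodeDecodeError.
    """
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        return raw.decode("latin-1")


def iter_output(stream: BinaryIO) -> Iterator[str]:
    """Yield decoded lines from a binary stream, without trailing newlines."""
    for raw in stream:
        yield decode_line(raw).rstrip("\r\n")


if __name__ == "__main__":
    # Simulated output: one UTF-8 line and one Latin-1 line, both containing é.
    fake_stdout = io.BytesIO(
        "caf\u00e9 (UTF-8)\n".encode("utf-8")
        + "caf\u00e9 (Latin-1)\n".encode("latin-1")
    )
    for line in iter_output(fake_stdout):
        print(line)
```

With a real process, the same generator could wrap `proc.stdout` from `subprocess.Popen(cmd, stdout=subprocess.PIPE)`, which is a binary stream when no `text` or `encoding` argument is given. A simpler alternative is passing `encoding="utf-8", errors="replace"` to `subprocess.run` or `subprocess.Popen`: that never raises, but it replaces undecodable bytes with U+FFFD instead of recovering the é.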

Thanks!
