UTF Characters in Movie title. #124

Open
brandonganem opened this issue Oct 26, 2016 · 6 comments

@brandonganem

$ python2.7 autorippr.py --all --debug
2016-10-26 21:20:09 - Rip - DEBUG - Ripping initialised
2016-10-26 21:20:09 - Rip - DEBUG - Checking for DVDs
2016-10-26 21:20:16 - Rip - DEBUG - 1 DVD(s) found
2016-10-26 21:20:16 - Makemkv - DEBUG - Detected movie Les Miserables Dom
2016-10-26 21:20:41 - Makemkv - DEBUG - MakeMKV found 1 titles
2016-10-26 21:20:41 - Makemkv - DEBUG - MakeMKV title info: Disc Title: ['Les Mis\xc3\xa9rables'], Title No.: 0, Title: ['Les_Mis\xc3\xa9rables_t00.mkv'],
2016-10-26 21:20:41 - Rip - DEBUG - Attempting to rip Les_Misérables_t00.mkv from Les Miserables Dom
2016-10-26 21:50:48 - Rip - INFO - It took 30 minute(s) to complete the ripping of Les_Misérables_t00.mkv from Les Miserables Dom
2016-10-26 21:50:48 - Eject - DEBUG - Ejecting drive: "/dev/sr0"
2016-10-26 21:50:48 - Eject - DEBUG - Attempting OS detection
2016-10-26 21:50:48 - Eject - DEBUG - OS detected as Unix
2016-10-26 21:50:52 - Eject - DEBUG - eject: device name is `/dev/sr0'
2016-10-26 21:50:52 - Eject - DEBUG - eject: /dev/sr0: not mounted
2016-10-26 21:50:52 - Eject - DEBUG - eject: /dev/sr0: is whole-disk device
2016-10-26 21:50:52 - Eject - DEBUG - eject: /dev/sr0: trying to eject using CD-ROM eject command
2016-10-26 21:50:52 - Eject - DEBUG - eject: CD-ROM eject command succeeded
2016-10-26 21:50:52 - Compress - DEBUG - Compressing initialised
2016-10-26 21:50:52 - Compress - DEBUG - Looking for videos to compress
Traceback (most recent call last):
  File "autorippr.py", line 419, in <module>
    compress(config)
  File "autorippr.py", line 272, in compress
    dbvideo.filename, dbvideo.vidname))
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe9' in position 7: ordinal not in range(128)

@brandonganem
Author

les_miserables.txt

@JasonMillward
Owner

Thanks for the detailed logs, I'll see what I can do about it this weekend

@srounet
Contributor

srounet commented Oct 31, 2016

I think my pull request addresses this problem; I had the exact same issue with: "Master_and_Commander_De_l'autre_côté_du_monde_t00.mkv"

This commit addresses the issue by mapping accented characters to their unaccented equivalents and removing some special characters, such as single and double quotes.
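
For readers following along, here is a minimal sketch of that kind of cleanup. It is not the code from the pull request: `clean_title` is a hypothetical name, and the sketch is written for Python 3, whereas the log above shows Autorippr running under python2.7. It assumes that decomposing each character with `unicodedata` and dropping the combining marks is an acceptable way to map accented letters.

```python
import unicodedata


def clean_title(title):
    """Hypothetical helper: strip accents and quotes from a title string."""
    # NFKD splits an accented letter into its base letter plus a
    # combining mark, e.g. "é" becomes "e" followed by U+0301.
    decomposed = unicodedata.normalize("NFKD", title)
    # Keep only the characters that are not combining marks.
    without_accents = "".join(
        ch for ch in decomposed if not unicodedata.combining(ch)
    )
    # Remove single and double quotes.
    return without_accents.replace("'", "").replace('"', "")


print(clean_title("Master_and_Commander_De_l'autre_côté_du_monde_t00.mkv"))
# prints Master_and_Commander_De_lautre_cote_du_monde_t00.mkv
```

Letters that have no decomposition, such as Æ or Ø, pass through this sketch unchanged and would need an explicit mapping.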

@knoer

knoer commented Nov 4, 2016

I think I hit something similar yesterday, when I got around to installing and trying Autorippr.

In Danish (and Norwegian), we use some special characters: Æ/æ, Ø/ø and Å/å.
Similarly, in Swedish, these are Ä/ä, Ö/ö and Å/å.
Normally, on a non-Nordic keyboard, these would be substituted with AE/ae, OE/oe and AA/aa.

As part of my test yesterday, I tried to rip and compress a DVD of "Dinner for One", which in Danish is called "90 års fødselsdagen" ("the 90 year birthday"); hence, the DVD title is "90ÅRS".

The ripping part supposedly worked as expected, but immediately after MakeMKV finished, I got an error.
Unfortunately, I currently do not have access to my log files, but I recall a line similar to brandonganem's:

UnicodeEncodeError: 'ascii' codec can't encode character u'\xe9' in position 7: ordinal not in range(128)

only the character in question was (again, as I recall) 0'\x3c'

Fortunately, this error is very reproducible, so I will reproduce it and get back with further details and logs.
(I realise that for this file to be recognized by any scraper tool, I may need to rename it to the English title, and I can work around the error by manually sending the file through Handbrake.)

The question is: is it worthwhile to implement a fix for this?
And if so, could/should this be done by one of the following approaches?

  • implement substitution of these characters in a function in classes/utils.py
  • implement support for 2-byte UTF-8 characters in general
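
The first approach could be sketched roughly as follows. This is only an illustration in Python 3: `NORDIC_MAP` and `substitute_nordic` are hypothetical names, nothing here is taken from classes/utils.py, and mapping the Swedish Ä/Ö to AE/OE is an assumption based on the substitutions mentioned above.

```python
# Hypothetical substitution table using the AE/OE/AA convention.
# Treating Swedish Ä/Ö like Danish Æ/Ø is an assumption.
NORDIC_MAP = str.maketrans({
    "Æ": "AE", "æ": "ae",
    "Ø": "OE", "ø": "oe",
    "Å": "AA", "å": "aa",
    "Ä": "AE", "ä": "ae",
    "Ö": "OE", "ö": "oe",
})


def substitute_nordic(title):
    """Hypothetical helper: replace Nordic letters with ASCII digraphs."""
    return title.translate(NORDIC_MAP)


print(substitute_nordic("90ÅRS"))
# prints 90AARS
```

A fixed table like this only covers the characters it lists, which is the trade-off against the second, more general approach.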

@JasonMillward
Owner

@knoer if you check out this pull request #125 you can see that @srounet has addressed the issue.

If you check out a copy of the master branch you should get these changes and they might solve your problem too.

@knoer

knoer commented Nov 4, 2016

I checked out the repo just last night, and looking through autorippr.py, I am sure I recall the comment about the Master and Commander string conversion, so I should be running the latest code..?

If I recall correctly, the Nordic special characters are part of ISO-8859-1, and my system may very well be using this charset as its default for that reason. I will investigate whether this is true during the weekend.

I find it hard to estimate the value of spending time on a single case of character conversion gone wrong if no one else is having the same issue, so I'll let you be the judge of that.

In any case, I guess I will try experimenting a bit with charset encoding before attempting to do a string cleanup using the functions in utils.py.
A couple of years ago I tried getting into Python; this might be a good time to pick it up again. :-)

As a general solution, could it be possible to detect the system charset and decode strings from this format to UTF-8 as a part of the string cleanup process - maybe selectable by a parameter in settings.cfg?
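
To make the idea concrete, here is a rough sketch in Python 3. The function name `to_utf8` and its `charset` parameter are hypothetical; `charset` stands in for a setting that could live in settings.cfg, and no such option is known to exist. The system default is read with the standard library's `locale.getpreferredencoding`.

```python
import locale


def to_utf8(raw, charset=None):
    """Hypothetical helper: decode bytes from a charset, re-encode as UTF-8."""
    if charset is None:
        # Fall back to the system's preferred encoding when no
        # charset is configured (assumed behaviour, not Autorippr's).
        charset = locale.getpreferredencoding(False)
    # Undecodable bytes become U+FFFD instead of raising an exception.
    text = raw.decode(charset, errors="replace")
    return text.encode("utf-8")


# 0xC5 is "Å" in ISO-8859-1; in UTF-8 it becomes the two bytes C3 85.
print(to_utf8(b"90\xc5RS", "iso-8859-1"))
# prints b'90\xc3\x85RS'
```

Whether replacing undecodable bytes is preferable to failing loudly would depend on how the resulting filename is used afterwards.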

This minor issue aside, I still find Autorippr an awesome tool for backing up media (and avoiding the kids (man)handling disks) at home!


4 participants