Replies: 3 comments 3 replies
-
Hi, I've never seen this (and I'm on Linux), but it sounds like a localization issue. You can check the encoding with a hex editor: in a UTF-8 file, accented characters such as é or ç are stored as two bytes, the first of which is 0xc3.
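If no hex editor is at hand, the same check can be sketched in a few lines of Python. This is only a rough illustration: `describe_bytes` is a made-up helper name, and the sample bytes are built in memory rather than read from a real whisper_timestamped output.

```python
def describe_bytes(data: bytes) -> dict:
    """Return a few hints about how raw file contents are encoded."""
    try:
        data.decode("utf-8")
        valid_utf8 = True
    except UnicodeDecodeError:
        valid_utf8 = False
    return {
        "valid_utf8": valid_utf8,
        # A UTF-16 byte order mark at the start of the file
        "has_utf16_bom": data[:2] in (b"\xff\xfe", b"\xfe\xff"),
        # Count of 0xc3 bytes (lead byte of é, ç, à, ... in UTF-8)
        "c3_lead_bytes": data.count(b"\xc3"),
        # Literal '?' bytes, which is what a lossy conversion leaves behind
        "literal_question_marks": data.count(b"?"),
    }


# Healthy UTF-8: the ç is stored as the two bytes c3 a7.
good = "français".encode("utf-8")
print(good.hex(" "))
print(describe_bytes(good))

# Simulated damage: the ç was replaced by a real '?' somewhere upstream.
lossy = "français".encode("ascii", errors="replace")
print(describe_bytes(lossy))
```

To inspect an actual file, pass its raw contents, for example `describe_bytes(open("outputfile.json", "rb").read())`. If the file contains literal `?` bytes and no 0xc3, the characters were already lost when the file was written, so no editor setting will bring them back.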
-
It seems like a problem with either your PowerShell or whatever you use to read the file (the editor, or any other tool you use, has to support UTF-8 as well). Also, can you try specifying an output folder (option
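To illustrate the point about the reader needing UTF-8 support, here is a small sketch. It does not use whisper_timestamped at all; it writes a hand-made JSON file (the `segment` dictionary and the file name are invented for the example) and reads it back with two different encodings.

```python
import json
import os
import tempfile

# Made-up sample data standing in for a transcription result.
segment = {"text": "Ça commence à être compliqué"}

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "example.json")

    # Write real UTF-8, keeping accented characters instead of \uXXXX escapes.
    with open(path, "w", encoding="utf-8") as f:
        json.dump(segment, f, ensure_ascii=False)

    # Reading with the matching encoding gives the original text back.
    with open(path, encoding="utf-8") as f:
        as_utf8 = json.load(f)["text"]

    # Reading the same bytes as Windows-1252 garbles every accented character.
    with open(path, encoding="cp1252") as f:
        as_cp1252 = json.load(f)["text"]

print(as_utf8)
print(as_cp1252)
```

In this case the file itself is fine and only the reader is wrong, so the accents show up as sequences starting with "Ã" rather than as question marks. Seeing actual question marks usually points at the writing side instead.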
-
This was also opened as issue #170 ...
-
Hey! I am using whisper_timestamped to create JSON files from WAV files for a university project, and when I open the JSON file, the accented characters show up as question marks. I checked my computer's PowerShell and it is set to UTF-8 (so that is OK), and when I tried saving to a .txt file the same thing happened. I also downloaded one of the examples from your documentation (the smartphone audio), and the same thing happens. The command line I am using to create these JSON files from WAV is: whisper_timestamped file.wav --model tiny --accurate > outputfile.json . I have tried many different models, with and without --accurate, and the same thing occurs. It only affects words with accents or characters like ç. How can I solve this issue? I have been trying to make it work for so long now and I have run out of ideas!
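One thing that might be worth trying is to take the shell's `>` redirection out of the picture. The sketch below is an untested workaround, not something from the project's documentation. It assumes the CLI is a Python program that prints its JSON to stdout (as the redirect above suggests), and `run_to_utf8_file` is a made-up helper name. `PYTHONIOENCODING` is a standard Python environment variable that sets the encoding a Python process uses for stdout.

```python
import os
import subprocess
import sys


def run_to_utf8_file(cmd, out_path):
    """Run cmd and store its stdout bytes in out_path, untouched by the shell."""
    # Ask the child Python process to encode its stdout as UTF-8.
    env = dict(os.environ, PYTHONIOENCODING="utf-8")
    # Binary mode: the bytes are written exactly as the child produced them.
    with open(out_path, "wb") as out:
        subprocess.run(cmd, stdout=out, env=env, check=True)


# Hypothetical usage with the command from the question:
# run_to_utf8_file(
#     ["whisper_timestamped", "file.wav", "--model", "tiny", "--accurate"],
#     "outputfile.json",
# )
```

Because the file is opened in binary mode and handed straight to the child process, nothing between the program and the disk gets a chance to re-encode the text.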