Hi all. I am trying to use large-v3-32-2-conditioned-prompt-logic-timestamped to transcribe Chinese audio. However, it returns an English translation of the original Chinese content instead of a Chinese transcript.
Here is the code:
from whisper_jax import FlaxWhisperPipline
pipeline = FlaxWhisperPipline("sanchit-gandhi/large-v3-32-2-conditioned-prompt-logic-timestamped")
outputs = pipeline("R1.wav", task="transcribe", return_timestamps=True, language="chinese")
print(outputs)
Here is the result with task="translate" and language="chinese":
{'text': " Hello, 130 is at your service. Hello. Hello. The parking lot at the entrance is full of black Mercedes Benz cars. All cars are waiting to go to work. Oh, the police station is not there yet, right? Not yet. It's been more than half an hour. OK, we'll hurry up. Wait a minute. OK. We will rush it. Wait a moment.", 'chunks': [{'timestamp': (0.0, 5.6), 'text': ' Hello, police station 130 is at your service.'}, {'timestamp': (5.6, 6.4), 'text': ' Hello.'}, {'timestamp': (6.4, 7.8), 'text': ' Hello.'}, {'timestamp': (7.8, 12.4), 'text': ' The parking lot at the entrance is full of black Mercedes Benz cars.'}, {'timestamp': (12.4, 14.8), 'text': ' All cars are waiting to go to work.'}, {'timestamp': (14.8, 16.6), 'text': ' Oh, the police station is not there yet, right?'}, {'timestamp': (16.6, 19.6), 'text': " Not yet. It's been more than half an hour."}, {'timestamp': (19.6, 21.0), 'text': " OK, we'll hurry up."}, {'timestamp': (21.0, 21.6), 'text': ' Wait a minute.'}, {'timestamp': (21.6, 22.6), 'text': ' OK.'}, {'timestamp': (19.87, 21.87), 'text': ' We will rush it. Wait a moment.'}]}
Here is the result with task="transcribe" and language="chinese":
{'text': " Hello, 1.30 For you for you. Hello. Hey, you know. This is the food's at the way a lot of the car businging's all the car all the car are still not not. No, it's It's just half a hour hours. Okay, we're we're okay, we're okay, we're let's let's let's wait. Thank you.", 'chunks': [{'timestamp': (0.0, 0.84), 'text': ' Hello,'}, {'timestamp': (0.84, 3.72), 'text': ' 1.30'}, {'timestamp': (3.72, 5.8), 'text': ' For you for you.'}, {'timestamp': (5.8, 6.64), 'text': ' Hello.'}, {'timestamp': (6.64, 8.36), 'text': ' Hey, you know.'}, {'timestamp': (8.36, 9.52), 'text': ' This is the'}, {'timestamp': (9.52, 10.68), 'text': " food's at the way"}, {'timestamp': (10.68, 11.72), 'text': ' a lot of the car'}, {'timestamp': (11.72, 12.8), 'text': " businging's"}, {'timestamp': (12.8, 13.72), 'text': ' all the car'}, {'timestamp': (13.72, 14.84), 'text': ' all the car'}, {'timestamp': (14.84, 15.92), 'text': ' are still'}, {'timestamp': (15.92, 15.96), 'text': ' not'}, {'timestamp': (15.96, 16.88), 'text': ' not.'}, {'timestamp': (16.88, 17.8), 'text': ' No,'}, {'timestamp': (17.8, 18.68), 'text': " it's"}, {'timestamp': (18.68, 18.96), 'text': " It's just half"}, {'timestamp': (18.96, 19.12), 'text': ' a'}, {'timestamp': (19.12, 19.16), 'text': ' hour'}, {'timestamp': (19.16, 19.72), 'text': ' hours.'}, {'timestamp': (19.72, 19.92), 'text': ' Okay,'}, {'timestamp': (19.92, 20.36), 'text': " we're"}, {'timestamp': (20.36, 20.72), 'text': " we're okay,"}, {'timestamp': (20.72, 20.36), 'text': " we're"}, {'timestamp': (20.36, 20.72), 'text': ' okay,'}, {'timestamp': (20.72, 22.36), 'text': " we're"}, {'timestamp': (22.36, 21.72), 'text': " let's"}, {'timestamp': (21.72, 22.92), 'text': " let's"}, {'timestamp': (22.92, None), 'text': " let's wait. Thank you."}]}
Here is the result returned by openai-large-v2:
{'text': '您好,话务员为您服务。你好。喂,你好。这边停车场这个出入口的位置啊,一辆黑色的奔驰车把这路堵了。这边所有 车辆都等着上班啊。哦,人还没到是吧?没到,已经都半个多小时了。行行,我们催一下,稍等,马上就到了。好,那我们先催一下,稍等,马上就要了。', 'chunks': [{'timestamp': (0.0, 6.0), 'text': '您好,话务员为您服务。'}, {'timestamp': (6.0, 7.0), 'text': '你好。'}, {'timestamp': (7.0, 8.0), 'text': '喂,你好。'}, {'timestamp': (8.0, 12.0), 'text': '这边停车场这个出入口的位置啊,一辆黑色的奔驰车把这路堵了。'}, {'timestamp': (12.0, 15.0), 'text': '这边所有车辆都等着上班啊。'}, {'timestamp': (15.0, 17.0), 'text': '哦,人还没到是吧?'}, {'timestamp': (17.0, 20.0), 'text': '没到,已经都半个多小时了 。'}, {'timestamp': (20.0, 23.0), 'text': '行行,我们催一下,稍等,马上就到了。'}, {'timestamp': (19.87, 21.87), 'text': '好,那我们先催一下,稍等,马上就要了。'}]}
Thanks for any advice.
Hey @EarlWilliam - this model is part of the Distil-Whisper series, and is thus trained on English speech only. This likely explains why it only transcribes in English. If you're interested in training a Distil-Whisper model in Chinese, refer to the training guide: https://github.com/huggingface/distil-whisper/tree/main/training
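In the meantime, one possible workaround is to load a multilingual Whisper checkpoint instead and check that the output really is Chinese. The snippet below is only a rough sketch under a few assumptions: the checkpoint name "openai/whisper-large-v2" is assumed to be the multilingual model behind the openai-large-v2 result above, and the helpers cjk_ratio, looks_chinese and transcribe_chinese, along with the 0.5 threshold, are hypothetical names and values chosen for illustration, not part of whisper_jax.

```python
import re

# Matches characters in the CJK Unified Ideographs block (most common Chinese characters).
_CJK = re.compile(r"[\u4e00-\u9fff]")


def cjk_ratio(text: str) -> float:
    """Return the fraction of non-whitespace characters that are CJK ideographs."""
    chars = [c for c in text if not c.isspace()]
    if not chars:
        return 0.0
    return sum(1 for c in chars if _CJK.match(c)) / len(chars)


def looks_chinese(outputs: dict, threshold: float = 0.5) -> bool:
    """Heuristic check on a pipeline result dict (hypothetical helper).

    The 0.5 threshold is an arbitrary assumption; punctuation and digits
    lower the ratio, so it should not be set too close to 1.0.
    """
    return cjk_ratio(outputs["text"]) >= threshold


def transcribe_chinese(path: str, checkpoint: str = "openai/whisper-large-v2") -> dict:
    """Transcribe with a multilingual checkpoint (assumed name, see note above)."""
    # Imported lazily so the helpers above can be used without whisper_jax installed.
    from whisper_jax import FlaxWhisperPipline

    pipeline = FlaxWhisperPipline(checkpoint)
    # Same call as in the original report, only the checkpoint differs.
    return pipeline(path, task="transcribe", return_timestamps=True, language="chinese")
```

Calling looks_chinese(transcribe_chinese("R1.wav")) should then flag cases like the one reported here, where an English-only model returns English text even though language="chinese" was requested.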