Whisper-Base-En: Automatic speech recognition (ASR) model for English transcription as well as translation
OpenAI’s Whisper ASR (Automatic Speech Recognition) model is a state-of-the-art system for transcribing spoken language into written text. It performs robustly in realistic, noisy environments, which makes it reliable for real-world applications. Specifically, it excels at long-form transcription and can accurately transcribe audio clips up to 30 seconds long. Time to the first token is the encoder's latency, while time to each additional token is the decoder's latency, where we assume a mean decoded length specified below.
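The latency description above can be turned into a rough end-to-end estimate. The sketch below is a literal reading of that description, not part of this repository: the function name and the example numbers are hypothetical placeholders, and real values should come from the published per-device measurements.

```python
def estimate_transcription_latency_ms(
    encoder_ms: float, decoder_ms: float, num_tokens: int
) -> float:
    """Estimate total latency for decoding `num_tokens` tokens.

    Assumption (taken literally from the description above): the first
    token costs one encoder pass, and every additional token costs one
    decoder pass.
    """
    if num_tokens < 1:
        raise ValueError("num_tokens must be at least 1")
    return encoder_ms + (num_tokens - 1) * decoder_ms


# Hypothetical placeholder numbers, not measured results.
total_ms = estimate_transcription_latency_ms(
    encoder_ms=100.0, decoder_ms=5.0, num_tokens=20
)
print(f"Estimated latency: {total_ms:.1f} ms")
```

With a fixed mean decoded length, the decoder term usually dominates for longer transcripts, so per-token decoder latency is the number to watch when comparing devices.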
This model is based on the implementation of Whisper-Base-En found here. This repository contains scripts for optimized on-device export, suitable for running on Qualcomm® devices. More details on model performance across various devices can be found here.
Sign up to start using Qualcomm AI Hub and run these models on a hosted Qualcomm® device.
Install the package via pip:
pip install "qai_hub_models[whisper_base_en]"
Once installed, run the following simple CLI demo:
python -m qai_hub_models.models.whisper_base_en.demo
More details on the CLI tool can be found with the --help option. See demo.py for sample usage of the model, including pre/post processing scripts. Please refer to our general instructions on using models for more usage instructions.
This repository contains export scripts that produce a model optimized for on-device deployment. This can be run as follows:
python -m qai_hub_models.models.whisper_base_en.export
Additional options are documented with the --help option. Note that the above script requires access to Deployment instructions for Qualcomm® AI Hub.
- The license for the original implementation of Whisper-Base-En can be found here.
- The license for the compiled assets for on-device deployment can be found here.
- Join our AI Hub Slack community to collaborate, post questions and learn more about on-device AI.
- For questions or feedback, please reach out to us.