how to use this model with whisper.cpp #6

Closed
stopthinking102 opened this issue Aug 14, 2024 · 1 comment

@stopthinking102

Can you share which parameter to set in order to use this model in whisper.cpp?
Is it the n_audio_ctx parameter, with 1500 referring to 1.5 seconds? It would have been great if there were an Android sample.

// medium
// hparams: {
// 'n_mels': 80,
// 'n_vocab': 51864,
// 'n_audio_ctx': 1500,
// 'n_audio_state': 1024,
// 'n_audio_head': 16,
// 'n_audio_layer': 24,
// 'n_text_ctx': 448,
// 'n_text_state': 1024,
// 'n_text_head': 16,
// 'n_text_layer': 24
// }
//
// default hparams (Whisper tiny)
struct whisper_hparams {
int32_t n_vocab = 51864;
int32_t n_audio_ctx = 1500;
int32_t n_audio_state = 384;

@abb128
Collaborator

abb128 commented Aug 14, 2024

You can run whisper.cpp's main and set -ac to a number between 1 and 1500. The value is not in milliseconds; it counts 50 units per second of audio. A 30-second clip (the maximum) therefore corresponds to 1500, and a 5-second clip to 5 * 50 = 250. It also usually helps to add a constant of around 64, so for the 5-second clip you'd use 5 * 50 + 64 = 314.
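The arithmetic above can be sketched as a small helper. This is only an illustration: the function name audio_ctx_for_clip and the constant names are made up for this example and are not part of whisper.cpp, and the padding of 64 is just the rough constant suggested above, not a required value.

```cpp
#include <algorithm>
#include <cmath>

// Hypothetical helper, not part of whisper.cpp.
// Converts a clip length in seconds into a value suitable for -ac,
// using 50 units per second plus a small padding constant.
constexpr int kUnitsPerSecond = 50;   // from the explanation above
constexpr int kPadding        = 64;   // rough constant suggested above (assumption)
constexpr int kMaxAudioCtx    = 1500; // 30 seconds * 50, the model maximum

int audio_ctx_for_clip(double seconds) {
    // Round up so that a partial unit of audio is still covered.
    const int units = static_cast<int>(std::ceil(seconds * kUnitsPerSecond));
    // Keep the result inside the valid 1..1500 range.
    return std::max(1, std::min(units + kPadding, kMaxAudioCtx));
}
```

For a 5-second clip this gives 314, matching the worked example, and anything at or near 30 seconds is capped at 1500. The result is the number you would pass to -ac.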

Programmatically, you can set the audio_ctx field of whisper_full_params.

@abb128 abb128 closed this as completed Jan 6, 2025