Since encode can take 200 ms in my use case, and I am calling it very often for many users, I would like the ability to stop the model's forward pass early during encode, through something like a callback.
Is this possible, and if so, could someone point me to how to do it, or to what it would take to add this feature? I am happy to contribute a PR if this would be useful to others.
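To make the request concrete, here is a minimal sketch of the kind of hook I have in mind. Everything here is hypothetical: the `callback` parameter, the exception, and the per-layer loop are not part of the existing encode API.

```python
# Hypothetical sketch of an early-stopping hook in the encoder forward pass.
# The callback name and signature are assumptions, not existing API.
from typing import Callable, Optional


class EarlyStopEncode(Exception):
    """Raised to abort the remaining encoder layers."""


def encode_with_callback(
    layers,                       # list of encoder-layer callables
    hidden,                       # input features after the conv stem
    callback: Optional[Callable[[int, object], bool]] = None,
):
    """Run the encoder layers, letting a callback cancel between layers."""
    for i, layer in enumerate(layers):
        hidden = layer(hidden)
        # The callback returns True to stop: useful when the caller already
        # knows the result is no longer needed (e.g. the user hung up).
        if callback is not None and callback(i, hidden):
            raise EarlyStopEncode(f"stopped after layer {i}")
    return hidden
```

On the caller side I would just wrap encode in a try/except and discard cancelled requests, which is all I really need.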
Alternatively, would it be possible to do something like a streaming encode, where I pass in audio data as it arrives, in roughly 10 ms chunks, to reduce latency? Or is that ruled out by the transformer architecture? A caller-side sketch of what I mean follows below.
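This is only an illustration of the calling pattern I have in mind; the class, function names, and buffer sizes are made up, and `run_full_encode` stands in for the real (non-streaming) encoder call.

```python
# Caller-side sketch of a "streaming" encode: feed small audio chunks and
# trigger an encode once enough new audio has been buffered.
import numpy as np

SAMPLE_RATE = 16000
ENCODE_EVERY_MS = 500            # re-encode after this much new audio (arbitrary)


class ChunkedEncoder:
    def __init__(self, run_full_encode):
        self.run_full_encode = run_full_encode
        self.buffer = np.zeros(0, dtype=np.float32)
        self._since_last_encode = 0

    def push(self, chunk: np.ndarray):
        """Append a ~10 ms chunk; return fresh encoder output when due."""
        self.buffer = np.concatenate([self.buffer, chunk])
        self._since_last_encode += len(chunk)
        if self._since_last_encode >= SAMPLE_RATE * ENCODE_EVERY_MS // 1000:
            self._since_last_encode = 0
            # The whole buffer is re-encoded each time, so this only hides
            # latency; it does not reduce total encoder compute.
            return self.run_full_encode(self.buffer)
        return None
```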
Would it be possible to do speculative decoding, with distil-whisper-v3 as the large model and some other tiny model as the draft model, to reduce latency?
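For reference, the speculative decoding loop I am thinking of looks roughly like this. It is a toy sketch with stand-in callables (greedy acceptance only, no sampling, no bonus token), not distil-whisper or faster-whisper code.

```python
# Toy sketch of speculative decoding: a small draft model proposes k tokens,
# the large model verifies them and keeps the longest matching prefix.
def speculative_decode(draft_next_token, large_next_token, prompt, max_len=50, k=4):
    tokens = list(prompt)
    while len(tokens) < max_len:
        # 1. Draft k tokens cheaply with the small model.
        draft = []
        for _ in range(k):
            draft.append(draft_next_token(tokens + draft))
        # 2. Verify with the large model (conceptually one batched forward
        #    pass; done token-by-token here for clarity).
        accepted = 0
        for i in range(k):
            if large_next_token(tokens + draft[:i]) == draft[i]:
                accepted += 1
            else:
                break
        tokens.extend(draft[:accepted])
        # 3. If the draft diverged, take one token from the large model instead.
        if accepted < k:
            tokens.append(large_next_token(tokens))
    return tokens
```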
I am calling encode from whisperX/faster-whisper.