Since encode can take 200 ms in my use case, and I am calling it very often for many users, I would like the ability to stop the model's forward pass early during encode, through something like a callback.
Is this possible, and if so, could someone point me to how to do it, or to what it would take to add this feature? I am happy to contribute a PR if this would be useful to others.
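To make the request concrete, here is a minimal sketch of the kind of hook I have in mind. Everything here is hypothetical: the `callback` parameter, the exception, and the per-layer loop are not part of the existing encode API.

```python
# Hypothetical sketch of an early-stopping hook in the encoder forward pass.
# The callback name and signature are assumptions, not existing API.
from typing import Callable, Optional


class EarlyStopEncode(Exception):
    """Raised to abort the remaining encoder layers."""


def encode_with_callback(
    layers,                       # list of encoder-layer callables
    hidden,                       # input features after the conv stem
    callback: Optional[Callable[[int, object], bool]] = None,
):
    """Run the encoder layers, letting a callback cancel between layers."""
    for i, layer in enumerate(layers):
        hidden = layer(hidden)
        # The callback returns True to stop: useful when the caller already
        # knows the result is no longer needed (e.g. the user hung up).
        if callback is not None and callback(i, hidden):
            raise EarlyStopEncode(f"stopped after layer {i}")
    return hidden
```

On the caller side I would just wrap encode in a try/except and discard cancelled requests, which is all I really need.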
Alternatively, would it be possible to do something like a streaming encode, where I pass in audio data as it arrives, in roughly 10 ms chunks, to reduce latency? Or is that ruled out by the transformer architecture? A caller-side sketch of what I mean follows below.
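This is only an illustration of the calling pattern I have in mind; the class, function names, and buffer sizes are made up, and `run_full_encode` stands in for the real (non-streaming) encoder call.

```python
# Caller-side sketch of a "streaming" encode: feed small audio chunks and
# trigger an encode once enough new audio has been buffered.
import numpy as np

SAMPLE_RATE = 16000
ENCODE_EVERY_MS = 500            # re-encode after this much new audio (arbitrary)


class ChunkedEncoder:
    def __init__(self, run_full_encode):
        self.run_full_encode = run_full_encode
        self.buffer = np.zeros(0, dtype=np.float32)
        self._since_last_encode = 0

    def push(self, chunk: np.ndarray):
        """Append a ~10 ms chunk; return fresh encoder output when due."""
        self.buffer = np.concatenate([self.buffer, chunk])
        self._since_last_encode += len(chunk)
        if self._since_last_encode >= SAMPLE_RATE * ENCODE_EVERY_MS // 1000:
            self._since_last_encode = 0
            # The whole buffer is re-encoded each time, so this only hides
            # latency; it does not reduce total encoder compute.
            return self.run_full_encode(self.buffer)
        return None
```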
Would it be possible to do speculative decoding, with distil-whisper-v3 as the large model and some other tiny model as the draft model, to reduce latency?
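For reference, the speculative decoding loop I am thinking of looks roughly like this. It is a toy sketch with stand-in callables (greedy acceptance only, no sampling, no bonus token), not distil-whisper or faster-whisper code.

```python
# Toy sketch of speculative decoding: a small draft model proposes k tokens,
# the large model verifies them and keeps the longest matching prefix.
def speculative_decode(draft_next_token, large_next_token, prompt, max_len=50, k=4):
    tokens = list(prompt)
    while len(tokens) < max_len:
        # 1. Draft k tokens cheaply with the small model.
        draft = []
        for _ in range(k):
            draft.append(draft_next_token(tokens + draft))
        # 2. Verify with the large model (conceptually one batched forward
        #    pass; done token-by-token here for clarity).
        accepted = 0
        for i in range(k):
            if large_next_token(tokens + draft[:i]) == draft[i]:
                accepted += 1
            else:
                break
        tokens.extend(draft[:accepted])
        # 3. If the draft diverged, take one token from the large model instead.
        if accepted < k:
            tokens.append(large_next_token(tokens))
    return tokens
```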
I am calling encode from whisperX/faster-whisper.