This implementation does not use dynamic (on-the-fly) batching. It essentially puts requests in a queue and handles them one by one, which is very inefficient for an actual production use case. Is anyone working on implementing dynamic batching? Also, has anyone successfully run Whisper on audio longer than 30 seconds on Triton? I had a problem with Triton and TensorRT-LLM where it was cutting off my outputs.
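For context, with most Triton backends the dynamic batcher is turned on in the model's `config.pbtxt` rather than in client code. Below is a minimal sketch of that; the batch sizes and queue delay are illustrative values, and I don't know whether the Whisper / TensorRT-LLM model in this repo can actually accept batched requests without further changes:

```
# config.pbtxt (illustrative values, not tested with this repo)
max_batch_size: 8

dynamic_batching {
  # batch sizes the scheduler should prefer to form
  preferred_batch_size: [ 4, 8 ]
  # how long a request may wait in the queue for batch-mates
  max_queue_delay_microseconds: 5000
}
```

If the model's inputs have variable shapes (e.g. different audio lengths), the scheduler can only batch requests with matching shapes unless ragged batching or padding is handled by the backend, so the queueing behavior described above may be a consequence of that rather than a missing config.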