This implementation does not use dynamic (on-the-fly) batching. It essentially puts requests in a queue and handles them one by one, which is very inefficient for an actual production use case. Is anyone working on implementing dynamic batching? Also, has anyone successfully run Whisper on audio longer than 30 seconds on Triton? I had a problem with Triton and TensorRT-LLM where it was cutting off my outputs.
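For context, with most Triton backends the dynamic batcher is turned on in the model's `config.pbtxt` rather than in client code. Below is a minimal sketch of that; the batch sizes and queue delay are illustrative values, and I don't know whether the Whisper / TensorRT-LLM model in this repo can actually accept batched requests without further changes:

```
# config.pbtxt (illustrative values, not tested with this repo)
max_batch_size: 8

dynamic_batching {
  # batch sizes the scheduler should prefer to form
  preferred_batch_size: [ 4, 8 ]
  # how long a request may wait in the queue for batch-mates
  max_queue_delay_microseconds: 5000
}
```

If the model's inputs have variable shapes (e.g. different audio lengths), the scheduler can only batch requests with matching shapes unless ragged batching or padding is handled by the backend, so the queueing behavior described above may be a consequence of that rather than a missing config.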