
Mistral-7B-inference-optimisation-

Currently, a throughput of ~300 tokens/sec is achieved with a batch size of 34. For a single input, throughput drops to ~30 tokens/sec.
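The numbers above follow from a simple relationship: aggregate throughput scales with batch size for a roughly constant per-step latency. A minimal sketch of the computation (the function name and figures are illustrative, not code from this repo):

```python
def tokens_per_second(batch_size: int, tokens_per_seq: int, elapsed_s: float) -> float:
    """Aggregate throughput: total generated tokens divided by wall time."""
    return batch_size * tokens_per_seq / elapsed_s

# If generating 100 tokens per sequence takes 10 s regardless of batch size,
# batching 34 requests multiplies aggregate throughput roughly 34x:
single = tokens_per_second(1, 100, 10.0)   # 10 tokens/sec
batched = tokens_per_second(34, 100, 10.0) # 340 tokens/sec
```

This is why decode-bound LLM inference benefits so much from batching: each forward pass is dominated by weight loading, so adding sequences to the batch is nearly free until compute or memory bandwidth saturates.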

Next steps: try INT8 or FP8 quantised models, or serve the model with TensorRT.