
Mistral-7B-inference-optimisation-

Currently, a throughput of ~300 tokens/sec is achieved with a batch size of 34. For a single input, throughput drops to ~30 tokens/sec.
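The numbers above follow from a simple relationship: aggregate throughput scales with batch size for a roughly constant per-step latency. A minimal sketch of the computation (the function name and figures are illustrative, not code from this repo):

```python
def tokens_per_second(batch_size: int, tokens_per_seq: int, elapsed_s: float) -> float:
    """Aggregate throughput: total generated tokens divided by wall time."""
    return batch_size * tokens_per_seq / elapsed_s

# If generating 100 tokens per sequence takes 10 s regardless of batch size,
# batching 34 requests multiplies aggregate throughput roughly 34x:
single = tokens_per_second(1, 100, 10.0)   # 10 tokens/sec
batched = tokens_per_second(34, 100, 10.0) # 340 tokens/sec
```

This is why decode-bound LLM inference benefits so much from batching: each forward pass is dominated by weight loading, so adding sequences to the batch is nearly free until compute or memory bandwidth saturates.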

Next steps: try INT8 or FP8 quantised models, or serve the model with TensorRT.