Mistral-7B inference optimisation - Current status: throughput of ~300 tokens/sec at batch size 34, dropping to 30 tokens/sec for a single input. Next step: try INT8 or FP8 quantised models, or use TensorRT.
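A minimal sketch of the INT8 direction, assuming Hugging Face transformers with bitsandbytes quantisation (not necessarily the stack used here); the checkpoint id, prompt, and generation settings are placeholders, and the batch size of 34 is taken from the note above. It loads 8-bit weights and times a batched generate call to estimate tokens/sec:

import time
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

MODEL_ID = "mistralai/Mistral-7B-v0.1"  # placeholder; substitute the checkpoint actually in use

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
tokenizer.pad_token = tokenizer.eos_token  # Mistral ships without a pad token
tokenizer.padding_side = "left"            # left-pad for decoder-only batched generation

# Load the model with INT8 weights via bitsandbytes.
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    quantization_config=BitsAndBytesConfig(load_in_8bit=True),
    device_map="auto",
)

# Time a batched generation to estimate throughput in tokens/sec.
prompts = ["Explain KV caching in one sentence."] * 34  # batch size from the note
inputs = tokenizer(prompts, return_tensors="pt", padding=True).to(model.device)

start = time.perf_counter()
with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=64, do_sample=False)
elapsed = time.perf_counter() - start

# Approximate: assumes every sequence generated the full max_new_tokens.
new_tokens = out.shape[0] * (out.shape[1] - inputs["input_ids"].shape[1])
print(f"~{new_tokens / elapsed:.1f} tokens/sec at batch size {len(prompts)}")

Rerunning the same script with len(prompts) == 1 gives the single-input number for comparison; the gap between the two is what the quantisation/TensorRT work aims to close.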