Mistral-7B-inference-optimisation-

Currently, the pipeline achieves a throughput of ~300 tokens/sec at a batch size of 34. With a single input (batch size 1), throughput drops to ~30 tokens/sec, so batching is the main lever for utilisation at this stage.
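A minimal sketch of how such a batched throughput number can be measured, assuming the standard Hugging Face transformers API; the model ID, prompt, and generation length are illustrative and not taken from this repository's code:

```python
import time
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "mistralai/Mistral-7B-v0.1"  # assumed base checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_id)
tokenizer.pad_token = tokenizer.eos_token  # Mistral ships without a pad token
tokenizer.padding_side = "left"  # left-pad for decoder-only generation

model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="cuda"
)

# Batch of 34 identical prompts, matching the batch size reported above.
prompts = ["Explain KV caching in one paragraph."] * 34
inputs = tokenizer(prompts, return_tensors="pt", padding=True).to("cuda")

start = time.perf_counter()
out = model.generate(**inputs, max_new_tokens=128, do_sample=False)
elapsed = time.perf_counter() - start

# Count only newly generated tokens across the whole batch.
new_tokens = (out.shape[-1] - inputs["input_ids"].shape[-1]) * out.shape[0]
print(f"{new_tokens / elapsed:.1f} tokens/sec")
```

Running the same harness with a single prompt reproduces the batch-size-1 comparison.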

Next steps: apply INT8 or FP8 quantisation to the model, or serve it with TensorRT.
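One possible shape for the planned INT8 path, assuming weight-only quantisation through bitsandbytes via transformers' BitsAndBytesConfig; the repository may ultimately use a different backend or TensorRT instead:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "mistralai/Mistral-7B-v0.1"  # assumed base checkpoint
quant_config = BitsAndBytesConfig(load_in_8bit=True)  # weight-only INT8

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=quant_config,
    device_map="auto",
)
# The quantised model exposes the same generate() API, so the batched
# throughput harness above can be reused unchanged to compare tokens/sec.
```

Quantising weights to INT8 roughly halves memory relative to fp16, which typically allows a larger batch size before running out of GPU memory.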
