Mistral-7B-inference-optimisation-

Currently, the pipeline achieves a throughput of ~300 tokens/sec at a batch size of 34. With a single input (batch size 1), throughput drops to ~30 tokens/sec, so batching is the main lever for utilisation at this stage.
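A minimal sketch of how such a batched throughput number can be measured, assuming the standard Hugging Face transformers API; the model ID, prompt, and generation length are illustrative and not taken from this repository's code:

```python
import time
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "mistralai/Mistral-7B-v0.1"  # assumed base checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_id)
tokenizer.pad_token = tokenizer.eos_token  # Mistral ships without a pad token
tokenizer.padding_side = "left"  # left-pad for decoder-only generation

model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="cuda"
)

# Batch of 34 identical prompts, matching the batch size reported above.
prompts = ["Explain KV caching in one paragraph."] * 34
inputs = tokenizer(prompts, return_tensors="pt", padding=True).to("cuda")

start = time.perf_counter()
out = model.generate(**inputs, max_new_tokens=128, do_sample=False)
elapsed = time.perf_counter() - start

# Count only newly generated tokens across the whole batch.
new_tokens = (out.shape[-1] - inputs["input_ids"].shape[-1]) * out.shape[0]
print(f"{new_tokens / elapsed:.1f} tokens/sec")
```

Running the same harness with a single prompt reproduces the batch-size-1 comparison.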

Next steps: apply INT8 or FP8 quantisation to the model, or serve it with TensorRT.
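One possible shape for the planned INT8 path, assuming weight-only quantisation through bitsandbytes via transformers' BitsAndBytesConfig; the repository may ultimately use a different backend or TensorRT instead:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "mistralai/Mistral-7B-v0.1"  # assumed base checkpoint
quant_config = BitsAndBytesConfig(load_in_8bit=True)  # weight-only INT8

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=quant_config,
    device_map="auto",
)
# The quantised model exposes the same generate() API, so the batched
# throughput harness above can be reused unchanged to compare tokens/sec.
```

Quantising weights to INT8 roughly halves memory relative to fp16, which typically allows a larger batch size before running out of GPU memory.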
