This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Enabling torch.compile for quantized model for speedups
Summary: As titled.

Next:
* Follow up on memory usage as well.

Test Plan: Follow the instructions in https://github.com/pytorch/torchtune/tree/main/recipes#architecture-optimization to quantize the model, modify the generate.yaml file so that it runs the int4 weight-only quantized model, and then run:

```
tune run generate --config generate
```

```
2024-04-08:17:06:31,706 INFO [generate.py:68] Model is initialized with precision torch.bfloat16.
2024-04-08:17:07:33,793 INFO [generate.py:113] Hello, my name is Elizabeth. Introverts don't talk much unless we know you really well, which makes it difficult to get to know us. I am a nerd and a geek, and I like to read. My favorite books are fantasy and sci-fi. I like to write. My favorite genre is romance, (though I hope for this blog to be more than one genre). I have been writing for as long as I can remember. I have a lot to write about but not enough time to do it. I want to write a book one day and someday I will. I work at my family's restaurant and I take care of my grandmother. She is my whole world. I like animals and I am a vegetarian. I am afraid of everything as well as being a little weird because of my diagnosis of autism and anxiety disorder. Music is my life and I can't live without it. I play the piano and drums and I write my own music. My dreams are far and away. I hope this is something you enjoy watching unfold. Sup, sup, sup, sup! You're very cool. "I am afraid of everything as well as being a little weird because of my diagnosis of autism and anxiety disorder." I am so glad you found me. I am an introvert as well and also a Nerd
2024-04-08:17:07:33,794 INFO [generate.py:117] Time for inference: 61.78 sec total, 4.86 tokens/sec
2024-04-08:17:07:33,794 INFO [generate.py:120] Memory used: 17.85 GB
```

Reviewers:

Subscribers:

Tasks:

Tags:
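To make the throughput numbers in the test-plan log easier to compare across runs (e.g. with and without torch.compile), here is a small sketch that derives the implied token count and per-token latency from the "Time for inference" line. It assumes that tokens/sec is computed as generated tokens divided by total seconds. The helper names are hypothetical and are not part of torchtune.

```python
# Hypothetical helpers for sanity-checking the log line:
#   "Time for inference: 61.78 sec total, 4.86 tokens/sec"
# Assumption: tokens/sec = tokens_generated / total_seconds.


def estimate_tokens_generated(total_seconds: float, tokens_per_sec: float) -> float:
    """Approximate number of generated tokens implied by (time, throughput)."""
    return total_seconds * tokens_per_sec


def per_token_latency_ms(tokens_per_sec: float) -> float:
    """Average wall-clock milliseconds spent per generated token."""
    return 1000.0 / tokens_per_sec


if __name__ == "__main__":
    total_s, tps = 61.78, 4.86  # values copied from the log above
    # Both inputs are rounded to two decimals, so the results are approximate.
    print(f"~{estimate_tokens_generated(total_s, tps):.0f} tokens generated")
    print(f"~{per_token_latency_ms(tps):.1f} ms per token")
```

Running the same arithmetic on a baseline (uncompiled) run gives a like-for-like per-token number for judging the speedup.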