GPTQ Support #421
This PR adds all commits before vllm-project#6143 without vllm-project#6143.
vllm/_core_ext.py
Outdated
@@ -131,10 +131,37 @@ def is_ieee_754(self) -> bool:
not self._finite_values_only

def __str__(self) -> str:
raise NotImplementedError
"""
naming generally follows: https://github.com/jax-ml/ml_dtypes
You are right. vLLM was rebased, and these methods do have those definitions now (the file is now named scalar_type.py). Please rebase.
vllm-project#6143 got merged, but it's based on an older revision of HPU components. This PR aligns the two.
) Accuracy fix for multi-step scheduling. This code solves the problem of a wrong second token when HPU Graphs are used. --------- Co-authored-by: Libin Tang <[email protected]>
@michalkuligowski, I rebased, but it seems something may have gone wrong. Let me know if merging creates a problem.
@maktukmak something went wrong with the rebase; you should not have all those commits visible. Maybe you should create a new PR on the latest habana_main. Also, please extract the HPU-specific classes and push them to vllm-hpu-extension, as in #421 (comment), since this can't be merged into habana_main in its present form.
@michalkuligowski, I created a new PR #481 from habana_main. I also pushed the HPU-specific classes to vllm-hpu-extension (HabanaAI/vllm-hpu-extension#28).
Closing this one; continued in #481.
This PR enables loading GPTQ-quantized models and running weight-only quantized inference on HPU.
Currently, it works only for BF16 inference because the torch.ops.hpu.convert_from_uint4 kernel does not support FP16. Tested on TheBloke/Llama-2-70B-GPTQ with TP=1, 2, and 4; all runs passed.
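For readers unfamiliar with weight-only 4-bit inference, here is a rough sketch of the unpack-and-dequantize step that a kernel like convert_from_uint4 feeds into. It is plain Python for clarity and is not the HPU kernel or the code in this PR: the function names, the low-nibble-first packing order, and the `scale * (q - zero)` formula are assumptions chosen for illustration, and real GPTQ checkpoints may store zero points with an offset.

```python
from typing import List

BITS = 4                  # width of each quantized weight
PER_WORD = 32 // BITS     # eight 4-bit values fit in one 32-bit word
MASK = (1 << BITS) - 1    # 0xF


def pack_uint4(values: List[int]) -> List[int]:
    """Pack 4-bit values into 32-bit words, lowest nibble first (assumed order)."""
    if len(values) % PER_WORD != 0:
        raise ValueError("length must be a multiple of 8")
    words = []
    for start in range(0, len(values), PER_WORD):
        word = 0
        for i, v in enumerate(values[start:start + PER_WORD]):
            if not 0 <= v <= MASK:
                raise ValueError(f"value {v} does not fit in 4 bits")
            word |= v << (BITS * i)
        words.append(word)
    return words


def unpack_uint4(words: List[int]) -> List[int]:
    """Inverse of pack_uint4: recover the 4-bit values from each word."""
    return [(w >> (BITS * i)) & MASK for w in words for i in range(PER_WORD)]


def dequantize(q: List[int], scales: List[float], zeros: List[int],
               group_size: int) -> List[float]:
    """Map integers back to floats with one scale and zero point per group."""
    return [scales[i // group_size] * (v - zeros[i // group_size])
            for i, v in enumerate(q)]


q = list(range(16))
packed = pack_uint4(q)
weights = dequantize(unpack_uint4(packed), scales=[0.5, 0.25],
                     zeros=[8, 8], group_size=8)
```

In a real kernel the unpacked integers are converted straight to the compute dtype (BF16 here) before the matrix multiply; the list-based version above only shows the arithmetic.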