TC llama recompile fix - no_grad to inference_mode #640

RafLit · 2024-12-17T08:33:12Z

During warmup, inference mode is used, but at runtime it is overridden by no_grad mode. This causes recompilations in torch.compile due to a dispatch key mismatch.
This PR switches no_grad mode to inference_mode from the base class.
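
For reference, a minimal sketch (not the code changed in this PR) of how the two modes differ in state that is visible from Python. It assumes only stock PyTorch; the helper name `grad_state` is hypothetical. Both modes disable gradient tracking, but only `torch.inference_mode` sets the inference-mode flag and produces inference tensors, which is the kind of difference that can make a graph compiled under one mode fail to match under the other.

```python
import torch


def grad_state():
    """Return (grad_enabled, inference_mode_enabled) for the current context."""
    return torch.is_grad_enabled(), torch.is_inference_mode_enabled()


# no_grad only turns off gradient recording.
with torch.no_grad():
    no_grad_state = grad_state()
    no_grad_tensor = torch.ones(2) * 2

# inference_mode also turns off gradient recording, and additionally
# marks tensors created inside it as inference tensors.
with torch.inference_mode():
    inference_state = grad_state()
    inference_tensor = torch.ones(2) * 2

print("no_grad:", no_grad_state, no_grad_tensor.is_inference())
print("inference_mode:", inference_state, inference_tensor.is_inference())
```

Using the same mode for warmup and for runtime keeps this state identical across both phases.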

Kacper-Pietkun

LGTM - this speeds up execution of Llama3.1 70b

jczaja

LGTM

madamczykhabana

LGTM

switch no_grad to inference_mode

a6a32ac

RafLit requested review from kzawora-intel, madamczykhabana, michalkuligowski and mgawarkiewicz as code owners December 17, 2024 08:33

remove unused torch import

58a9f90

RafLit requested review from Kacper-Pietkun, jczaja and anko-intel December 17, 2024 09:08

Kacper-Pietkun approved these changes Dec 17, 2024

jczaja approved these changes Dec 18, 2024

madamczykhabana approved these changes Dec 18, 2024

madamczykhabana merged commit d81f829 into habana_main Dec 18, 2024
10 checks passed
