TC llama recompile fix - no_grad to inference_mode (#640)
During warmup, inference_mode is used, but at runtime it is overwritten by no_grad from the base class - this causes recompilations due to a dispatch key mismatch in torch.compile. This switches the no_grad mode to inference_mode from the base class. --------- Co-authored-by: Rafal Litka <[email protected]>
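A minimal sketch of the issue, not the actual code touched by this commit: the model, shapes, and contexts below are assumptions, and it only illustrates how mixing torch.no_grad() with torch.inference_mode() around a torch.compile'd model can lead to recompilation, and how using inference_mode consistently avoids it.

```python
import torch

# Hypothetical compiled model standing in for the llama model in the commit.
model = torch.compile(torch.nn.Linear(16, 16))
x = torch.randn(4, 16)

# Mismatched contexts: tensors produced under inference_mode carry a different
# dispatch key than those produced under no_grad, so torch.compile's guards
# can fail between warmup and runtime and trigger a recompile.
with torch.inference_mode():   # warmup
    model(x)
with torch.no_grad():          # runtime with a different grad context
    model(x)

# Fix described in the commit: use inference_mode in both places so the
# dispatch keys match and the compiled graph is reused.
with torch.inference_mode():   # warmup
    model(x)
with torch.inference_mode():   # runtime
    model(x)
```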