Remove attn mask patching #1473
Not sure if this context is given anywhere in the code base, but anyway: That's great @baskrahmer, thank you for the simplification! For context, @echarlaix introduced a simplification for the ONNX export of decoder-only models in #1257, where a single ONNX model without subgraphs can be used, handling both the prefill and decode steps (contrary to the previous …). However, to do that, the model traced during the ONNX export needs to encompass the causal mask generation. Unfortunately, some architectures such as llama … https://github.com/huggingface/transformers/blob/fc142bd775ae4639f80a8b0085a5df33bd2853ce/src/transformers/models/llama/modeling_llama.py#L139-L147. So to export models with the new structure, we either need to patch the models to remove this control flow (what was done), or simply use …
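To make the point about causal mask generation concrete, here is a minimal pure-Python sketch, not the Transformers or Optimum code. It assumes the control flow in question is a branch on the query sequence length; the function names `causal_mask` and `causal_mask_with_branch` are hypothetical. The branch-free version yields a valid mask for both prefill and decode, which is the property a single traced graph needs, while the branching version takes a different path for `sequence_length=1`, so a trace would record only one of the two paths.

```python
def causal_mask(query_len, past_len):
    """Branch-free causal mask (hypothetical helper).

    Returns query_len rows of key_len booleans, where True means the
    query position may attend to the key position.
    """
    key_len = past_len + query_len
    return [
        [key_pos <= past_len + query_pos for key_pos in range(key_len)]
        for query_pos in range(query_len)
    ]


def causal_mask_with_branch(query_len, past_len):
    """Same mask, but skipped when decoding one token (assumed control flow).

    A tracer follows only the branch taken by the example input, so the
    exported graph would cover prefill or decode, not both.
    """
    if query_len > 1:
        return causal_mask(query_len, past_len)
    return None  # a single new token can attend to every cached position


# Prefill: three new tokens, no cache yet.
prefill = causal_mask(query_len=3, past_len=0)
# Decode: one new token, three cached positions.
decode = causal_mask(query_len=1, past_len=3)
```

For the decode case the branch-free mask is a single all-True row, so always building it costs little and keeps the traced graph valid for both steps.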
eab6299 to 30a922c
@fxmarty thanks for the context. Sounds sensible :)
I'll test a bit more later!
Hi @baskrahmer, following huggingface/transformers#27086 quite a few …
see #1495
Not sure if I follow - you mean also removing this?
1c1a4be to 2df564d
Hi @baskrahmer, sorry for the late reply. I meant this: optimum/optimum/exporters/onnx/model_patcher.py, lines 407 to 411 in 1aee8ff
EDIT: Never mind, you already removed it! I'm preparing a release for today in sync with the Transformers release and we'll need this PR in. In the interest of time I'll be pushing to your branch to get this PR merged, apologies in advance for that!
@fxmarty thanks for the reply. All good, you are definitely more in the details here so feel free to change anything :)
@baskrahmer #1509 is merged based off your branch (I could not push to your branch), sorry for the hurry and thank you for your contribution!
What does this PR do?
Removes attention mask patching for specific models when doing an ONNX export.
Picked this up but I am not sure about: sequence_length=1 and also not what the exact scope is for such an action. Right now it raises a warning for any models that have tasks prefixed with text-generation. Maybe this should be more specific.
Fixes #1461
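For reference, the scope described above (a warning for every task whose name starts with text-generation) could look roughly like the sketch below. This is an illustration only, not the code in this PR: the helper name `warn_if_text_generation` and the warning text are hypothetical placeholders.

```python
import warnings


def warn_if_text_generation(task):
    """Warn for any task name prefixed with "text-generation" (hypothetical helper).

    Returns True when a warning was emitted, False otherwise.
    """
    if task.startswith("text-generation"):
        # Placeholder message; the wording of the real warning is not shown here.
        warnings.warn(f"Task {task!r} matches the text-generation prefix.")
        return True
    return False
```

A prefix check like this also matches variants such as text-generation-with-past, which is why the scope may be broader than intended.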