Add LLava ONNX export #1790

Open: wants to merge 10 commits into base: main

Conversation

mht-sharma
Contributor

@mht-sharma mht-sharma commented Apr 2, 2024

What does this PR do?

As per title!

Issue: (#1751)

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you make sure to update the documentation with your changes?
  • Did you write any new necessary tests?

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

Comment on lines 530 to 552
if config._behavior == "encoder":
    inputs_embeds = model.get_input_embeddings()(input_ids)

    image_outputs = model.vision_tower(pixel_values, output_hidden_states=True)
    selected_image_feature = image_outputs.hidden_states[vision_feature_layer]

    if vision_feature_select_strategy == "default":
        selected_image_feature = selected_image_feature[:, 1:]
    elif vision_feature_select_strategy == "full":
        selected_image_feature = selected_image_feature
    else:
        raise ValueError(f"Unexpected select feature strategy: {vision_feature_select_strategy}")

    image_features = model.multi_modal_projector(selected_image_feature)
    inputs_embeds, attention_mask, labels, position_ids = model._merge_input_ids_with_image_features(
        image_features, inputs_embeds, input_ids, attention_mask, None
    )

    result = {
        "inputs_embeds": inputs_embeds,
        "decoder_attention_mask": attention_mask,
        "position_ids": position_ids,
    }
Contributor

I might not be understanding this 100%, but won't this be problematic for generation? We would need to re-pass the image features on every forward pass, which will merge the ids every time. This also means that we cannot embed a single text token (e.g., the one just generated).

Here's an example of a hand-crafted version of a tiny random LlavaForConditionalGeneration: https://huggingface.co/Xenova/tiny-random-LlavaForConditionalGeneration. There are 3 models exported:

I've got this working with Transformers.js (v3), where the concatenation of the token/vision patch embeddings is done in JavaScript.
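To make the concern above concrete, here is a minimal sketch of what "merging" token and vision patch embeddings outside the model could look like. It is written in plain Python rather than JavaScript, handles a single unpadded sequence only, and is not the actual `_merge_input_ids_with_image_features` from transformers. The function name `merge_image_features` and the placeholder id `99` are assumptions made for illustration.

```python
from typing import List, Tuple

Vector = List[float]


def merge_image_features(
    input_ids: List[int],
    token_embeds: List[Vector],
    image_features: List[Vector],
    image_token_id: int,
) -> Tuple[List[Vector], List[int], List[int]]:
    """Replace each image placeholder token with the image patch embeddings.

    Simplified on purpose: one unpadded sequence, and every placeholder
    expands to the same list of patch embeddings.
    """
    if len(input_ids) != len(token_embeds):
        raise ValueError("input_ids and token_embeds must have the same length")

    merged: List[Vector] = []
    for token_id, embed in zip(input_ids, token_embeds):
        if token_id == image_token_id:
            merged.extend(image_features)  # one placeholder -> N patch vectors
        else:
            merged.append(embed)

    # With no padding, every merged position is attended to.
    attention_mask = [1] * len(merged)
    position_ids = list(range(len(merged)))
    return merged, attention_mask, position_ids


# Toy data: token 99 is the hypothetical image placeholder.
input_ids = [1, 99, 7, 8]
token_embeds = [[float(i)] * 2 for i in input_ids]
image_features = [[0.1, 0.1], [0.2, 0.2], [0.3, 0.3]]  # 3 "patches"

merged, attention_mask, position_ids = merge_image_features(
    input_ids, token_embeds, image_features, image_token_id=99
)
print(len(merged))  # 3 text tokens + 3 patch vectors
```

Because the merged sequence is longer than `input_ids`, the attention mask and position ids have to be rebuilt after merging, which is why this step cannot simply be repeated for a single newly generated token.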

Contributor Author

@mht-sharma mht-sharma Apr 4, 2024

Hi @xenova, it should not be a problem for generation.

I generate the following three models:

  1. encoder_model.onnx - token embedding + vision tower + projection + merging
  2. decoder_model.onnx - language model only (the export is the same as the current decoder export in Optimum)
  3. decoder_input_processor.onnx - token embedding + decoder input generation when past_key_values is available (the elif branch in the modeling code)

The naming of models could possibly be updated.

This is how I use the models for inference: https://gist.github.com/mht-sharma/290f7bf9052e92023b4136c6fefd6717

ONNX Model: https://huggingface.co/mohitsha/llava-1.5-7b-hf/tree/main

In this version:

  1. All calculations are done inside ONNX.
  2. The embedding model is duplicated, but it is comparatively small. If we want, there are two further options for this part:
    a. Create a separate embed_model.onnx and keep the rest the same. That gives 4 ONNX models.
    b. Create a separate embed_model.onnx, do the past_key_values-stage attention_mask and position_ids processing in Python code, and remove decoder_input_processor.onnx.

Let me know WDYT and if you have any suggestions.
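The three-model split described in this comment can be sketched as a generation loop. This is a rough illustration only: each stage is a plain Python callable standing in for one ONNX session, and the signatures, the toy stubs, and the integer "cache" are all assumptions, not the input/output names of the exported graphs or the code in the linked gist.

```python
from typing import List

NUM_PATCHES = 3       # hypothetical number of image patch embeddings
IMAGE_TOKEN_ID = 99   # hypothetical image placeholder id
calls = {"vision": 0}  # counts how often the image is processed


def toy_encoder(input_ids: List[int], pixel_values: object):
    """Stand-in for encoder_model.onnx: embed, run vision tower, merge."""
    calls["vision"] += 1
    embeds = []
    for token_id in input_ids:
        if token_id == IMAGE_TOKEN_ID:
            embeds.extend([[0.0]] * NUM_PATCHES)
        else:
            embeds.append([float(token_id)])
    return embeds, [1] * len(embeds), list(range(len(embeds)))


def toy_input_processor(token_id: int, attention_mask: List[int]):
    """Stand-in for decoder_input_processor.onnx: embed one new token."""
    new_mask = attention_mask + [1]
    return [[float(token_id)]], new_mask, [len(new_mask) - 1]


def toy_decoder(inputs_embeds, attention_mask, position_ids, past):
    """Stand-in for decoder_model.onnx; 'past' is just a cached length."""
    new_past = (past or 0) + len(inputs_embeds)
    assert len(attention_mask) == new_past, "mask must cover cache + new tokens"
    return new_past, new_past  # (fake next token id, updated cache)


def generate(encoder, decoder, input_processor, input_ids, pixel_values,
             max_new_tokens: int, eos_token_id: int) -> List[int]:
    # Prefill: the image is merged into the prompt exactly once.
    embeds, mask, positions = encoder(input_ids, pixel_values)
    next_token, past = decoder(embeds, mask, positions, None)
    generated = [next_token]
    # Decode: only the newly generated token is embedded at each step.
    while len(generated) < max_new_tokens and next_token != eos_token_id:
        embeds, mask, positions = input_processor(next_token, mask)
        next_token, past = decoder(embeds, mask, positions, past)
        generated.append(next_token)
    return generated


generated = generate(toy_encoder, toy_decoder, toy_input_processor,
                     input_ids=[1, IMAGE_TOKEN_ID, 7], pixel_values=None,
                     max_new_tokens=4, eos_token_id=-1)
print(generated, calls["vision"])
```

In this sketch the image goes through the encoder stage once, during prefill, and every later step only extends the attention mask by one position and embeds a single token, which is the behaviour the comment describes for the `past_key_values` branch.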

@xenova
Contributor

xenova commented Apr 4, 2024

It might also be a good idea to generalize for other image-text-to-text models. For example, vikhyatk/moondream2 which is quite similar (or others that are actually supported by transformers).

@fxmarty fxmarty mentioned this pull request Apr 16, 2024
4 tasks
@Pengjie-W

Could you please share the code for converting LLaVA to ONNX?


@Pengjie-W

I'm asking because I'm getting this error: RuntimeError: The size of tensor a (4112) must match the size of tensor b (32) at non-singleton dimension 3

Traceback (most recent call last):
  File "/home/user/anaconda3/envs/llava/lib/python3.10/site-packages/optimum/exporters/onnx/convert.py", line 577, in export_pytorch
    onnx_export(
  File "/home/user/anaconda3/envs/llava/lib/python3.10/site-packages/torch/onnx/utils.py", line 516, in export
    _export(
  File "/home/user/anaconda3/envs/llava/lib/python3.10/site-packages/torch/onnx/utils.py", line 1596, in _export
    graph, params_dict, torch_out = _model_to_graph(
  File "/home/user/anaconda3/envs/llava/lib/python3.10/site-packages/torch/onnx/utils.py", line 1135, in _model_to_graph
    graph, params, torch_out, module = _create_jit_graph(model, args)
  File "/home/user/anaconda3/envs/llava/lib/python3.10/site-packages/torch/onnx/utils.py", line 1011, in _create_jit_graph
    graph, torch_out = _trace_and_get_graph_from_model(model, args)
  File "/home/user/anaconda3/envs/llava/lib/python3.10/site-packages/torch/onnx/utils.py", line 915, in _trace_and_get_graph_from_model
    trace_graph, torch_out, inputs_states = torch.jit._get_trace_graph(
  File "/home/user/anaconda3/envs/llava/lib/python3.10/site-packages/torch/jit/_trace.py", line 1285, in _get_trace_graph
    outs = ONNXTracedModule(
  File "/home/user/anaconda3/envs/llava/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/user/anaconda3/envs/llava/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/user/anaconda3/envs/llava/lib/python3.10/site-packages/torch/jit/_trace.py", line 133, in forward
    graph, out = torch._C._create_graph_by_tracing(
  File "/home/user/anaconda3/envs/llava/lib/python3.10/site-packages/torch/jit/_trace.py", line 124, in wrapper
    outs.append(self.inner(*trace_inputs))
  File "/home/user/anaconda3/envs/llava/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/user/anaconda3/envs/llava/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/user/anaconda3/envs/llava/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1508, in _slow_forward
    result = self.forward(*input, **kwargs)
  File "/home/user/anaconda3/envs/llava/lib/python3.10/site-packages/optimum/exporters/onnx/model_patcher.py", line 589, in patched_forward
    outputs = model.language_model(
  File "/home/user/anaconda3/envs/llava/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/user/anaconda3/envs/llava/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/user/anaconda3/envs/llava/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 1183, in forward
    outputs = self.model(
  File "/home/user/anaconda3/envs/llava/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/user/anaconda3/envs/llava/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/user/anaconda3/envs/llava/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 1035, in forward
    attention_mask = _prepare_4d_causal_attention_mask_for_sdpa(
  File "/home/user/anaconda3/envs/llava/lib/python3.10/site-packages/transformers/modeling_attn_mask_utils.py", line 398, in _prepare_4d_causal_attention_mask_for_sdpa
    expanded_4d_mask = attn_mask_converter.to_4d(
  File "/home/user/anaconda3/envs/llava/lib/python3.10/site-packages/transformers/modeling_attn_mask_utils.py", line 137, in to_4d
    expanded_attn_mask = causal_4d_mask.masked_fill(expanded_attn_mask.bool(), torch.finfo(dtype).min)
RuntimeError: The size of tensor a (4112) must match the size of tensor b (32) at non-singleton dimension 3
python-BaseException
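The final frame of the traceback fails inside `masked_fill`, which broadcasts `causal_4d_mask` against `expanded_attn_mask`. The sketch below reimplements the usual broadcasting rule in plain Python to show why sizes 4112 and 32 in the last dimension cannot be combined. Only those two sizes come from the traceback; the batch size and query length used here are made-up values, and `broadcast_shapes` is a hypothetical helper written for this illustration, not the PyTorch function.

```python
from typing import Sequence, Tuple


def broadcast_shapes(a: Sequence[int], b: Sequence[int]) -> Tuple[int, ...]:
    """Return the broadcast shape of a and b, or raise if they are incompatible."""
    ndim = max(len(a), len(b))
    # Left-pad the shorter shape with 1s so both have the same rank.
    a = (1,) * (ndim - len(a)) + tuple(a)
    b = (1,) * (ndim - len(b)) + tuple(b)
    result = []
    for dim, (x, y) in enumerate(zip(a, b)):
        if x == y or y == 1:
            result.append(x)
        elif x == 1:
            result.append(y)
        else:
            raise ValueError(
                f"size {x} must match size {y} at non-singleton dimension {dim}"
            )
    return tuple(result)


# Hypothetical 4D mask shapes: (batch, 1, query_len, key_len).
causal_4d_mask = (1, 1, 8, 4112)
expanded_attn_mask = (1, 1, 8, 32)

try:
    broadcast_shapes(causal_4d_mask, expanded_attn_mask)
except ValueError as err:
    print(err)

# A mask whose last dimension matches the causal mask broadcasts fine.
print(broadcast_shapes(causal_4d_mask, (1, 1, 1, 4112)))
```

One plausible reading, not confirmed in this thread, is that the attention mask passed to the language model during tracing has a different length (32) than the key length the causal mask was built for (4112).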

@Pengjie-W

I'm running this command:
optimum-cli export onnx --model llava-hf/llava-1.5-7b-hf llava_onnx/ --task image-to-text-with-past --trust-remote-code
It reports the error above.

@mht-sharma
Contributor Author

Hi @Pengjie-W, I will have a look later today or on Monday!

@mht-sharma
Contributor Author

mht-sharma commented May 28, 2024

@Pengjie-W onnxruntime-1.17.3 seems to work. Could you please give it a try?

EDIT: The latest commit fixes the export for ORT 1.18 too.

@mht-sharma mht-sharma marked this pull request as ready for review June 3, 2024 14:16
@mht-sharma mht-sharma changed the title [WIP] Add LLava ONNX export Add LLava ONNX export Jun 3, 2024
@zhangyu68

How can I export an ONNX model from llava-1.5-7b-hf?
When I run this command, I get an error:
optimum-cli export onnx --model /workspace/[email protected]/original_models/llava-1.5-7b-hf onnx_model/llava-v1.5-7b --task image-to-text-with-past --trust-remote-code
Traceback (most recent call last):
  File "/opt/conda/bin/optimum-cli", line 8, in <module>
    sys.exit(main())
  File "/opt/conda/lib/python3.10/site-packages/optimum/commands/optimum_cli.py", line 208, in main
    service.run()
  File "/opt/conda/lib/python3.10/site-packages/optimum/commands/export/onnx.py", line 265, in run
    main_export(
  File "/opt/conda/lib/python3.10/site-packages/optimum/exporters/onnx/__main__.py", line 365, in main_export
    onnx_export_from_model(
  File "/opt/conda/lib/python3.10/site-packages/optimum/exporters/onnx/convert.py", line 1048, in onnx_export_from_model
    raise ValueError(
ValueError: Trying to export a llava model, that is a custom or unsupported architecture, but no custom onnx configuration was passed as custom_onnx_configs. Please refer to https://huggingface.co/docs/optimum/main/en/exporters/onnx/usage_guides/export_a_model#custom-export-of-transformers-models for an example on how to export custom models. Please open an issue at https://github.com/huggingface/optimum/issues if you would like the model type llava to be supported natively in the ONNX export.

environment:
onnx 1.16.1
onnxruntime-gpu 1.18.1
opencv-python 4.10.0.84


github-actions bot commented Jan 6, 2025

This PR has been marked as stale because it has been open for 90 days with no activity. This thread will be automatically closed in 30 days if no further activity occurs.

@github-actions github-actions bot added the Stale label Jan 6, 2025

5 participants