Add LLava ONNX export #1790
base: main
Conversation
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
```python
if config._behavior == "encoder":
    # Embed the text tokens.
    inputs_embeds = model.get_input_embeddings()(input_ids)

    # Encode the image and take the hidden states of the requested vision layer.
    image_outputs = model.vision_tower(pixel_values, output_hidden_states=True)
    selected_image_feature = image_outputs.hidden_states[vision_feature_layer]

    if vision_feature_select_strategy == "default":
        selected_image_feature = selected_image_feature[:, 1:]
    elif vision_feature_select_strategy == "full":
        selected_image_feature = selected_image_feature
    else:
        raise ValueError(f"Unexpected select feature strategy: {vision_feature_select_strategy}")

    # Project the image features into the language-model embedding space and
    # splice them into the token embeddings at the <image> placeholder positions.
    image_features = model.multi_modal_projector(selected_image_feature)
    inputs_embeds, attention_mask, labels, position_ids = model._merge_input_ids_with_image_features(
        image_features, inputs_embeds, input_ids, attention_mask, None
    )

    result = {
        "inputs_embeds": inputs_embeds,
        "decoder_attention_mask": attention_mask,
        "position_ids": position_ids,
    }
```
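For context, a minimal sketch of how this exported "encoder" part could be driven with onnxruntime, assuming the ONNX input/output names match the variables above (the file name, tensor names, and dummy values are assumptions, not part of this PR):

```python
# Hypothetical usage sketch -- file and tensor names are assumptions.
import numpy as np
import onnxruntime as ort

session = ort.InferenceSession("encoder_model.onnx")

# input_ids contains the prompt with an <image> placeholder token,
# pixel_values is the preprocessed image batch.
inputs_embeds, decoder_attention_mask, position_ids = session.run(
    ["inputs_embeds", "decoder_attention_mask", "position_ids"],
    {
        "input_ids": np.array([[1, 32000, 306, 4966]], dtype=np.int64),
        "pixel_values": np.zeros((1, 3, 336, 336), dtype=np.float32),
        "attention_mask": np.ones((1, 4), dtype=np.int64),
    },
)
```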
I might not be understanding this 100%, but won't this be problematic for generation? We would need to re-pass the image features on every forward pass, which will merge the ids every time. This also means that we cannot embed a single text token (e.g., the one just generated).
Here's an example of a hand-crafted version of a tiny random LlavaForConditionalGeneration: https://huggingface.co/Xenova/tiny-random-LlavaForConditionalGeneration. There are 3 models exported:
- embed_tokens.onnx - just the token embedding layer
- decoder_model_merged.onnx - the causal LM
- vision_encoder.onnx - the vision encoder
I've got this working with Transformers.js (v3), where the concatenation of the token/vision-patch embeddings is done in JavaScript.
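To make that split concrete, here is a rough Python sketch of the prefill step using the three models above (the actual implementation runs in JavaScript via Transformers.js; the tensor names, the assumption that vision_encoder.onnx already includes the multimodal projector, and the omission of past_key_values/use_cache inputs are all simplifications):

```python
# Rough sketch of the three-model approach described above -- names are assumptions.
import numpy as np
import onnxruntime as ort

embed = ort.InferenceSession("embed_tokens.onnx")
vision = ort.InferenceSession("vision_encoder.onnx")
decoder = ort.InferenceSession("decoder_model_merged.onnx")

def prefill(input_ids, pixel_values, image_token_id):
    # 1) Embed all prompt tokens, 2) encode the image, 3) splice the patch
    #    embeddings in at the <image> placeholder position -- done outside ONNX.
    token_embeds = embed.run(None, {"input_ids": input_ids})[0]
    image_embeds = vision.run(None, {"pixel_values": pixel_values})[0]

    pos = int(np.where(input_ids[0] == image_token_id)[0][0])
    inputs_embeds = np.concatenate(
        [token_embeds[:, :pos], image_embeds, token_embeds[:, pos + 1 :]], axis=1
    )
    attention_mask = np.ones(inputs_embeds.shape[:2], dtype=np.int64)
    # past_key_values / cache inputs omitted for brevity.
    return decoder.run(None, {"inputs_embeds": inputs_embeds, "attention_mask": attention_mask})

# On later steps only the newly generated token goes through embed_tokens.onnx,
# and the decoder reuses past_key_values -- the image is never re-encoded.
```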
Hi @xenova, it should not be a problem for generation.
I generate the following three models:
- `encoder_model.onnx` - token embed + vision tower + projection + merging
- `decoder_model.onnx` - language model only (the export is the same as the current decoder export in Optimum)
- `decoder_input_processor.onnx` - token embed + decoder input generation when `past_key_values` is available (the `elif` part in the modeling code)
The naming of the models could possibly be updated.
This is how I use the models for inference: https://gist.github.com/mht-sharma/290f7bf9052e92023b4136c6fefd6717
ONNX Model: https://huggingface.co/mohitsha/llava-1.5-7b-hf/tree/main
In this version:
- I do all calculations as part of ONNX.
- The embedding model is duplicated, but it is comparatively small. If we want, we could have 2 additional options for this part:
  a. Create a separate `embed_model.onnx` and keep the rest the same. We would then have 4 ONNX models.
  b. Create a separate `embed_model.onnx`, do the `past_key_values`-stage attention_mask and position_ids processing in Python code, and remove `decoder_input_processor.onnx`.
Let me know WDYT and if you have any suggestions.
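A hedged sketch of the control flow this three-model split implies (see the linked gist for the real implementation; the tensor names and the omission of past_key_values plumbing are assumptions here):

```python
# Simplified sketch of the described flow -- see the linked gist for the actual code.
import onnxruntime as ort

encoder = ort.InferenceSession("encoder_model.onnx")        # embed + vision tower + projector + merge
decoder = ort.InferenceSession("decoder_model.onnx")        # language model only
input_proc = ort.InferenceSession("decoder_input_processor.onnx")  # embed + inputs once a cache exists

def prepare_decoder_inputs(input_ids, pixel_values, attention_mask, has_cache):
    """Return inputs_embeds / attention_mask / position_ids for the decoder."""
    if not has_cache:
        # Prefill: merge text-token and image-patch embeddings inside ONNX.
        return encoder.run(None, {
            "input_ids": input_ids,
            "pixel_values": pixel_values,
            "attention_mask": attention_mask,
        })
    # Decode: only the last generated token is embedded; the attention_mask and
    # position_ids are extended to cover the cached sequence.
    return input_proc.run(None, {
        "input_ids": input_ids[:, -1:],
        "attention_mask": attention_mask,
    })
```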
It might also be a good idea to generalize for other
Could you please give me the code for converting LLaVA into ONNX?
I'm asking because I get an error: RuntimeError: The size of tensor a (4112) must match the size of tensor b (32) at non-singleton dimension 3
Traceback (most recent call last):
I'm running this
Hi @Pengjie-W, I will have a look later today or Monday!
@Pengjie-W onnxruntime-1.17.3 seems to work. Could you please give it a try? EDIT: The latest commit fixes the export for ORT 1.18 too.
How can I export an ONNX model from llava-1.5-7b-hf? Environment:
This PR has been marked as stale because it has been open for 90 days with no activity. This thread will be automatically closed in 30 days if no further activity occurs.
What does this PR do?
As per title!
Issue: (#1751)
Before submitting