issue with converting vlm model "InternVL2_5-1B" #1073
Comments
@endomorphosis please try to uninstall the flash_attention_2 package.
I would, but that is not likely to work within the design parameters of my project, e.g. a peer-to-peer model server and endpoint aggregator that is platform-agnostic and model-agnostic.
If this is indeed meant to be treated like an executable, why not have all of its dependencies baked in, so that this sort of package conflict won't happen?
Hi @endomorphosis, is there already a solution to this? I encountered the same problem during conversion of the jina-embedding-v3 model.
I did not find a way to fix this CLI command, but there is another way to convert the model: OpenVINO traces the TorchScript code as it is evaluated and then converts it to OpenVINO IR. See e.g. https://github.com/endomorphosis/ipfs_accelerate_py/blob/212c5ad39db2f8d60c3e0230f0025e25c72cf6c2/ipfs_accelerate_py/worker/openvino_utils.py#L197
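For reference, a minimal sketch of that tracing-based path, assuming a plain openvino.convert_model call on the PyTorch module. The example input shapes are placeholders; a real InternVL forward pass also needs image inputs, as wired up in the linked file.

```python
import torch
import openvino as ov
from transformers import AutoModel

# Load the PyTorch model with its remote code.
model = AutoModel.from_pretrained("OpenGVLab/InternVL2_5-1B", trust_remote_code=True)
model.eval()

# Illustrative example input used only to drive tracing; treat the shapes
# and input names as placeholders for the model's real signature.
example_input = {"input_ids": torch.ones((1, 8), dtype=torch.int64)}

# Trace the model with the example input and convert the traced graph to IR.
ov_model = ov.convert_model(model, example_input=example_input)
ov.save_model(ov_model, "InternVL2_5-1B.xml")
```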
Unfortunately, this is not possible, because optimum is a flexible and configurable tool that follows common Hugging Face design practice: lazy initialization, deferred installation of requirements, and support for remote code. We cannot predict which additional packages a given model will require, so we cannot install all of them up front. At the same time, installing every known model dependency would hurt the user experience when most of them are not needed (e.g. if you only want to run BERT, there is no need to install the dependencies for Stable Diffusion). The only thing I can recommend is to change the attention implementation (the InternVL code always forces the flash_attn implementation if that package is available in the environment) or to ask the model authors to fix the model. From my side, I can only try to make the tool patch the model automatically to change the attention implementation.
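For anyone hitting the same conflict before such a patch lands, a minimal sketch of the workaround described above (forcing a non-flash attention implementation and exporting a local copy) might look like the following; the use_flash_attn and attn_implementation names are assumptions about the InternVL remote code and config, and may differ between model revisions.

```python
from transformers import AutoConfig, AutoModel, AutoTokenizer

model_id = "OpenGVLab/InternVL2_5-1B"

# Turn off the flash-attention path before the remote code picks it up.
# 'use_flash_attn' is an assumed InternVL config flag; verify it against
# the model's config.json for your revision.
config = AutoConfig.from_pretrained(model_id, trust_remote_code=True)
config.use_flash_attn = False

model = AutoModel.from_pretrained(
    model_id,
    config=config,
    trust_remote_code=True,
    attn_implementation="eager",  # fall back to the plain PyTorch attention path
)
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)

# Save a local copy whose config no longer requests flash_attn, then export it, e.g.:
#   optimum-cli export openvino --model InternVL2_5-1B-noflash <output_dir> --trust-remote-code
model.save_pretrained("InternVL2_5-1B-noflash")
tokenizer.save_pretrained("InternVL2_5-1B-noflash")
```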
@Florianoli could you please provide the command that you use to export the model with optimum-intel?
I can see why someone would trade off a lot of bloat to save time, but this seems like a lot of bloat. I understand that not every model architecture is the same, but llama.cpp doesn't need to dynamically import dependencies in order to quantize models, and if I remember correctly its conversion tool explicitly lists the model architecture data instead of relying on the Hugging Face libraries and tracing TorchScript.
@eaidova |
devel@workstation:/tmp$ optimum-cli export openvino --model OpenGVLab/InternVL2_5-1B InternVL2_5-1B --trust-remote-code >> /tmp/openvino.txt