Loading compiled fails: model_type=bert -> transformers being used in compiled config. #744

Closed
michaelfeil opened this issue Dec 2, 2024 · 8 comments · Fixed by #756
Labels
bug Something isn't working

@michaelfeil

System Info

I am running the following code inside the following container (built by the Hugging Face Optimum team):

763104351884.dkr.ecr.us-west-2.amazonaws.com/huggingface-pytorch-inference-neuronx:2.1.2-transformers4.43.2-neuronx-py310-sdk2.20.0-ubuntu20.04

import torch
from optimum.neuron import NeuronModelForFeatureExtraction  # type: ignore
from transformers import AutoConfig, AutoTokenizer  # type: ignore[import-untyped]

# Excerpt from a class __init__: get_nc_count() and self.config come from the
# surrounding code, which is not shown here.
compiler_args = {"num_cores": get_nc_count(), "auto_cast_type": "fp16"}
input_shapes = {
    "batch_size": 4,
    # Fall back to 512 if the config does not define a maximum sequence length.
    "sequence_length": (
        self.config.max_position_embeddings
        if hasattr(self.config, "max_position_embeddings")
        else 512
    ),
}
self.model = NeuronModelForFeatureExtraction.from_pretrained(
    model_id="TaylorAI/bge-micro-v2",  # BERT SMALL
    revision=None,
    trust_remote_code=True,
    export=True,
    **compiler_args,
    **input_shapes,
)

Leads to the following error:
```python
INFO     2024-12-02 08:21:07,125 sentence_transformers.SentenceTransformer INFO: Load pretrained SentenceTransformer: TaylorAI/bge-micro-v2                                                                                                                     SentenceTransformer.py:218
***** Compiling bge-micro-v2 *****
.
Compiler status PASS
[Compilation Time] 24.19 seconds.
[Total compilation Time] 24.19 seconds.
2024-12-02 08:21:34.000152:  620  INFO ||NEURON_CACHE||: Compile cache path: /var/tmp/neuron-compile-cache
2024-12-02 08:21:34.000154:  620  INFO ||NEURON_CACHE||: Compile cache path: /var/tmp/neuron-compile-cache
Model cached in: /var/tmp/neuron-compile-cache/neuronxcc-2.14.227.0+2d4f85be/MODULE_4aeca57e8a4997651e84.
ERROR:    Traceback (most recent call last):
  File "/usr/local/lib/python3.10/site-packages/starlette/routing.py", line 693, in lifespan
    async with self.lifespan_context(app) as maybe_state:
  File "/usr/local/lib/python3.10/contextlib.py", line 199, in __aenter__
    return await anext(self.gen)
  File "/usr/local/lib/python3.10/site-packages/infinity_emb/infinity_server.py", line 96, in lifespan
    app.engine_array = AsyncEngineArray.from_args(engine_args_list)  # type: ignore
  File "/usr/local/lib/python3.10/site-packages/infinity_emb/engine.py", line 291, in from_args
    return cls(engines=tuple(engines))
  File "/usr/local/lib/python3.10/site-packages/infinity_emb/engine.py", line 70, in from_args
    engine = cls(**engine_args.to_dict(), _show_deprecation_warning=False)
  File "/usr/local/lib/python3.10/site-packages/infinity_emb/engine.py", line 55, in __init__
    self._model_replicas, self._min_inference_t, self._max_inference_t = select_model(
  File "/usr/local/lib/python3.10/site-packages/infinity_emb/inference/select_model.py", line 81, in select_model
    loaded_engine = unloaded_engine.value(engine_args=engine_args_copy)
  File "/usr/local/lib/python3.10/site-packages/infinity_emb/transformer/embedder/neuron.py", line 109, in __init__
    self.model = NeuronModelForFeatureExtraction.from_pretrained(
  File "/usr/local/lib/python3.10/site-packages/optimum/modeling_base.py", line 402, in from_pretrained
    return from_pretrained_method(
  File "/usr/local/lib/python3.10/site-packages/optimum/neuron/modeling_traced.py", line 242, in _from_transformers
    return cls._export(*args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/optimum/neuron/modeling_traced.py", line 370, in _export
    return cls._from_pretrained(save_dir_path, config, model_save_dir=save_dir)
  File "/usr/local/lib/python3.10/site-packages/optimum/neuron/modeling_traced.py", line 201, in _from_pretrained
    neuron_config = cls._neuron_config_init(config) if neuron_config is None else neuron_config
  File "/usr/local/lib/python3.10/site-packages/optimum/neuron/modeling_traced.py", line 468, in _neuron_config_init
    neuron_config_constructor = TasksManager.get_exporter_config_constructor(
  File "/usr/local/lib/python3.10/site-packages/optimum/exporters/tasks.py", line 2033, in get_exporter_config_constructor
    model_tasks = TasksManager.get_supported_tasks_for_model_type(
  File "/usr/local/lib/python3.10/site-packages/optimum/exporters/tasks.py", line 1245, in get_supported_tasks_for_model_type
    raise KeyError(
KeyError: "transformer is not supported yet for transformers. Only ['audio-spectrogram-transformer', 'albert', 'bart', 'beit', 'bert', 'blenderbot', 'blenderbot-small', 'bloom', 'camembert', 'clip', 'codegen', 'convbert', 'convnext', 'convnextv2', 'cvt', 'data2vec-text', 'data2vec-vision', 'data2vec-audio', 'deberta', 'deberta-v2', 'deit', 'detr', 'distilbert', 'donut', 'donut-swin', 'dpt', 'electra', 'encoder-decoder', 'esm', 'falcon', 'flaubert', 'gemma', 'glpn', 'gpt2', 'gpt-bigcode', 'gptj', 'gpt-neo', 'gpt-neox', 'groupvit', 'hubert', 'ibert', 'imagegpt', 'layoutlm', 'layoutlmv3', 'lilt', 'levit', 'longt5', 'marian', 'markuplm', 'mbart', 'mistral', 'mobilebert', 'mobilevit', 'mobilenet-v1', 'mobilenet-v2', 'mpnet', 'mpt', 'mt5', 'musicgen', 'm2m-100', 'nystromformer', 'owlv2', 'owlvit', 'opt', 'qwen2', 'llama', 'pegasus', 'perceiver', 'phi', 'phi3', 'pix2struct', 'poolformer', 'regnet', 'resnet', 'roberta', 'roformer', 'sam', 'segformer', 'sew', 'sew-d', 'speech-to-text', 'speecht5', 'splinter', 'squeezebert', 'swin', 'swin2sr', 't5', 'table-transformer', 'trocr', 'unispeech', 'unispeech-sat', 'vision-encoder-decoder', 'vit', 'vits', 'wavlm', 'wav2vec2', 'wav2vec2-conformer', 'whisper', 'xlm', 'xlm-roberta', 'yolos', 't5-encoder', 't5-decoder', 'mixtral'] are supported for the library transformers. If you want to support transformer please propose a PR or open up an issue."
```

Analysis:

  • Compiling worked.
  • The model got saved, including /var/tmp/neuron-compile-cache/neuronxcc-2.14.227.0+2d4f85be/MODULE_4aeca57e8a4997651e84/config.json
  • Inside /var/tmp/neuron-compile-cache/neuronxcc-2.14.227.0+2d4f85be/MODULE_4aeca57e8a4997651e84/config.json the exported model_type is "transformer" (see "export_model_type" and the "model_type" under "neuron" in the dump under Reproduction), but it should be "bert".
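
One way to check the last point is to list every `model_type`-like key in a saved config. This is only a minimal sketch using the standard library: the helper name `collect_model_types` is made up for illustration, and the sample dict is a trimmed copy of fields from the config.json dump in this issue.

```python
import json
from pathlib import Path
from typing import Any, Dict


def collect_model_types(config: Dict[str, Any], prefix: str = "") -> Dict[str, Any]:
    """Return every key ending in 'model_type', at any nesting depth, with its value."""
    found: Dict[str, Any] = {}
    for key, value in config.items():
        path = f"{prefix}{key}"
        if isinstance(value, dict):
            # Recurse into nested sections such as "neuron".
            found.update(collect_model_types(value, prefix=f"{path}."))
        elif key.endswith("model_type"):
            found[path] = value
    return found


def collect_from_file(config_path: str) -> Dict[str, Any]:
    """Load a config.json from disk and list its model_type-like fields."""
    return collect_model_types(json.loads(Path(config_path).read_text()))


if __name__ == "__main__":
    # Trimmed copy of the config.json shown in this issue (not the full file).
    sample = {
        "export_model_type": "transformer",
        "model_type": "bert",
        "neuron": {"compiler_type": "neuronx-cc", "model_type": "transformer"},
    }
    for path, value in collect_model_types(sample).items():
        print(f"{path} = {value}")
```

To run it against a real cache entry, pass the path of the cached config.json to `collect_from_file`.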

Reproduction:
docker run -it --device /dev/neuron0 michaelf34/aws-neuron-base-img:inf-repro

root@c2fd099ea82b:/app# nano /var/tmp/neuron-compile-cache/neuronxcc-2.14.227.0+2d4f85be/MODULE_79d2cd5b82fe880e7bef/
config.json              model.neuron             special_tokens_map.json  tokenizer.json           tokenizer_config.json    vocab.txt  
# config.json
{
  "_name_or_path": "michaelfeil/bge-small-en-v1.5",
  "architectures": [
    "BertModel"
  ],
  "attention_probs_dropout_prob": 0.1,
  "classifier_dropout": null,
  "export_model_type": "transformer",
  "hidden_act": "gelu",
  "hidden_dropout_prob": 0.1,
  "hidden_size": 384,
  "id2label": {
    "0": "LABEL_0"
  },
  "initializer_range": 0.02,
  "intermediate_size": 1536,
  "label2id": {
    "LABEL_0": 0
  },
  "layer_norm_eps": 1e-12,
  "max_position_embeddings": 512,
  "model_type": "bert",
  "neuron": {
    "auto_cast": null,
    "auto_cast_type": null,
    "compiler_type": "neuronx-cc",
    "compiler_version": "2.14.227.0+2d4f85be",
    "disable_fast_relayout": false,
    "dynamic_batch_size": false,
    "inline_weights_to_neff": true,
    "input_names": [
      "input_ids",
      "attention_mask"
    ],
    "model_type": "transformer",
    "optlevel": "2",
    "output_attentions": false,
    "output_hidden_states": false,
    "output_names": [
      "token_embeddings",
      "sentence_embedding"
    ],
    "static_batch_size": 4,
    "static_sequence_length": 512
  },
  "num_attention_heads": 12,
  "num_hidden_layers": 12,
  "pad_token_id": 0,
  "position_embedding_type": "absolute",
  "task": "feature-extraction",
  "torch_dtype": "float32",
  "torchscript": true,
  "transformers_version": "4.41.1",

Also fails with the same command with:

accelerate-0.23.0 optimum-1.18.1 optimum-neuron-0.0.22 tokenizers-0.15.2 transformers-4.36.2

Also fails with:

optimum-1.23.* + optimum-neuron-0.0.26

Does not fail with the same command with:

optimum-1.17.1 + optimum-neuron-0.0.20
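
To compare an environment against the combinations listed above, the installed versions can be printed with `importlib.metadata`. This is a small sketch, not part of the original report: the `PACKAGES` tuple and the `installed_versions` helper are illustrative choices, and the distribution names are assumed to match the pip package names mentioned in this issue.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Dict, Iterable, Optional

# Distribution names taken from the version lists in this issue.
PACKAGES = (
    "optimum",
    "optimum-neuron",
    "transformers",
    "tokenizers",
    "accelerate",
    "neuronx-cc",
    "torch-neuronx",
)


def installed_versions(packages: Iterable[str] = PACKAGES) -> Dict[str, Optional[str]]:
    """Map each package name to its installed version, or None if it is not installed."""
    result: Dict[str, Optional[str]] = {}
    for name in packages:
        try:
            result[name] = version(name)
        except PackageNotFoundError:
            result[name] = None
    return result


if __name__ == "__main__":
    for name, found in installed_versions().items():
        print(f"{name}: {found or 'not installed'}")
```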


### Who can help?

(FYI, so you are in the loop @jimburtoft ) @JingyaHuang

### Information

- [ ] The official example scripts
- [X] My own modified scripts

### Tasks

- [ ] An officially supported task in the `examples` folder (such as GLUE/SQuAD, ...)
- [ ] My own task or dataset (give details below)

### Reproduction (minimal, reproducible, runnable)

Repro above

### Expected behavior

Fixed with:
- new neuron compiler 
- > requires new torch-neuronx and torch_xla
- > requires new protobuf
- > downgrade optimum neuron from neuron-containers: https://huggingface.co/docs/optimum-neuron/en/containers

pip3 install --upgrade neuronx-cc==2.15.* torch-neuronx torchvision transformers-neuronx libneuronxla protobuf optimum-neuron==0.0.20

@JingyaHuang
Collaborator

Hi @michaelfeil, I think there was a mismatch between the auto-detected library ("sentence transformers") and the class used for inference (NeuronModelForFeatureExtraction -> transformers).

The following code using NeuronModelForSentenceTransformers should work, unless you intend to use the model via the transformers library (if so, we can open a PR that lets you specify the library used to load the model):

import torch
from optimum.neuron import NeuronModelForSentenceTransformers # type: ignore
from transformers import AutoConfig, AutoTokenizer # type: ignore[import-untyped]

compiler_args = {"auto_cast": "matmul", "auto_cast_type": "fp16"}
input_shapes = {"batch_size": 4, "sequence_length": 512}
model = NeuronModelForSentenceTransformers.from_pretrained(
    model_id="TaylorAI/bge-micro-v2", # BERT SMALL
    export=True,
    **compiler_args,
    **input_shapes,
)
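
The mismatch described above can be illustrated with a toy lookup table. This is not the actual optimum `TasksManager` code. It is a simplified sketch in which the `SUPPORTED` table and `lookup` function are hypothetical, the `transformers` entries are a small subset of the list in the error message, and the `sentence_transformers` entries are an assumption.

```python
# Hypothetical, heavily trimmed registry: library name -> supported model types.
SUPPORTED = {
    "transformers": {"bert", "distilbert", "roberta"},
    # Assumption: "transformer" is a model type registered only for sentence_transformers.
    "sentence_transformers": {"transformer", "clip"},
}


def lookup(model_type: str, library_name: str) -> str:
    """Return a registry key, or raise KeyError if the pair is not registered."""
    supported = SUPPORTED[library_name]
    if model_type not in supported:
        raise KeyError(
            f"{model_type} is not supported yet for {library_name}. "
            f"Only {sorted(supported)} are supported for the library {library_name}."
        )
    return f"{library_name}:{model_type}"


if __name__ == "__main__":
    # Export detected sentence_transformers and stored model_type="transformer" ...
    print(lookup("transformer", "sentence_transformers"))
    # ... but loading through a transformers-based class looks it up under "transformers".
    try:
        lookup("transformer", "transformers")
    except KeyError as err:
        print("KeyError:", err)
```

In this sketch the same `model_type` string succeeds or fails depending only on which library name is used for the lookup, which matches the shape of the error in the traceback.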

@michaelfeil
Author

@JingyaHuang I am not sure if I want that.
I integrated it here:

https://github.com/michaelfeil/infinity/blob/main/libs/infinity_emb/infinity_emb/transformer/embedder/neuron.py

As of 1.18, that part no longer works, which is a breaking change. Here is how to run it: https://github.com/michaelfeil/infinity/tree/main/infra/aws_neuron

github-actions bot commented Jan 3, 2025

This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days.

@github-actions github-actions bot added the Stale label Jan 3, 2025
@JingyaHuang
Collaborator

Not stale.

@JingyaHuang
Collaborator

I see, @michaelfeil. I will open a PR to restore support without sentence transformers. Thanks for reporting.

@JingyaHuang
Collaborator

Hi @michaelfeil, I opened a pull request here: #756, could you check if this fixes the issue? Thx.

@michaelfeil
Author

Thanks, will look into it!

@JingyaHuang
Collaborator

Hi @michaelfeil, I just merged the fix. Let me know if it works, and feel free to reopen if there are any further questions. Thx :D !
