Loading compiled fails: model_type=bert -> transformers being used in compiled config. #744
Comments
Hi @michaelfeil, I think there was a mismatch between the auto-detected library ("sentence transformers") and the class used for inference. The following code, using NeuronModelForSentenceTransformers, works:

import torch
from optimum.neuron import NeuronModelForSentenceTransformers  # type: ignore
from transformers import AutoConfig, AutoTokenizer  # type: ignore[import-untyped]

compiler_args = {"auto_cast": "matmul", "auto_cast_type": "fp16"}
input_shapes = {"batch_size": 4, "sequence_length": 512}

model = NeuronModelForSentenceTransformers.from_pretrained(
    model_id="TaylorAI/bge-micro-v2",  # BERT small
    export=True,
    **compiler_args,
    **input_shapes,
)
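As a quick sanity check, the exported model can then be queried along these lines (a minimal sketch added for illustration, not from the original comment; the max-length padding and the sentence_embedding output field are assumptions based on the optimum-neuron sentence-transformers API):

tokenizer = AutoTokenizer.from_pretrained("TaylorAI/bge-micro-v2")
# Neuron graphs are compiled for static shapes, so pad to the compiled length.
inputs = tokenizer(
    ["an example sentence"],
    padding="max_length",
    max_length=512,
    truncation=True,
    return_tensors="pt",
)
outputs = model(**inputs)
embeddings = outputs.sentence_embedding  # assumed output field; shape (batch_size, hidden_size)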
@JingyaHuang I am not sure if I want that. On 1.18, that part no longer works & is a breaking change. Here is how to run it: https://github.com/michaelfeil/infinity/tree/main/infra/aws_neuron
This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days.
Not stale.
I see @michaelfeil, will open a PR to put back the support without sentence transformers, thanks for reporting.
Hi @michaelfeil, I opened a pull request here: #756, could you check if this fixes the issue? Thx.
Thanks, will look into it!
Hi @michaelfeil, I just merged the fix. Let me know if it works and feel free to reopen if there are any further questions. Thx :D!
System Info
import torch
from optimum.neuron import NeuronModelForFeatureExtraction  # type: ignore
from transformers import AutoConfig, AutoTokenizer  # type: ignore[import-untyped]

# get_nc_count() and self.* come from the reporter's codebase (infinity);
# get_nc_count() returns the number of available NeuronCores.
compiler_args = {"num_cores": get_nc_count(), "auto_cast_type": "fp16"}
input_shapes = {
    "batch_size": 4,
    "sequence_length": (
        self.config.max_position_embeddings
        if hasattr(self.config, "max_position_embeddings")
        else 512
    ),
}
self.model = NeuronModelForFeatureExtraction.from_pretrained(
    model_id="TaylorAI/bge-micro-v2",  # BERT small
    revision=None,
    trust_remote_code=True,
    export=True,
    **compiler_args,
    **input_shapes,
)
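For context, here is a minimal sketch of the loading step the issue title refers to. The save directory is hypothetical, and it is an assumption that reloading the saved artifacts trips over the same bad model_type as the compile cache:

# Hypothetical continuation of the snippet above: persist the compiled model,
# then reload it. The reload is (assumed) where the reported mismatch bites,
# because the config.json written with the compiled artifacts says
# model_type "transformer", which transformers cannot map back to BERT.
save_dir = "./bge-micro-neuron"  # hypothetical path
self.model.save_pretrained(save_dir)
reloaded = NeuronModelForFeatureExtraction.from_pretrained(save_dir)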
Analysis:
In /var/tmp/neuron-compile-cache/neuronxcc-2.14.227.0+2d4f85be/MODULE_4aeca57e8a4997651e84/config.json, the model_type is "transformer" but should be "bert".
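The mismatch can be confirmed directly from the cached config (a minimal sketch; the cache path is the one from this report and will differ per machine and compiler version):

import json

with open(
    "/var/tmp/neuron-compile-cache/neuronxcc-2.14.227.0+2d4f85be/MODULE_4aeca57e8a4997651e84/config.json"
) as f:
    cfg = json.load(f)
print(cfg["model_type"])  # prints "transformer" here; expected "bert"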
Reproduction:
docker run -it --device /dev/neuron0 michaelf34/aws-neuron-base-img:inf-repro
Also fails with the same command with:
Also fails with:
Does not fail with the same command with:
pip3 install --upgrade neuronx-cc==2.15.* torch-neuronx torchvision transformers-neuronx libneuronxla protobuf optimum-neuron==0.0.20