
sp_token=32110 #28

Open

xie-qiang opened this issue Jan 23, 2024 · 6 comments
@xie-qiang

Hello, I found that SP_TOKEN should be set to 32110 in the demo; otherwise the image token cannot be replaced, which leads to poor results. Thank you!

```python
outputs = model.generate(
    pixel_values=inputs['pixel_values'],
    input_ids=inputs['input_ids'],
    attention_mask=inputs['attention_mask'],
    img_mask=inputs['img_mask'],
    do_sample=False,
    max_length=50,
    min_length=1,
    set_min_padding_size=False,
    sp_token=32110,
)
```
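To see why the `sp_token` value matters, here is a minimal pure-Python sketch (made-up token ids, not the repo's actual code): the generate path locates image slots by comparing `input_ids` against `sp_token`, so a wrong id yields zero positions to fill.

```python
# Sketch with hypothetical token ids: the model finds image placeholder
# slots by matching input_ids against sp_token; a wrong id finds none.
def placeholder_positions(input_ids, sp_token):
    """Indices where image embeddings would be inserted."""
    return [i for i, tok in enumerate(input_ids) if tok == sp_token]

ids = [1, 50, 32110, 32110, 7, 2]         # prompt with two image placeholders
print(placeholder_positions(ids, 32110))  # [2, 3] -> two slots to fill
print(placeholder_positions(ids, 32100))  # []     -> nothing gets replaced
```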
@Jianzhao-Huang

Hello, I tried the method you mentioned, but encountered an error. Do you have any suggestions? Thank you very much!

Here is the complete error information.

```
shape mismatch leads to truncate. insert embedding tensor of shape torch.Size([96, 4096]) cannot be broadcast to replace placeholder of shape torch.Size([0, 4096])
```

```
{
	"name": "RuntimeError",
	"message": "torch.cat(): expected a non-empty list of Tensors",
	"stack": "---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
Cell In[10], line 17
     14 inputs['pixel_values'] = inputs['pixel_values'].unsqueeze(0)
     16 inputs = inputs.to('cuda:0')
---> 17 outputs = model.generate(
     18         pixel_values = inputs['pixel_values'],
     19         input_ids = inputs['input_ids'],
     20         attention_mask = inputs['attention_mask'],
     21         img_mask = inputs['img_mask'],
     22         do_sample=False,
     23         max_length=50,
     24         min_length=1,
     25         set_min_padding_size =False,
     26         sp_token = 32110
     27 )
     28 generated_text = processor.batch_decode(outputs, skip_special_tokens=True)[0].strip()
     29 print(generated_text)

File ~/anaconda3/envs/mmicl/lib/python3.8/site-packages/torch/utils/_contextlib.py:115, in context_decorator.<locals>.decorate_context(*args, **kwargs)
    112 @functools.wraps(func)
    113 def decorate_context(*args, **kwargs):
    114     with ctx_factory():
--> 115         return func(*args, **kwargs)

File ~/hjz/harmful_meme_detection/mmicl/model/instructblip/modeling_instructblip.py:2129, in InstructBlipForConditionalGeneration.generate(self, pixel_values, qformer_input_ids, qformer_attention_mask, input_ids, attention_mask, img_mask, set_min_padding_size, sp_token, **generate_kwargs)
   2126         index+= i_count*img_token_szie
   2127     img_idx +=1
-> 2129 insert_embeds = torch.concat(insert_embeds_list, dim=0)
   2130 try:
   2131     inputs_embeds[image_embeds_index] = insert_embeds

RuntimeError: torch.cat(): expected a non-empty list of Tensors"
}
```
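The `torch.cat()` failure above occurs because `insert_embeds_list` is empty when no token in `input_ids` matches `sp_token`. A small defensive check (a sketch with a hypothetical helper name, not MMICL's actual code) would surface the real cause before the opaque `torch.cat` error:

```python
# Hypothetical fail-fast guard: report a missing/mismatched sp_token
# directly instead of letting torch.cat() choke on an empty list.
def count_placeholders(input_ids, sp_token):
    """Return the number of image-placeholder tokens, or raise if zero."""
    n = sum(1 for tok in input_ids if tok == sp_token)
    if n == 0:
        raise ValueError(
            f"sp_token={sp_token} does not appear in input_ids; "
            "image embeddings have nowhere to be inserted")
    return n

print(count_placeholders([1, 32110, 32110, 2], 32110))  # 2
```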

@mhd0528

mhd0528 commented Oct 5, 2024

Hi, did you solve the empty tensor issue? Thanks in advance!

@Sunzz1996

I met the same issue. Did you solve the empty tensor issue? Thanks in advance!
```
shape mismatch leads to truncate. insert embedding tensor of shape torch.Size([160, 4096]) cannot be broadcast to replace placeholder of shape torch.Size([0, 4096])
```

@mhd0528

mhd0528 commented Jan 2, 2025

Hi, the problem for me was that Hugging Face updated their InstructBLIP model to support both images and videos; as part of that change they added two new special tokens, one for image and one for video.
With this change, the `sp_token` for the customized model should be 32102 instead of the earlier 32100. I added the following code to get the correct special token id from the processor:

```python
sp_token_id = processor.tokenizer.convert_tokens_to_ids(image_placeholder)
processor.tokenizer.img_place_token_id = sp_token_id
print(f"Special tokens id for '{image_placeholder}': {sp_token_id}. Add to processor.")
```

Hope this helps!
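To illustrate the id shift described here, a toy sketch (a made-up vocabulary, not the real InstructBLIP tokenizer): inserting `<image>` and `<video>` ahead of the custom placeholder pushes its id from 32100 to 32102, which is exactly why a hard-coded `sp_token` goes stale.

```python
# Toy model of how newly added special tokens receive consecutive ids;
# the real tokenizer's vocabulary is much larger, but the shift is the same.
def assign_ids(extra_tokens, base=32100):
    """Assign consecutive ids to added special tokens, starting at base."""
    return {tok: base + i for i, tok in enumerate(extra_tokens)}

before = assign_ids(["图"])                        # 图 -> 32100
after = assign_ids(["<image>", "<video>", "图"])   # 图 -> 32102
print(before["图"], after["图"])
```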

@Sunzz1996

Sunzz1996 commented Jan 3, 2025

Thank you so much for your warm help. I added your code to the example.ipynb file, but it still fails with the same error.

```python
# For T5 based model
from model.instructblip import InstructBlipConfig, InstructBlipModel, InstructBlipPreTrainedModel, InstructBlipForConditionalGeneration, InstructBlipProcessor
import datasets
import json
import transformers
from PIL import Image
import torch

model_type = "instructblip"
model_ckpt = "/data/llm-models/MMICL-Instructblip-T5-xxl"
processor_ckpt = "/data/llm-models/instructblip-flan-t5-xxl"
config = InstructBlipConfig.from_pretrained(model_ckpt)

if 'instructblip' in model_type:
    model = InstructBlipForConditionalGeneration.from_pretrained(
        model_ckpt,
        config=config).to('cuda:6', dtype=torch.bfloat16)

image_palceholder = "图"
sp = [image_palceholder] + [f"<image{i}>" for i in range(20)]
processor = InstructBlipProcessor.from_pretrained(
    processor_ckpt
)

## modify the sp_token_id
sp_token_id = processor.tokenizer.convert_tokens_to_ids(image_palceholder)
processor.tokenizer.img_place_token_id = sp_token_id
print(f"Special tokens id for '{image_palceholder}': {sp_token_id}. Add to processor.")

sp = sp + processor.tokenizer.additional_special_tokens[len(sp):]
processor.tokenizer.add_special_tokens({'additional_special_tokens': sp})

if model.qformer.embeddings.word_embeddings.weight.shape[0] != len(processor.qformer_tokenizer):
    model.qformer.resize_token_embeddings(len(processor.qformer_tokenizer))
replace_token = "".join(32 * [image_palceholder])
```

@mhd0528

mhd0528 commented Jan 3, 2025

I remember another possible cause is the size of the image, but if you're using the example notebook, that shouldn't be the issue here. I can't run the model right now, but I'll post an update if it stops working for me again or I find another fix.
