Did you commit the latest code? #25

Open
payne4handsome opened this issue Dec 4, 2023 · 0 comments
Comments

@payne4handsome
@HaozheZhao
Hi HaozheZhao, thanks for your great work on MLLM. I have spent a lot of time trying to run your MIC code, but I have found some errors in it.
First, as shown below:
image
I can't find any implementation of the function save_pred_label in dataset.py.
Besides that, there are two additional problems.

  1. When training with the jsonl data downloaded from MIC_full and setting done_preprocess==False, the code doesn't work. It fails with an error saying that IterableDataset has no len method.
  2. When training with arrow data that I generated with data_preprocess.py, using Flan-T5 as the language model, I get a dimension mismatch error. T5's tokenizer.model_max_length is 512, and a single few-shot sample is much longer than 512 tokens. Your code truncates input_ids, so any image_placeholder (T5's image_placeholder is 图) that falls after the first 512 tokens is truncated as well. But the number of images in the pixel values is not truncated to match.
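
To make problem 2 concrete, here is a minimal sketch of one way to keep the two sides consistent: after truncating input_ids, count the placeholder tokens that survived and keep only that many images. This is not the repository's code. The function name `truncate_with_images`, the placeholder ID value, and the `tokens_per_image` parameter are all hypothetical, and it assumes placeholders appear in the same order as the images.

```python
from typing import List, Sequence, Tuple, TypeVar

Image = TypeVar("Image")

# Hypothetical token ID standing in for the image placeholder (图 for T5).
PLACEHOLDER_ID = 32100


def truncate_with_images(
    input_ids: Sequence[int],
    images: Sequence[Image],
    max_length: int,
    placeholder_id: int = PLACEHOLDER_ID,
    tokens_per_image: int = 1,  # assumption: placeholder tokens used per image
) -> Tuple[List[int], List[Image]]:
    """Truncate input_ids and drop the images whose placeholders were cut off."""
    kept_ids = list(input_ids[:max_length])

    # Count the placeholders that survived truncation.
    n_placeholders = kept_ids.count(placeholder_id)
    n_images, leftover = divmod(n_placeholders, tokens_per_image)

    # If truncation cut through the middle of an image's placeholder run,
    # remove the trailing partial run so no image is half-represented.
    for _ in range(leftover):
        last = len(kept_ids) - 1 - kept_ids[::-1].index(placeholder_id)
        del kept_ids[last]

    # Keep only the images that still have a complete placeholder run.
    return kept_ids, list(images[:n_images])


if __name__ == "__main__":
    P = PLACEHOLDER_ID
    ids = [1, P, 2, 3, P, 4, 5, P, 6]  # three images referenced in the prompt
    pixel_values = ["img0", "img1", "img2"]
    new_ids, new_images = truncate_with_images(ids, pixel_values, max_length=6)
    print(len(new_ids), new_images)
```

With this kind of check, the number of images passed to the vision encoder always equals the number of complete placeholder runs left in input_ids, which is the invariant that the dimension mismatch suggests is being violated.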

So I think you haven't uploaded the latest code. Am I right, or am I misunderstanding something?

Any suggestions would help.
