
NotImplementedError: facebook/opt-125m is not supported yet #26

Open
andyhandsom6 opened this issue Nov 30, 2024 · 3 comments

Comments

@andyhandsom6

Hi. I'm walking through your README file, and I've decided to train the image stage:

sh scripts/train_image_qwen.sh

And this is where I encountered the problem: NotImplementedError: facebook/opt-125m is not supported yet.
I followed the traceback and read lines 836-854 in train.py. It seems that "cambrian" needs to be included in input_model_filename, so I think the problem lies in the bash script.
The initial script file's first line is:

BASE_CHECKPOINT="Qwen/Qwen2-7B-Instruct" # replace config.json with https://huggingface.co/Vision-CAIR/LongVU_Qwen2_7B_img/config.json

I'm not sure how to modify it so that it includes the name "cambrian", since its intention seems to be to point to the Qwen LLM.
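For reference, the name-based check described above can be sketched as follows; this is an assumed reconstruction of the logic around lines 836-854 of train.py, and the function name is hypothetical, not the repo's actual code:

```python
# Hypothetical sketch (not the repo's actual code) of a substring-based
# architecture check like the one the traceback points to in train.py.
def select_architecture(input_model_filename: str) -> str:
    """Pick the model architecture from a substring of the checkpoint path."""
    if "cambrian" in input_model_filename.lower():
        return "cambrian"
    # Any path without a recognized substring falls through to here, which is
    # why a base path like "Qwen/Qwen2-7B-Instruct" raises the error.
    raise NotImplementedError(f"{input_model_filename} is not supported yet")
```

Under this reading, any BASE_CHECKPOINT path that lacks the expected substring would trigger the NotImplementedError regardless of which model it actually contains.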

@xiaoqian-shen (Collaborator)

Hi, you can download the checkpoint and rename it so the path includes cambrian, or change the code where we choose the architecture depending on the model name.
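The rename workaround can be sketched like this; the directory names below are placeholders, and the `mkdir` only stands in for the checkpoint you would actually download:

```shell
# Hypothetical sketch of the rename workaround; paths are placeholders.
mkdir -p checkpoints/Qwen2-7B-Instruct   # stand-in for the downloaded checkpoint
# Rename so the path contains "cambrian", satisfying the name check in train.py.
mv checkpoints/Qwen2-7B-Instruct checkpoints/cambrian-qwen2-7b-instruct
```

After renaming, BASE_CHECKPOINT in the training script would point at the renamed local directory instead of the Hub ID.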

@gauravsh0812

I have the same issue, but I'm trying to run "sh scripts/train_video_qwen.sh". Can anyone please help me run this model?

@andyhandsom6 (Author)

@xiaoqian-shen Thank you for your explanation. I also have some questions regarding the video dataset.
I've noticed that the dataset selection differs between your arXiv paper and the Hugging Face README. For example:
the paper mentions WebVidQA in the Appendix, but the HF repo doesn't include it;
the HF repo mentions BDD100K, Panda, and VideoChatGPT, which the paper doesn't include.
I'd appreciate your help sorting out the datasets: which should I include in the target folder and which shouldn't I?
Here's the full list I've compiled from your paper and the HF repo:

Captioning 43K

  • TextVR VideoChat2/textvr.zip
  • MovieChat VideoChat2/moviechat.zip
  • YouCook2 youcook_split_videos.zip.partaa

Classification 1K

  • Kinetics-710 VideoChat2/k400.zip

VQA 424K

  • NExTQA NExTVideo.zip
  • CLEVRER VideoChat2/clevrer_qa.zip
  • EgoQA egoqa_split_videos.zip
  • TGIF tgif.tar.gz
  • WebVidQA ?
  • DiDeMo VideoChat2/didemo.zip

Instruction 85K

  • ShareGPT4Video train_300k

Not mentioned in the paper

  • BDD100K VideoChat2/bdd.zip
  • Panda panda70m_training_2m.zip
  • VideoChatGPT video-chatgpt-videos.tar
