finetuning problem with evfsam2 #26

Open
vvvvvjdy opened this issue Sep 23, 2024 · 8 comments

Comments

@vvvvvjdy

Nice work extending SAM's ability to text-guided segmentation!
We used your evfsam1 as the baseline in a new area and it showed significant performance. However, when we fine-tuned your evfsam2, it overfit easily (we didn't see this with evfsam1).
Did you meet the same problem when you fine-tuned sam2, or are some hyperparameters different from sam1?
Hope to receive your suggestions!

@CoderZhangYx
Collaborator

CoderZhangYx commented Sep 23, 2024

The main differences between sam1 and sam2 lie in:

  1. pre-processing: sam2 uses resize(1024), while sam1 uses resize-longest(1024) + padding (see the sketch below).
  2. sam2 uses a hierarchical image encoder, while sam1 uses a plain ViT.
  3. sam2 applies skip-connections to the mask decoder.

Might these differences affect your training?
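
For clarity, here is a rough sketch of the two pre-processing schemes from point 1 (an illustration assuming a CHW float tensor, not the repo's actual code):

```python
import torch
import torch.nn.functional as F

def preprocess_sam2(image: torch.Tensor, size: int = 1024) -> torch.Tensor:
    """sam2-style: resize both sides directly to (size, size); aspect ratio changes."""
    return F.interpolate(image[None], (size, size), mode="bilinear",
                         align_corners=False)[0]

def preprocess_sam1(image: torch.Tensor, size: int = 1024) -> torch.Tensor:
    """sam1-style: resize the longest side to `size`, then zero-pad to square."""
    h, w = image.shape[-2:]
    scale = size / max(h, w)
    new_h, new_w = round(h * scale), round(w * scale)
    image = F.interpolate(image[None], (new_h, new_w), mode="bilinear",
                          align_corners=False)[0]
    # pad right/bottom up to (size, size); aspect ratio is preserved
    return F.pad(image, (0, size - new_w, 0, size - new_h))
```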

@yi-ming-qian

@vvvvvjdy Can you share your fine-tuning script? Many thanks.

@vvvvvjdy
Author

vvvvvjdy commented Dec 4, 2024 via email

@CoderZhangYx
Collaborator

The augmentations influence model performance in another way. In referring-segmentation tasks, text prompts contain geometric words like "on the left". Once flipping, cropping, or other geometric augmentations are applied, the prompts no longer match the image. So only non-geometric augmentations are recommended.
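
For example, a photometric-only pipeline (an illustrative torchvision sketch, not our training code):

```python
import torchvision.transforms as T

# Safe: photometric augs do not move objects, so spatial words stay valid.
non_geometric_augs = T.Compose([
    T.ColorJitter(brightness=0.2, contrast=0.2, saturation=0.2),
    T.GaussianBlur(kernel_size=5, sigma=(0.1, 1.0)),
])

# Avoid for referring segmentation: a horizontal flip turns
# "the dog on the left" into the dog on the right.
# geometric_augs = T.Compose([T.RandomHorizontalFlip(p=0.5),
#                             T.RandomResizedCrop(1024)])
```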

@vvvvvjdy
Author

vvvvvjdy commented Dec 5, 2024

@CoderZhangYx I quite agree with this statement. But even without such prompts in my fine-tuning data, augmentation stronger than that used in pre-training may cause this problem (some works have demonstrated that small models like ResNet-18 are especially sensitive to augmentation). I'm shocked that such a large foundation model as sam2 has the same characteristic.

@CoderZhangYx
Collaborator

That's amazing. What aug did you use? Could the reason be that the aug wasn't applied to the input fed to multi_model_extractor? Curious about this bug, honestly.
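
To make that hypothesis concrete, a hedged sketch (function names are illustrative, not the actual repo API):

```python
# If the aug is applied to the SAM branch but not to the image fed to the
# multimodal extractor, the two encoders see different pixels.
def forward_consistent(image, augment, sam_preprocess, beit_preprocess):
    aug = augment(image)                                # augment once
    return sam_preprocess(aug), beit_preprocess(aug)    # feed both branches

def forward_buggy(image, augment, sam_preprocess, beit_preprocess):
    aug = augment(image)
    # bug: the multimodal extractor gets the *un-augmented* image
    return sam_preprocess(aug), beit_preprocess(image)
```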

@vvvvvjdy
Author

vvvvvjdy commented Dec 6, 2024

@CoderZhangYx I originally used large-scale jittering (strong) on the input image (for both BEiT and sam1/2) and the GT mask, and found that it works well on evfsam1 but not on evfsam2. Did you use the same aug for evfsam1 and 2, and which aug did you use? (It isn't mentioned in the paper.)
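
Roughly what I mean by large-scale jittering, applied jointly to the image and GT mask so they stay aligned (an illustrative sketch; the 0.1-2.0 scale range follows the common LSJ convention, not necessarily the exact setting):

```python
import random
import torch
import torch.nn.functional as F

def large_scale_jitter(image, mask, out_size=1024, scale_range=(0.1, 2.0)):
    """image: (C, H, W) float tensor; mask: (H, W) tensor."""
    scale = random.uniform(*scale_range)
    new = int(out_size * scale)
    image = F.interpolate(image[None], (new, new), mode="bilinear",
                          align_corners=False)[0]
    mask = F.interpolate(mask[None, None].float(), (new, new), mode="nearest")[0, 0]
    if new < out_size:
        # scaled down: zero-pad right/bottom up to out_size
        pad = out_size - new
        image = F.pad(image, (0, pad, 0, pad))
        mask = F.pad(mask, (0, pad, 0, pad))
    else:
        # scaled up: random-crop back to out_size
        top = random.randint(0, new - out_size)
        left = random.randint(0, new - out_size)
        image = image[:, top:top + out_size, left:left + out_size]
        mask = mask[top:top + out_size, left:left + out_size]
    return image, mask
```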

@CoderZhangYx
Collaborator

In fact, we used no augmentation when training our models. It is strange that scale jittering affects the performance of sam2. Let me know if you find any other reasons, thanks!
