Intermediate structure XCLIP used for Recognition, how to retrieval #22

Open
Lucky-Light-Sun opened this issue Mar 20, 2024 · 1 comment

Comments

@Lucky-Light-Sun

Hi,
I noticed that the intermediate-structure XCLIP is designed for the RECOGNITION task, and the official code does not cover the retrieval task. So I want to ask: how did you get the X-CLIP retrieval@1 metric? If you ran the experiment yourself, could you please share the code? Otherwise, please point me to the paper and code you referred to.

Looking forward to your reply.

Best wishes!


@farewellthree
Owner

The retrieval code for XCLIP is held by my previous company. I left a long time ago, so it is difficult for me to access that code. In addition, it was based on MMCV1.0 and is incompatible with the current version. However, replicating it is simple. We did not use XCLIP's prompting and MIT modules; we only used the CCT module, which inserts message tokens into the backbone. This only requires slight modifications to the ViT block of CLIP; see the CrossFramelAttentionBlock here.
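
For readers who cannot follow the link, here is a minimal PyTorch sketch of what such a modified ViT block might look like. It is not the original retrieval code and not a copy of the official X-CLIP implementation. The class name `CrossFrameBlockSketch`, the layer names, and the `(L, B*T, D)` sequence-first input layout are assumptions made for illustration, and attention masks, drop-path, and loading of pretrained CLIP weights are left out.

```python
import torch
import torch.nn as nn


class CrossFrameBlockSketch(nn.Module):
    """Hypothetical ViT block with one message token per frame (illustrative only)."""

    def __init__(self, d_model: int, n_head: int, num_frames: int):
        super().__init__()
        self.T = num_frames
        # Layers that build the message tokens and let them talk across frames.
        self.message_fc = nn.Linear(d_model, d_model)
        self.message_ln = nn.LayerNorm(d_model)
        self.message_attn = nn.MultiheadAttention(d_model, n_head)
        # Standard CLIP-style residual attention block.
        self.ln_1 = nn.LayerNorm(d_model)
        self.attn = nn.MultiheadAttention(d_model, n_head)
        self.ln_2 = nn.LayerNorm(d_model)
        self.mlp = nn.Sequential(
            nn.Linear(d_model, 4 * d_model),
            nn.GELU(),
            nn.Linear(4 * d_model, d_model),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (L, B*T, D), where L = 1 class token + number of patches per frame.
        l, bt, d = x.shape
        b = bt // self.T

        # 1) Build one message token per frame from that frame's class token.
        cls = x.reshape(l, b, self.T, d)[0]              # (B, T, D)
        msg = self.message_fc(cls).permute(1, 0, 2)      # (T, B, D)

        # 2) Message tokens attend to each other across the T frames of a video.
        m = self.message_ln(msg)
        msg = msg + self.message_attn(m, m, m, need_weights=False)[0]
        msg = msg.permute(1, 0, 2).reshape(1, bt, d)     # (1, B*T, D)

        # 3) Append the message token to each frame, then run spatial attention.
        x = torch.cat([x, msg], dim=0)                   # (L + 1, B*T, D)
        h = self.ln_1(x)
        x = x + self.attn(h, h, h, need_weights=False)[0]

        # 4) Drop the message token again and apply the MLP.
        x = x[:l]
        x = x + self.mlp(self.ln_2(x))
        return x
```

In this sketch the block keeps the input shape, so it could in principle stand in for the residual attention blocks of a CLIP visual transformer once the frames of each video are flattened into the batch dimension. For retrieval, the per-frame class tokens would still need to be pooled into one video embedding, for example by averaging, before comparing with the text embedding; that pooling choice is also an assumption here.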
