
Where is the cos_sim between the Learned Text Embeddings and the CLIP Text Embeddings used? #2

Open
Key-lei opened this issue Jan 3, 2024 · 3 comments

Comments

Key-lei commented Jan 3, 2024

Thanks for this interesting work.

The paper uses cos_sim to compute the similarity between the Learned Text Embeddings and the CLIP Text Embeddings, but I can't find where it is actually computed in the code.

if not self.multi_scale:
    pred_ml_scores = self.logit_scale * self.text_embedding(text_features)
else:
    pred_ml_scores = self.logit_scale * self.get_multi_level_scores(text_features)

mlr_loss = self.get_rank_loss(pred_ml_scores, batched_inputs)

There doesn't seem to be any cosine-similarity calculation happening here.

Collaborator

frh23333 commented Jan 5, 2024

Hello, both text_features and text_embedding have been normalized beforehand, so the dot product of the two vectors is equal to their cos_sim.
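
A minimal sketch of that identity (the 512-dim shape here is made up purely for illustration):

import torch
import torch.nn.functional as F

# Two random vectors standing in for a learned text embedding and a
# CLIP text embedding.
a = torch.randn(512)
b = torch.randn(512)

# After L2 normalization both vectors have unit norm, so
# cos_sim(a, b) = (a . b) / (||a|| * ||b||) reduces to a plain dot product.
a_n = F.normalize(a, dim=-1)
b_n = F.normalize(b, dim=-1)

dot = a_n @ b_n
cos = F.cosine_similarity(a, b, dim=-1)
print(torch.allclose(dot, cos, atol=1e-6))  # True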

Author

Key-lei commented Jan 9, 2024

Thank you for your answer; very interesting work! 🎉🎉🎉

Author

Key-lei commented Jan 13, 2024

I'm sorry to bother you again, but I still can't understand the cosine-similarity calculation. logit_scale is a floating-point number:

if not self.multi_scale:
    pred_ml_scores = self.logit_scale * self.text_embedding(text_features)
else:
    pred_ml_scores = self.logit_scale * self.get_multi_level_scores(text_features)

mlr_loss = self.get_rank_loss(pred_ml_scores, batched_inputs)

The input to text_embedding comes from img_features, text_features = self.extract_global_feature(features), and text_embedding itself is only a linear-layer mapping.
The formula in the paper is expressed as follows:
[screenshot of the paper's equation: the cos_sim between the learned text embeddings and the CLIP text embeddings]
But I can't find clip_text_embedding anywhere. Can you help me find where clip_text_embedding is used in the code?
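
If I read your earlier reply correctly, the CLIP text embeddings might simply be stored as the (already normalized) weights of self.text_embedding, so the linear layer itself would perform the cos_sim. A sketch of that guess (clip_text_embeddings, the sizes, and the bias-free nn.Linear below are my assumptions, not the repo's actual code):

import torch
import torch.nn as nn
import torch.nn.functional as F

# Hypothetical sizes: C classes, D-dim CLIP embedding space.
C, D = 80, 512

# Guess: the per-class CLIP text embeddings are precomputed, L2-normalized,
# and stored as the weight matrix of a bias-free text_embedding layer.
clip_text_embeddings = F.normalize(torch.randn(C, D), dim=-1)
text_embedding = nn.Linear(D, C, bias=False)
with torch.no_grad():
    text_embedding.weight.copy_(clip_text_embeddings)

logit_scale = 100.0  # a plain float, as noted above

# Learned text features, also normalized before the product (batch of 4).
text_features = F.normalize(torch.randn(4, D), dim=-1)

# The linear layer is then one dot product per class, i.e. exactly the
# per-class cos_sim from the paper's equation, rescaled by logit_scale.
pred_ml_scores = logit_scale * text_embedding(text_features)  # shape (4, C)

# Sanity check against an explicit cosine similarity.
explicit = logit_scale * F.cosine_similarity(
    text_features.unsqueeze(1),         # (4, 1, D)
    clip_text_embeddings.unsqueeze(0),  # (1, C, D)
    dim=-1,
)
print(torch.allclose(pred_ml_scores, explicit, atol=1e-4))  # True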
