
Where is the cos_sim between the Learned Text Embeddings and the CLIP Text Embeddings used? #2

Open
Key-lei opened this issue Jan 3, 2024 · 3 comments

Comments

Key-lei commented Jan 3, 2024

Thanks for this interesting work.

The paper uses cos_sim to compute the similarity between the Learned Text Embeddings and the CLIP Text Embeddings, but I can't find where it is actually computed in the code.

if not self.multi_scale:
    pred_ml_scores = self.logit_scale * self.text_embedding(text_features)
else:
    pred_ml_scores = self.logit_scale * self.get_multi_level_scores(text_features)

mlr_loss = self.get_rank_loss(pred_ml_scores, batched_inputs)

There doesn't seem to be any cosine-similarity calculation happening here.

Collaborator

frh23333 commented Jan 5, 2024

Hello, both text_features and text_embedding have been normalized beforehand, so the dot product of the two vectors is equal to their cos_sim.
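
A minimal sketch of that identity (the 512-dim shape here is made up purely for illustration):

import torch
import torch.nn.functional as F

# Two random vectors standing in for a learned text embedding and a
# CLIP text embedding.
a = torch.randn(512)
b = torch.randn(512)

# After L2 normalization both vectors have unit norm, so
# cos_sim(a, b) = (a . b) / (||a|| * ||b||) reduces to a plain dot product.
a_n = F.normalize(a, dim=-1)
b_n = F.normalize(b, dim=-1)

dot = a_n @ b_n
cos = F.cosine_similarity(a, b, dim=-1)
print(torch.allclose(dot, cos, atol=1e-6))  # True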

Author

Key-lei commented Jan 9, 2024

Thank you for your answer; very interesting work! 🎉🎉🎉

Author

Key-lei commented Jan 13, 2024

I'm sorry to bother you again, but I still can't understand the cosine-similarity calculation. logit_scale is a floating-point number:

if not self.multi_scale:
    pred_ml_scores = self.logit_scale * self.text_embedding(text_features)
else:
    pred_ml_scores = self.logit_scale * self.get_multi_level_scores(text_features)

mlr_loss = self.get_rank_loss(pred_ml_scores, batched_inputs)

The input to text_embedding comes from img_features, text_features = self.extract_global_feature(features), and text_embedding itself is only a linear-layer mapping.
The formula in the paper is expressed as follows:
[screenshot of the paper's equation: the cos_sim between the learned text embeddings and the CLIP text embeddings]
But I can't find clip_text_embedding anywhere. Can you help me find where clip_text_embedding is used in the code?
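
If I read your earlier reply correctly, the CLIP text embeddings might simply be stored as the (already normalized) weights of self.text_embedding, so the linear layer itself would perform the cos_sim. A sketch of that guess (clip_text_embeddings, the sizes, and the bias-free nn.Linear below are my assumptions, not the repo's actual code):

import torch
import torch.nn as nn
import torch.nn.functional as F

# Hypothetical sizes: C classes, D-dim CLIP embedding space.
C, D = 80, 512

# Guess: the per-class CLIP text embeddings are precomputed, L2-normalized,
# and stored as the weight matrix of a bias-free text_embedding layer.
clip_text_embeddings = F.normalize(torch.randn(C, D), dim=-1)
text_embedding = nn.Linear(D, C, bias=False)
with torch.no_grad():
    text_embedding.weight.copy_(clip_text_embeddings)

logit_scale = 100.0  # a plain float, as noted above

# Learned text features, also normalized before the product (batch of 4).
text_features = F.normalize(torch.randn(4, D), dim=-1)

# The linear layer is then one dot product per class, i.e. exactly the
# per-class cos_sim from the paper's equation, rescaled by logit_scale.
pred_ml_scores = logit_scale * text_embedding(text_features)  # shape (4, C)

# Sanity check against an explicit cosine similarity.
explicit = logit_scale * F.cosine_similarity(
    text_features.unsqueeze(1),         # (4, 1, D)
    clip_text_embeddings.unsqueeze(0),  # (1, C, D)
    dim=-1,
)
print(torch.allclose(pred_ml_scores, explicit, atol=1e-4))  # True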
