For `model(**inputs, output_attentions=True)`, the output attention tensor has shape (12, batch_size, 12, 577, 577). It looks like the self-attention over image patches, but what does the first "12" represent here? The BLIP Spaces on Hugging Face say it is the attention matrix for all layers.
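For context, `output_attentions=True` in 🤗 Transformers returns a *tuple* with one attention tensor per encoder layer, each of shape `(batch_size, num_heads, seq_len, seq_len)`; stacking that tuple is what produces the leading 12, which indexes the layer, not the head. Below is a minimal sketch of that interpretation using NumPy stand-in tensors (the sizes assume a ViT-Base vision encoder like BLIP's: 12 layers, 12 heads, and 577 = 576 image patches + 1 [CLS] token):

```python
import numpy as np

# Assumed ViT-Base vision-encoder dimensions (as in BLIP):
num_layers, num_heads = 12, 12
batch_size = 1
seq_len = 577  # 576 patches (24 x 24 grid) + 1 [CLS] token

# output_attentions=True yields a tuple with one tensor per layer,
# each shaped (batch_size, num_heads, seq_len, seq_len).
# Here we fake them with random arrays instead of running the model.
attentions = tuple(
    np.random.rand(batch_size, num_heads, seq_len, seq_len)
    for _ in range(num_layers)
)

# Stacking the per-layer tuple gives (num_layers, batch, heads, seq, seq):
# the first "12" is the layer index, the second "12" is the head index.
stacked = np.stack(attentions)
print(stacked.shape)  # (12, 1, 12, 577, 577)
```

So `stacked[3, 0, 7]` would be the 577x577 attention map of head 7 in layer 3 for the first image in the batch.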