For `model(**inputs, output_attentions=True)`, the output attention tensor has shape (12, batch_size, 12, 577, 577). It looks like the self-attention over image patches, but what does the first "12" represent here? The BLIP Spaces on Hugging Face say it is the attention matrix for all layers.
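For context, `output_attentions=True` in 🤗 Transformers returns a *tuple* with one attention tensor per encoder layer, each of shape `(batch_size, num_heads, seq_len, seq_len)`; stacking that tuple is what produces the leading 12, which indexes the layer, not the head. Below is a minimal sketch of that interpretation using NumPy stand-in tensors (the sizes assume a ViT-Base vision encoder like BLIP's: 12 layers, 12 heads, and 577 = 576 image patches + 1 [CLS] token):

```python
import numpy as np

# Assumed ViT-Base vision-encoder dimensions (as in BLIP):
num_layers, num_heads = 12, 12
batch_size = 1
seq_len = 577  # 576 patches (24 x 24 grid) + 1 [CLS] token

# output_attentions=True yields a tuple with one tensor per layer,
# each shaped (batch_size, num_heads, seq_len, seq_len).
# Here we fake them with random arrays instead of running the model.
attentions = tuple(
    np.random.rand(batch_size, num_heads, seq_len, seq_len)
    for _ in range(num_layers)
)

# Stacking the per-layer tuple gives (num_layers, batch, heads, seq, seq):
# the first "12" is the layer index, the second "12" is the head index.
stacked = np.stack(attentions)
print(stacked.shape)  # (12, 1, 12, 577, 577)
```

So `stacked[3, 0, 7]` would be the 577x577 attention map of head 7 in layer 3 for the first image in the batch.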