Salesforce-BLIP-vqa-base

BLIP is a Vision-Language Pre-training (VLP) framework that excels in both understanding-based and generation-based tasks. It effectively utilizes noisy web data by bootstrapping the captions, where a captioner generates synthetic captions and a filter removes the noisy ones. BLIP achieves state-of-the-art results on a wide range of vision-language tasks, such as image-text retrieval, image captioning, and VQA. It also demonstrates strong generalization ability when directly transferred to video-language tasks in a zero-shot manner. Code, models, and datasets are available on the official repository.
The above summary was generated using ChatGPT. Review the original model card to understand the data used to train the model, evaluation metrics, license, intended uses, limitations, and bias before using the model.
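The underlying checkpoint can also be exercised locally with the Hugging Face transformers library. The snippet below is a minimal sketch using the public Salesforce/blip-vqa-base weights; the image URL is a placeholder to be replaced with your own image.

```python
import requests
from PIL import Image
from transformers import BlipProcessor, BlipForQuestionAnswering

# Load the processor and the VQA model from the Hugging Face Hub.
processor = BlipProcessor.from_pretrained("Salesforce/blip-vqa-base")
model = BlipForQuestionAnswering.from_pretrained("Salesforce/blip-vqa-base")

# Placeholder image URL; substitute any publicly accessible image.
img_url = "https://example.com/sample.jpg"
raw_image = Image.open(requests.get(img_url, stream=True).raw).convert("RGB")

question = "How many dogs are in the picture?"
inputs = processor(raw_image, question, return_tensors="pt")

# Generate and decode the answer text.
out = model.generate(**inputs)
print(processor.decode(out[0], skip_special_tokens=True))
```

For deployment on Azure ML, the following samples show how to consume the model through real-time and batch endpoints: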
Inference type | Python sample (Notebook) | CLI with YAML
---|---|---
Real time | visual-question-answering-online-endpoint.ipynb | visual-question-answering-online-endpoint.sh
Batch | visual-question-answering-batch-endpoint.ipynb | visual-question-answering-batch-endpoint.sh
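Outside the notebooks, a deployed real-time endpoint can be invoked from the azure-ai-ml Python SDK. This is a minimal sketch; the endpoint name, deployment name, and workspace identifiers are placeholders, and the request file must follow the schema shown below.

```python
from azure.ai.ml import MLClient
from azure.identity import DefaultAzureCredential

# Connect to the workspace hosting the deployed endpoint (placeholder identifiers).
ml_client = MLClient(
    credential=DefaultAzureCredential(),
    subscription_id="<SUBSCRIPTION_ID>",
    resource_group_name="<RESOURCE_GROUP>",
    workspace_name="<WORKSPACE_NAME>",
)

# Send a JSON request file to the real-time endpoint and print the raw response.
response = ml_client.online_endpoints.invoke(
    endpoint_name="blip-vqa-endpoint",       # placeholder endpoint name
    deployment_name="blip-vqa-deployment",   # placeholder deployment name
    request_file="sample_request.json",
)
print(response)
```

The request file should contain a payload in the following format: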
{
  "input_data": {
    "columns": [
      "image",
      "text"
    ],
    "index": [0, 1],
    "data": [
      ["image1", "What is in the picture?"],
      ["image2", "How many dogs are in the picture?"]
    ]
  }
}
Note:
- "image1" and "image2" should be publicly accessible URLs or strings in base64 format.
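If the images are local files rather than public URLs, they can be sent as base64 strings. Below is a minimal sketch of building such a payload; the file paths are assumptions, and only the schema fields above are taken from the model card.

```python
import base64
import json

def to_base64(path: str) -> str:
    """Read a local image file and return its base64-encoded string."""
    with open(path, "rb") as f:
        return base64.b64encode(f.read()).decode("utf-8")

payload = {
    "input_data": {
        "columns": ["image", "text"],
        "index": [0, 1],
        "data": [
            [to_base64("beach.jpg"), "What is in the picture?"],          # assumed local file
            [to_base64("dogs.jpg"), "How many dogs are in the picture?"], # assumed local file
        ],
    }
}

# Write the payload to disk so it can be passed as a request file.
with open("sample_request.json", "w") as f:
    json.dump(payload, f)
```

The endpoint returns one generated answer per input row, for example: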
[
{
"text": "sand"
},
{
"text": "1"
}
]
For a sample image and the text prompt "What is in the picture?", the output text is "sand".
Version: 2
Preview
license: mit
task: visual-question-answering
View in Studio: https://ml.azure.com/registries/azureml/models/Salesforce-BLIP-vqa-base/version/2
SHA: 99909119248dc49e49cd698ad685b3b646595a38
inference-min-sku-spec: 2|0|7|14 (vCPUs | GPUs | RAM in GB | storage in GB)
inference-recommended-sku: Standard_DS2_v2, Standard_D2a_v4, Standard_D2as_v4, Standard_DS3_v2, Standard_D4a_v4, Standard_D4as_v4, Standard_DS4_v2, Standard_D8a_v4, Standard_D8as_v4, Standard_DS5_v2, Standard_D16a_v4, Standard_D16as_v4, Standard_D32a_v4, Standard_D32as_v4, Standard_D48a_v4, Standard_D48as_v4, Standard_D64a_v4, Standard_D64as_v4, Standard_D96a_v4, Standard_D96as_v4, Standard_F4s_v2, Standard_FX4mds, Standard_F8s_v2, Standard_FX12mds, Standard_F16s_v2, Standard_F32s_v2, Standard_F48s_v2, Standard_F64s_v2, Standard_F72s_v2, Standard_FX24mds, Standard_FX36mds, Standard_FX48mds, Standard_E2s_v3, Standard_E4s_v3, Standard_E8s_v3, Standard_E16s_v3, Standard_E32s_v3, Standard_E48s_v3, Standard_E64s_v3, Standard_NC4as_T4_v3, Standard_NC6s_v3, Standard_NC8as_T4_v3, Standard_NC12s_v3, Standard_NC16as_T4_v3, Standard_NC24s_v3, Standard_NC64as_T4_v3, Standard_NC24ads_A100_v4, Standard_NC48ads_A100_v4, Standard_NC96ads_A100_v4, Standard_ND96asr_v4, Standard_ND96amsr_A100_v4, Standard_ND40rs_v2
model_id: Salesforce/blip-vqa-base