
Salesforce-BLIP-vqa-base

Overview

BLIP is a new Vision-Language Pre-training (VLP) framework that excels in both understanding-based and generation-based tasks. It effectively utilizes noisy web data by bootstrapping the captions, where a captioner generates synthetic captions and a filter removes the noisy ones. BLIP achieves state-of-the-art results on a wide range of vision-language tasks, such as image-text retrieval, image captioning, and VQA. It also demonstrates strong generalization ability when directly transferred to video-language tasks in a zero-shot manner. Code, models, and datasets are available on the official repository.

The above summary was generated using ChatGPT. Review the original model card to understand the data used to train the model, evaluation metrics, license, intended uses, limitations, and bias before using the model.

Inference samples

Inference type | Python sample (Notebook) | CLI with YAML
--- | --- | ---
Real time | visual-question-answering-online-endpoint.ipynb | visual-question-answering-online-endpoint.sh
Batch | visual-question-answering-batch-endpoint.ipynb | visual-question-answering-batch-endpoint.sh

Sample inputs and outputs (for real-time inference)

Sample input

```json
{
   "input_data": {
      "columns": ["image", "text"],
      "index": [0, 1],
      "data": [
         ["image1", "What is in the picture?"],
         ["image2", "How many dogs are in the picture?"]
      ]
   }
}
```

Note:

  • "image1" and "image2" should be publicly accessible urls or strings in base64 format.

Sample output

```json
[
   {
      "text": "sand"
   },
   {
      "text": "1"
   }
]
```
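
The endpoint returns the answers as a JSON string with one object per input row, so the response from the sketch above could be read back like this:

```python
import json

answers = json.loads(response)
for row in answers:
    print(row["text"])  # e.g. "sand", then "1"
```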

Model inference: Text for the sample image

Salesforce-BLIP-vqa-base

Version: 1

Tags

Preview
license : mit
task : visual-question-answering

View in Studio: https://ml.azure.com/registries/azureml/models/Salesforce-BLIP-vqa-base/version/1

License: mit

Properties

SHA: 99909119248dc49e49cd698ad685b3b646595a38

inference-min-sku-spec: 2|0|7|14 (minimum SKU of 2 vCPUs, 0 GPUs, 7 GB RAM, and 14 GB storage)

inference-recommended-sku: Standard_DS2_v2, Standard_D2a_v4, Standard_D2as_v4, Standard_DS3_v2, Standard_D4a_v4, Standard_D4as_v4, Standard_DS4_v2, Standard_D8a_v4, Standard_D8as_v4, Standard_DS5_v2, Standard_D16a_v4, Standard_D16as_v4, Standard_D32a_v4, Standard_D32as_v4, Standard_D48a_v4, Standard_D48as_v4, Standard_D64a_v4, Standard_D64as_v4, Standard_D96a_v4, Standard_D96as_v4, Standard_F4s_v2, Standard_FX4mds, Standard_F8s_v2, Standard_FX12mds, Standard_F16s_v2, Standard_F32s_v2, Standard_F48s_v2, Standard_F64s_v2, Standard_F72s_v2, Standard_FX24mds, Standard_FX36mds, Standard_FX48mds, Standard_E2s_v3, Standard_E4s_v3, Standard_E8s_v3, Standard_E16s_v3, Standard_E32s_v3, Standard_E48s_v3, Standard_E64s_v3, Standard_NC4as_T4_v3, Standard_NC6s_v3, Standard_NC8as_T4_v3, Standard_NC12s_v3, Standard_NC16as_T4_v3, Standard_NC24s_v3, Standard_NC64as_T4_v3, Standard_NC24ads_A100_v4, Standard_NC48ads_A100_v4, Standard_NC96ads_A100_v4, Standard_ND96asr_v4, Standard_ND96amsr_A100_v4, Standard_ND40rs_v2

model_id: Salesforce/blip-vqa-base
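
Since the registry asset wraps the Hugging Face checkpoint Salesforce/blip-vqa-base, the model can also be exercised locally with the transformers library; a minimal sketch following the standard BLIP VQA usage (the image URL is a placeholder):

```python
import requests
from PIL import Image
from transformers import BlipProcessor, BlipForQuestionAnswering

processor = BlipProcessor.from_pretrained("Salesforce/blip-vqa-base")
model = BlipForQuestionAnswering.from_pretrained("Salesforce/blip-vqa-base")

# Placeholder image URL; any RGB image works.
url = "https://example.com/dog.jpg"
image = Image.open(requests.get(url, stream=True).raw).convert("RGB")

inputs = processor(image, "How many dogs are in the picture?", return_tensors="pt")
out = model.generate(**inputs)
print(processor.decode(out[0], skip_special_tokens=True))
```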
