## OpenAI-CLIP-Image-Text-Embeddings-vit-base-patch32
### Description

The CLIP model was developed by OpenAI researchers to learn about what contributes to robustness in computer vision tasks and to test the ability of models to generalize to arbitrary image classification tasks in a zero-shot manner. The model uses a ViT-B/32 Transformer architecture as an image encoder and a masked self-attention Transformer as a text encoder. It was trained on publicly available image-caption data, which was gathered in a mostly non-interventionist manner. The model is intended as a research output for research communities, and its primary intended users are AI researchers. The model has been evaluated on a wide range of benchmarks across a variety of computer vision datasets, but it currently struggles with certain tasks such as fine-grained classification and counting objects. The model also poses issues with regard to fairness and bias, and the specific biases it exhibits can depend significantly on class design and the choices one makes for which categories to include and exclude.

> The above summary was generated using ChatGPT. Review the original model card to understand the data used to train the model, evaluation metrics, license, intended uses, limitations and bias before using the model.

### Inference samples

Inference type|Python sample (Notebook)|CLI with YAML
|--|--|--|
Real time|image-text-embeddings-online-endpoint.ipynb|image-text-embeddings-online-endpoint.sh
Batch|image-text-embeddings-batch-endpoint.ipynb|image-text-embeddings-batch-endpoint.sh

### Sample inputs and outputs (for real-time inference)

#### Sample input for image embeddings

```json
{
  "input_data": {
    "columns": ["image", "text"],
    "index": [0, 1],
    "data": [
      ["image1", ""],
      ["image2", ""]
    ]
  }
}
```
Note: "image1" and "image2" should be publicly accessible urls or strings in base64
#### Sample output

```json
[
  {
    "image_features": [-0.92, -0.13, 0.02, ... , 0.13]
  },
  {
    "image_features": [0.54, -0.83, 0.13, ... , 0.26]
  }
]
```
Note: returned embeddings have dimension 512 and are not normalized.

#### Sample input for text embeddings

```json
{
  "input_data": {
    "columns": ["image", "text"],
    "index": [0, 1],
    "data": [
      ["", "sample text 1"],
      ["", "sample text 2"]
    ]
  }
}
```
#### Sample output

```json
[
  {
    "text_features": [0.42, -0.13, -0.92, ... , 0.63]
  },
  {
    "text_features": [-0.14, 0.93, -0.15, ... , 0.66]
  }
]
```
Note: returned embeddings have dimension 512 and are not normalized.

#### Sample input for image and text embeddings

```json
{
  "input_data": {
    "columns": ["image", "text"],
    "index": [0, 1],
    "data": [
      ["image1", "sample text 1"],
      ["image2", "sample text 2"]
    ]
  }
}
```
Note: "image1" and "image2" should be publicly accessible urls or strings in base64
format #### Sample output json [ { "image_features": [0.92, -0.13, 0.02, ... , -0.13], "text_features": [0.42, 0.13, -0.92, ... , -0.63] }, { "image_features": [-0.54, -0.83, 0.13, ... , -0.26], "text_features": [-0.14, -0.93, 0.15, ... , 0.66] } ]
Note: returned embeddings have dimension 512 and are not normalized.
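Because the returned embeddings are not normalized, a common post-processing step is to L2-normalize them before computing similarities. Below is a minimal sketch; the example vectors are truncated stand-ins for the 512-dimensional outputs shown above.

```python
import numpy as np

# Truncated stand-ins for the 512-dimensional vectors returned by the endpoint.
image_features = np.array([0.92, -0.13, 0.02, -0.13])
text_features = np.array([0.42, 0.13, -0.92, -0.63])

# L2-normalize, after which cosine similarity is a plain dot product.
image_features = image_features / np.linalg.norm(image_features)
text_features = text_features / np.linalg.norm(text_features)

similarity = float(image_features @ text_features)
print(f"cosine similarity: {similarity:.3f}")
```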
Version: 2
Preview
license: mit
task: embeddings
View in Studio: https://ml.azure.com/registries/azureml/models/OpenAI-CLIP-Image-Text-Embeddings-vit-base-patch32/version/2
License: mit
inference-min-sku-spec: 2|0|7|14 (vCPUs|GPUs|RAM in GB|storage in GB)
inference-recommended-sku: Standard_DS2_v2, Standard_D2a_v4, Standard_D2as_v4, Standard_DS3_v2, Standard_D4a_v4, Standard_D4as_v4, Standard_DS4_v2, Standard_D8a_v4, Standard_D8as_v4, Standard_DS5_v2, Standard_D16a_v4, Standard_D16as_v4, Standard_D32a_v4, Standard_D32as_v4, Standard_D48a_v4, Standard_D48as_v4, Standard_D64a_v4, Standard_D64as_v4, Standard_D96a_v4, Standard_D96as_v4, Standard_F4s_v2, Standard_FX4mds, Standard_F8s_v2, Standard_FX12mds, Standard_F16s_v2, Standard_F32s_v2, Standard_F48s_v2, Standard_F64s_v2, Standard_F72s_v2, Standard_FX24mds, Standard_FX36mds, Standard_FX48mds, Standard_E2s_v3, Standard_E4s_v3, Standard_E8s_v3, Standard_E16s_v3, Standard_E32s_v3, Standard_E48s_v3, Standard_E64s_v3, Standard_NC4as_T4_v3, Standard_NC6s_v3, Standard_NC8as_T4_v3, Standard_NC12s_v3, Standard_NC16as_T4_v3, Standard_NC24s_v3, Standard_NC64as_T4_v3, Standard_NC24ads_A100_v4, Standard_NC48ads_A100_v4, Standard_NC96ads_A100_v4, Standard_ND96asr_v4, Standard_ND96amsr_A100_v4, Standard_ND40rs_v2
model_id: openai/clip-vit-base-patch32
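For illustration, the model can be deployed from the azureml registry onto one of the recommended SKUs listed above using the Azure ML Python SDK. This is a minimal sketch, not an official recipe: the workspace details and endpoint name are placeholders, the choice of Standard_DS3_v2 is just one option from the recommended list, and the registry asset URI is assumed to follow the standard `azureml://registries/...` format for the name and version shown in this card.

```python
from azure.ai.ml import MLClient
from azure.ai.ml.entities import ManagedOnlineDeployment, ManagedOnlineEndpoint
from azure.identity import DefaultAzureCredential

ml_client = MLClient(
    DefaultAzureCredential(),
    subscription_id="<subscription-id>",
    resource_group_name="<resource-group>",
    workspace_name="<workspace>",
)

# Registry asset ID for this model (name and version as listed in this card).
model_id = (
    "azureml://registries/azureml/models/"
    "OpenAI-CLIP-Image-Text-Embeddings-vit-base-patch32/versions/2"
)

# Create an endpoint, then a deployment on one of the recommended SKUs.
endpoint = ManagedOnlineEndpoint(name="<endpoint-name>", auth_mode="key")
ml_client.online_endpoints.begin_create_or_update(endpoint).result()

deployment = ManagedOnlineDeployment(
    name="default",
    endpoint_name="<endpoint-name>",
    model=model_id,
    instance_type="Standard_DS3_v2",  # pick any SKU from the recommended list
    instance_count=1,
)
ml_client.online_deployments.begin_create_or_update(deployment).result()
```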