# google-vit-base-patch16-224
**Description**: The Vision Transformer (ViT) is a BERT-like transformer encoder model pretrained in a supervised fashion on a large collection of images, namely ImageNet-21k, and then fine-tuned on ImageNet (also called ImageNet-1k; 1 million images, 1,000 classes), both at a resolution of 224x224. The model treats an image as a sequence of fixed-size patches, which are linearly embedded; a [CLS] token is prepended to the sequence for classification tasks, and absolute position embeddings are added before the sequence is fed to the Transformer encoder layers. Pretraining in this way yields an inner representation of images from which features useful for downstream tasks can be extracted. For instance, given a dataset of labeled images, a linear classifier can be placed on top of the pre-trained encoder, with the last hidden state of the [CLS] token serving as a representation of the entire image (a minimal local usage sketch along these lines is included at the end of this card).

> The above summary was generated using ChatGPT. Review the original-model-card to understand the data used to train the model, evaluation metrics, license, intended uses, limitations and bias before using the model.

### Inference samples

|Inference type|Python sample (Notebook)|CLI with YAML|
|--|--|--|
|Real time|image-classification-online-endpoint.ipynb|image-classification-online-endpoint.sh|
|Batch|image-classification-batch-endpoint.ipynb|image-classification-batch-endpoint.sh|

### Finetuning samples

|Task|Use case|Dataset|Python sample (Notebook)|CLI with YAML|
|---|--|--|--|--|
|Image Multi-class classification|Image Multi-class classification|fridgeObjects|fridgeobjects-multiclass-classification.ipynb|fridgeobjects-multiclass-classification.sh|
|Image Multi-label classification|Image Multi-label classification|multilabel fridgeObjects|fridgeobjects-multilabel-classification.ipynb|fridgeobjects-multilabel-classification.sh|

### Model Evaluation

|Task|Use case|Dataset|Python sample (Notebook)|
|---|--|--|--|
|Image Multi-class classification|Image Multi-class classification|fridgeObjects|image-multiclass-classification.ipynb|
|Image Multi-label classification|Image Multi-label classification|multilabel fridgeObjects|image-multilabel-classification.ipynb|

### Sample inputs and outputs (for real-time inference)

#### Sample input

```json
{
  "input_data": {
    "columns": ["image"],
    "index": [0, 1],
    "data": ["image1", "image2"]
  }
}
```

Note: the "image1" and "image2" strings should be base64-encoded images or publicly accessible URLs.
#### Sample output json [ { "probs": [0.91, 0.09], "labels": ["can", "carton"] }, { "probs": [0.1, 0.9], "labels": ["can", "carton"] } ]
#### Model inference - visualization for a sample image
Version: 6
Preview
license : apache-2.0
model_specific_defaults : {"apply_deepspeed": "false", "apply_ort": "false"}
task : image-classification
View in Studio: https://ml.azure.com/registries/azureml/models/google-vit-base-patch16-224/version/6
License: apache-2.0
SHA: 2ddc9d4e473d7ba52128f0df4723e478fa14fb80
datasets: imagenet-1k, imagenet-21k
evaluation-min-sku-spec: 4|1|28|176
evaluation-recommended-sku: Standard_NC6s_v3
finetune-min-sku-spec: 4|1|28|176
finetune-recommended-sku: Standard_NC6s_v3
finetuning-tasks: image-classification
inference-min-sku-spec: 2|0|14|28
inference-recommended-sku: Standard_DS3_v2
model_id: google/vit-base-patch16-224
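
For local experimentation outside Azure ML, the `model_id` above is the corresponding Hugging Face Hub checkpoint. Below is a minimal sketch of image classification with the `transformers` library (assuming `transformers`, `torch`, `requests`, and `Pillow` are installed; this is not part of the Azure ML deployment path, and the sample image URL is just a publicly hosted example):

```python
from PIL import Image
import requests
from transformers import ViTImageProcessor, ViTForImageClassification

# Load a sample image (any RGB image works).
url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

# The processor resizes to 224x224 and normalizes; the model splits the image
# into 16x16 patches and classifies using the [CLS] token representation.
processor = ViTImageProcessor.from_pretrained("google/vit-base-patch16-224")
model = ViTForImageClassification.from_pretrained("google/vit-base-patch16-224")

inputs = processor(images=image, return_tensors="pt")
outputs = model(**inputs)

# Pick the highest-scoring ImageNet class.
predicted_class_idx = outputs.logits.argmax(-1).item()
print("Predicted class:", model.config.id2label[predicted_class_idx])
```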