
github-actions[bot] edited this page Aug 29, 2023 · 25 revisions

# google-vit-base-patch16-224

## Overview

Description: The Vision Transformer (ViT) is a BERT-like transformer encoder model pretrained in a supervised fashion on a large collection of images, ImageNet-21k, and then fine-tuned on ImageNet (ImageNet-1k, comprising 1 million images and 1,000 classes) at a resolution of 224x224. An image is presented to the model as a sequence of fixed-size patches, which are linearly embedded; a [CLS] token is prepended to the sequence for classification tasks, and absolute position embeddings are added before the sequence is fed to the layers of the Transformer encoder. Pretraining in this way builds an inner representation of images from which features useful for downstream tasks can be extracted. For instance, given a dataset of labeled images, a linear layer can be placed on top of the pretrained encoder, with the last hidden state of the [CLS] token serving as a representation of the entire image.

> The above summary was generated using ChatGPT. Review the original-model-card to understand the data used to train the model, evaluation metrics, license, intended uses, limitations and bias before using the model.
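The patch/token arithmetic implied by the model name can be sketched as follows. This is a minimal illustration, assuming the standard ViT-Base configuration (16x16 patches on a 224x224 input, as in the ViT paper); it is not part of the model card itself.

```python
# ViT-Base/16 tokenization arithmetic: a 224x224 image is split into
# fixed-size 16x16 patches, each flattened and linearly embedded, with a
# [CLS] token prepended for classification.
image_size = 224
patch_size = 16

patches_per_side = image_size // patch_size      # 224 / 16 = 14
num_patches = patches_per_side ** 2              # 14 * 14 = 196 patches
seq_len = num_patches + 1                        # +1 for the [CLS] token

# Each RGB patch contributes patch_size * patch_size * 3 raw values
# before the linear embedding projects it to the hidden dimension.
patch_vector_len = patch_size * patch_size * 3

print(patches_per_side, num_patches, seq_len, patch_vector_len)
# → 14 196 197 768
```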
### Inference samples

| Inference type | Python sample (Notebook) | CLI with YAML |
|--|--|--|
| Real time | image-classification-online-endpoint.ipynb | image-classification-online-endpoint.sh |
| Batch | image-classification-batch-endpoint.ipynb | image-classification-batch-endpoint.sh |

### Finetuning samples

| Task | Use case | Dataset | Python sample (Notebook) | CLI with YAML |
|---|--|--|--|--|
| Image Multi-class classification | Image Multi-class classification | fridgeObjects | fridgeobjects-multiclass-classification.ipynb | fridgeobjects-multiclass-classification.sh |
| Image Multi-label classification | Image Multi-label classification | multilabel fridgeObjects | fridgeobjects-multilabel-classification.ipynb | fridgeobjects-multilabel-classification.sh |

### Model Evaluation

| Task | Use case | Dataset | Python sample (Notebook) |
|---|--|--|--|
| Image Multi-class classification | Image Multi-class classification | fridgeObjects | image-multiclass-classification.ipynb |
| Image Multi-label classification | Image Multi-label classification | multilabel fridgeObjects | image-multilabel-classification.ipynb |

### Sample inputs and outputs (for real-time inference)

#### Sample input

```json
{
  "input_data": {
    "columns": ["image"],
    "index": [0, 1],
    "data": ["image1", "image2"]
  }
}
```

Note: the "image1" and "image2" strings should be base64-encoded images or publicly accessible URLs.

#### Sample output

```json
[
  {
    "probs": [0.91, 0.09],
    "labels": ["can", "carton"]
  },
  {
    "probs": [0.1, 0.9],
    "labels": ["can", "carton"]
  }
]
```

#### Model inference - visualization for a sample image

*(figure: multi-class classification visualization for a sample image)*
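Building the request body above from local image files can be sketched with the standard library alone. `build_request` is a hypothetical helper, not part of the samples; how the resulting JSON is sent (endpoint URL, auth keys) depends on your own deployment.

```python
import base64
import json

def build_request(image_paths):
    """Build the online-endpoint request body: each local image is
    base64-encoded into the "data" list (URLs would also be accepted)."""
    data = []
    for path in image_paths:
        with open(path, "rb") as f:
            data.append(base64.b64encode(f.read()).decode("utf-8"))
    return json.dumps({
        "input_data": {
            "columns": ["image"],
            "index": list(range(len(data))),
            "data": data,
        }
    })
```

The returned string matches the sample input schema and can be posted to the deployed real-time endpoint.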

Version: 6

## Tags

- Preview
- license : apache-2.0
- model_specific_defaults : ordereddict([('apply_deepspeed', 'false'), ('apply_ort', 'false')])
- task : image-classification

View in Studio: https://ml.azure.com/registries/azureml/models/google-vit-base-patch16-224/version/6

License: apache-2.0

## Properties

SHA: 2ddc9d4e473d7ba52128f0df4723e478fa14fb80

datasets: imagenet-1k, imagenet-21k

evaluation-min-sku-spec: 4|1|28|176

evaluation-recommended-sku: Standard_NC6s_v3

finetune-min-sku-spec: 4|1|28|176

finetune-recommended-sku: Standard_NC6s_v3

finetuning-tasks: image-classification

inference-min-sku-spec: 2|0|14|28

inference-recommended-sku: Standard_DS3_v2

model_id: google/vit-base-patch16-224
