
github-actions[bot] edited this page Aug 29, 2023 · 25 revisions

# google-vit-base-patch16-224

## Overview

Description: The Vision Transformer (ViT) is a BERT-like transformer encoder model pretrained in a supervised fashion on a large collection of images, ImageNet-21k, and then fine-tuned on ImageNet (ImageNet-1k, comprising 1 million images and 1,000 classes) at a resolution of 224x224. An image is presented to the model as a sequence of fixed-size patches, which are linearly embedded; a [CLS] token is prepended to the sequence for classification tasks, and absolute position embeddings are added before the sequence is fed to the layers of the Transformer encoder. Pretraining in this way builds an inner representation of images from which features useful for downstream tasks can be extracted. For instance, given a dataset of labeled images, a linear layer can be placed on top of the pretrained encoder, with the last hidden state of the [CLS] token serving as a representation of the entire image.

> The above summary was generated using ChatGPT. Review the original-model-card to understand the data used to train the model, evaluation metrics, license, intended uses, limitations and bias before using the model.
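The patch/token arithmetic implied by the model name can be sketched as follows. This is a minimal illustration, assuming the standard ViT-Base configuration (16x16 patches on a 224x224 input, as in the ViT paper); it is not part of the model card itself.

```python
# ViT-Base/16 tokenization arithmetic: a 224x224 image is split into
# fixed-size 16x16 patches, each flattened and linearly embedded, with a
# [CLS] token prepended for classification.
image_size = 224
patch_size = 16

patches_per_side = image_size // patch_size      # 224 / 16 = 14
num_patches = patches_per_side ** 2              # 14 * 14 = 196 patches
seq_len = num_patches + 1                        # +1 for the [CLS] token

# Each RGB patch contributes patch_size * patch_size * 3 raw values
# before the linear embedding projects it to the hidden dimension.
patch_vector_len = patch_size * patch_size * 3

print(patches_per_side, num_patches, seq_len, patch_vector_len)
# → 14 196 197 768
```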
### Inference samples

| Inference type | Python sample (Notebook) | CLI with YAML |
|--|--|--|
| Real time | image-classification-online-endpoint.ipynb | image-classification-online-endpoint.sh |
| Batch | image-classification-batch-endpoint.ipynb | image-classification-batch-endpoint.sh |

### Finetuning samples

| Task | Use case | Dataset | Python sample (Notebook) | CLI with YAML |
|---|--|--|--|--|
| Image Multi-class classification | Image Multi-class classification | fridgeObjects | fridgeobjects-multiclass-classification.ipynb | fridgeobjects-multiclass-classification.sh |
| Image Multi-label classification | Image Multi-label classification | multilabel fridgeObjects | fridgeobjects-multilabel-classification.ipynb | fridgeobjects-multilabel-classification.sh |

### Model Evaluation

| Task | Use case | Dataset | Python sample (Notebook) |
|---|--|--|--|
| Image Multi-class classification | Image Multi-class classification | fridgeObjects | image-multiclass-classification.ipynb |
| Image Multi-label classification | Image Multi-label classification | multilabel fridgeObjects | image-multilabel-classification.ipynb |

### Sample inputs and outputs (for real-time inference)

#### Sample input

```json
{
  "input_data": {
    "columns": ["image"],
    "index": [0, 1],
    "data": ["image1", "image2"]
  }
}
```

Note: the "image1" and "image2" strings should be base64-encoded images or publicly accessible URLs.

#### Sample output

```json
[
  {
    "probs": [0.91, 0.09],
    "labels": ["can", "carton"]
  },
  {
    "probs": [0.1, 0.9],
    "labels": ["can", "carton"]
  }
]
```

#### Model inference - visualization for a sample image

*(figure: multi-class classification visualization for a sample image)*
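Building the request body above from local image files can be sketched with the standard library alone. `build_request` is a hypothetical helper, not part of the samples; how the resulting JSON is sent (endpoint URL, auth keys) depends on your own deployment.

```python
import base64
import json

def build_request(image_paths):
    """Build the online-endpoint request body: each local image is
    base64-encoded into the "data" list (URLs would also be accepted)."""
    data = []
    for path in image_paths:
        with open(path, "rb") as f:
            data.append(base64.b64encode(f.read()).decode("utf-8"))
    return json.dumps({
        "input_data": {
            "columns": ["image"],
            "index": list(range(len(data))),
            "data": data,
        }
    })
```

The returned string matches the sample input schema and can be posted to the deployed real-time endpoint.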

Version: 6

## Tags

- Preview
- license : apache-2.0
- model_specific_defaults : ordereddict([('apply_deepspeed', 'false'), ('apply_ort', 'false')])
- task : image-classification

View in Studio: https://ml.azure.com/registries/azureml/models/google-vit-base-patch16-224/version/6

License: apache-2.0

## Properties

SHA: 2ddc9d4e473d7ba52128f0df4723e478fa14fb80

datasets: imagenet-1k, imagenet-21k

evaluation-min-sku-spec: 4|1|28|176

evaluation-recommended-sku: Standard_NC6s_v3

finetune-min-sku-spec: 4|1|28|176

finetune-recommended-sku: Standard_NC6s_v3

finetuning-tasks: image-classification

inference-min-sku-spec: 2|0|14|28

inference-recommended-sku: Standard_DS3_v2

model_id: google/vit-base-patch16-224
