
roberta-base

Overview

RoBERTa is a transformer-based language model pretrained on a large corpus of English data in a self-supervised fashion using the masked language modeling (MLM) objective; the original implementation is available in the fairseq GitHub repository. The training data comprises 160GB of English text from various sources, including Wikipedia and news articles. Pretraining ran for 500,000 steps on V100 GPUs using the Adam optimizer with a batch size of 8,000 and a sequence length of 512. The model is case-sensitive, and preprocessing consists of tokenization followed by masking.

The model is primarily intended to be fine-tuned on tasks that use the whole sentence, such as sequence classification, token classification, or question answering. It can be used directly through a masked language modeling pipeline, but for a specific task it is highly recommended to use one of the fine-tuned models available on the model hub. Because the model was trained on unfiltered data, its outputs may reproduce disturbing and offensive stereotypes, and this bias carries over to all fine-tuned versions. The model should not be used to create hostile or alienating environments, or to present factual or true representations of people or events.
Please note: this model accepts masks in the `<mask>` format (not `[MASK]`). See the sample input below for reference.

The above summary was generated using ChatGPT. Review the original model card to understand the data used to train the model, evaluation metrics, license, intended uses, limitations and bias before using the model.
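As a minimal sketch of the masked language modeling pipeline mentioned above, the Hugging Face `transformers` fill-mask pipeline can query the `roberta-base` checkpoint locally. This runs the Hugging Face checkpoint directly rather than the Azure ML endpoint, and assumes `transformers` plus a backend such as PyTorch are installed:

```python
from transformers import pipeline

# Load the fill-mask pipeline with the roberta-base checkpoint.
fill_mask = pipeline("fill-mask", model="roberta-base")

# RoBERTa expects the <mask> token (not [MASK]).
predictions = fill_mask("Paris is the <mask> of France.")
for p in predictions:
    # Each prediction carries the filled-in token and its score.
    print(p["token_str"], round(p["score"], 4))
```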

Inference samples

| Inference type | Python sample (Notebook) | CLI with YAML |
| --- | --- | --- |
| Real time | fill-mask-online-endpoint.ipynb | fill-mask-online-endpoint.sh |
| Batch | fill-mask-batch-endpoint.ipynb | coming soon |

Model Evaluation

| Task | Use case | Dataset | Python sample (Notebook) | CLI with YAML |
| --- | --- | --- | --- | --- |
| Fill Mask | Fill Mask | rcds/wikipedia-for-mask-filling | evaluate-model-fill-mask.ipynb | |

Finetuning samples

| Task | Use case | Dataset | Python sample (Notebook) | CLI with YAML |
| --- | --- | --- | --- | --- |
| Text Classification | Emotion Detection | Emotion | emotion-detection.ipynb | emotion-detection.sh |
| Token Classification | Named Entity Recognition | CoNLL-2003 | named-entity-recognition.ipynb | named-entity-recognition.sh |
| Question Answering | Extractive Q&A | SQuAD (Wikipedia) | extractive-qa.ipynb | extractive-qa.sh |
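The notebooks above run fine-tuning through Azure ML. For orientation only, here is a minimal local sketch of the first row (text classification on an emotion dataset) using the Hugging Face Trainer; the dataset id `dair-ai/emotion`, the label count, and all hyperparameters are illustrative assumptions, not the settings used in the linked samples:

```python
from datasets import load_dataset
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    Trainer,
    TrainingArguments,
)

# Assumed dataset id on the Hugging Face hub; this dataset has 6 emotion labels.
dataset = load_dataset("dair-ai/emotion")
tokenizer = AutoTokenizer.from_pretrained("roberta-base")

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=128)

dataset = dataset.map(tokenize, batched=True)

model = AutoModelForSequenceClassification.from_pretrained(
    "roberta-base", num_labels=6
)

args = TrainingArguments(
    output_dir="roberta-base-emotion",
    per_device_train_batch_size=16,
    num_train_epochs=1,              # illustrative; tune for real runs
    evaluation_strategy="epoch",     # evaluate once per epoch
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=dataset["train"],
    eval_dataset=dataset["validation"],
    tokenizer=tokenizer,             # enables dynamic padding via the default collator
)
trainer.train()
```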

Sample inputs and outputs (for real-time inference)

Sample input

```json
{
    "input_data": {
        "input_string": ["Paris is the <mask> of France.", "Today is a <mask> day!"]
    }
}
```

Sample output

```json
[
    {
        "0": "capital"
    },
    {
        "0": "beautiful"
    }
]
```
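As a hedged sketch of posting the sample input to a deployed real-time endpoint with plain `requests` (the scoring URI and key below are placeholders to replace with your own endpoint's values; the linked notebooks show the supported Azure ML SDK route):

```python
import json
import requests

# Placeholders: substitute your endpoint's scoring URI and primary key.
SCORING_URI = "https://<your-endpoint>.<region>.inference.ml.azure.com/score"
API_KEY = "<your-endpoint-key>"

payload = {
    "input_data": {
        "input_string": ["Paris is the <mask> of France.", "Today is a <mask> day!"]
    }
}

response = requests.post(
    SCORING_URI,
    headers={
        "Authorization": f"Bearer {API_KEY}",
        "Content-Type": "application/json",
    },
    data=json.dumps(payload),
)
response.raise_for_status()
print(response.json())  # expected shape: [{"0": "capital"}, {"0": "beautiful"}]
```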

Version: 11

Tags

- Preview
- computes_allow_list: ['Standard_NV12s_v3', 'Standard_NV24s_v3', 'Standard_NV48s_v3', 'Standard_NC6s_v3', 'Standard_NC12s_v3', 'Standard_NC24s_v3', 'Standard_NC24rs_v3', 'Standard_NC6s_v2', 'Standard_NC12s_v2', 'Standard_NC24s_v2', 'Standard_NC24rs_v2', 'Standard_NC4as_T4_v3', 'Standard_NC8as_T4_v3', 'Standard_NC16as_T4_v3', 'Standard_NC64as_T4_v3', 'Standard_ND6s', 'Standard_ND12s', 'Standard_ND24s', 'Standard_ND24rs', 'Standard_ND40rs_v2', 'Standard_ND96asr_v4']
- license: mit
- model_specific_defaults: ordereddict([('apply_deepspeed', 'true'), ('apply_lora', 'true'), ('apply_ort', 'true')])
- task: fill-mask

View in Studio: https://ml.azure.com/registries/azureml/models/roberta-base/version/11

License: mit

Properties

SHA: bc2764f8af2e92b6eb5679868df33e224075ca68

datasets: bookcorpus, wikipedia

evaluation-min-sku-spec: 8|0|28|56 (CPU cores | GPUs | memory in GB | storage in GB; the same format applies to the other min-sku-spec fields below)

evaluation-recommended-sku: Standard_DS4_v2

finetune-min-sku-spec: 4|1|28|176

finetune-recommended-sku: Standard_NC24rs_v3

finetuning-tasks: text-classification, token-classification, question-answering

inference-min-sku-spec: 2|0|7|14

inference-recommended-sku: Standard_DS2_v2, Standard_D2a_v4, Standard_D2as_v4, Standard_DS3_v2, Standard_D4a_v4, Standard_D4as_v4, Standard_DS4_v2, Standard_D8a_v4, Standard_D8as_v4, Standard_DS5_v2, Standard_D16a_v4, Standard_D16as_v4, Standard_D32a_v4, Standard_D32as_v4, Standard_D48a_v4, Standard_D48as_v4, Standard_D64a_v4, Standard_D64as_v4, Standard_D96a_v4, Standard_D96as_v4, Standard_F4s_v2, Standard_FX4mds, Standard_F8s_v2, Standard_FX12mds, Standard_F16s_v2, Standard_F32s_v2, Standard_F48s_v2, Standard_F64s_v2, Standard_F72s_v2, Standard_FX24mds, Standard_FX36mds, Standard_FX48mds, Standard_E2s_v3, Standard_E4s_v3, Standard_E8s_v3, Standard_E16s_v3, Standard_E32s_v3, Standard_E48s_v3, Standard_E64s_v3, Standard_NC4as_T4_v3, Standard_NC6s_v3, Standard_NC8as_T4_v3, Standard_NC12s_v3, Standard_NC16as_T4_v3, Standard_NC24s_v3, Standard_NC64as_T4_v3, Standard_NC24ads_A100_v4, Standard_NC48ads_A100_v4, Standard_NC96ads_A100_v4, Standard_ND96asr_v4, Standard_ND96amsr_A100_v4, Standard_ND40rs_v2

languages: en
