models roberta large

github-actions[bot] edited this page Oct 21, 2023 · 26 revisions

roberta-large

Overview

RoBERTa Large is a pretrained transformer language model developed by Facebook AI and distributed through the Hugging Face model hub. It was pretrained on a large corpus of English text in a self-supervised manner using the masked language modeling (MLM) objective. The model is case-sensitive and is primarily intended to be fine-tuned on downstream tasks such as sequence classification, token classification, or question answering. The pretraining corpus combines five datasets totaling 160GB of text, and the tokenizer uses a vocabulary of 50,000 tokens. The model was trained for 500K steps on 1024 V100 GPUs with a batch size of 8K and a sequence length of 512. The optimizer was Adam with a learning rate of 4e-4, β1=0.9, β2=0.98, and ϵ=1e-6, a weight decay of 0.01, and learning rate warmup over the first 30,000 steps.
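The warmup figures above can be made concrete with a small schedule function. This is a minimal sketch, not the original training code: it assumes the warmup is linear and that the rate then decays linearly to zero at step 500K, neither of which the summary states. The function name `learning_rate` is illustrative.

```python
# Values taken from the overview paragraph.
PEAK_LR = 4e-4
WARMUP_STEPS = 30_000
TOTAL_STEPS = 500_000


def learning_rate(step: int) -> float:
    """Return the learning rate at a given optimizer step.

    Assumption: linear warmup to PEAK_LR, then linear decay to zero.
    The summary only says that warmup lasts 30,000 steps.
    """
    if step < 0:
        raise ValueError("step must be non-negative")
    if step < WARMUP_STEPS:
        # Ramp up from 0 to PEAK_LR over the warmup window.
        return PEAK_LR * step / WARMUP_STEPS
    # Decay from PEAK_LR at the end of warmup to 0 at TOTAL_STEPS.
    remaining = max(TOTAL_STEPS - step, 0)
    return PEAK_LR * remaining / (TOTAL_STEPS - WARMUP_STEPS)


if __name__ == "__main__":
    for s in (0, 15_000, 30_000, 265_000, 500_000):
        print(s, learning_rate(s))
```

Under these assumptions the rate is half of its peak at step 15,000 and again at step 265,000, the midpoint of the decay phase.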
Note: the model expects the position to be predicted to be written as <mask>. See the sample input below for reference.

The above summary was generated using ChatGPT. Review the original model card to understand the data used to train the model, evaluation metrics, license, intended uses, limitations and bias before using the model.

Inference samples

| Inference type | Python sample (Notebook) | CLI with YAML |
|---|---|---|
| Real time | fill-mask-online-endpoint.ipynb | fill-mask-online-endpoint.sh |
| Batch | fill-mask-batch-endpoint.ipynb | coming soon |

Finetuning samples

| Task | Use case | Dataset | Python sample (Notebook) | CLI with YAML |
|---|---|---|---|---|
| Text Classification | Emotion Detection | Emotion | emotion-detection.ipynb | emotion-detection.sh |
| Token Classification | Named Entity Recognition | Conll2003 | named-entity-recognition.ipynb | named-entity-recognition.sh |
| Question Answering | Extractive Q&A | SQUAD (Wikipedia) | extractive-qa.ipynb | extractive-qa.sh |

Model Evaluation

| Task | Use case | Dataset | Python sample (Notebook) | CLI with YAML |
|---|---|---|---|---|
| Fill Mask | Fill Mask | rcds/wikipedia-for-mask-filling | evaluate-model-fill-mask.ipynb | |

Sample inputs and outputs (for real-time inference)

Sample input

{
    "input_data": {
        "input_string": ["Paris is the <mask> of France.", "Today is a <mask> day!"]
    }
}
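The request body above can be assembled programmatically. The following is a minimal sketch using only the standard library; the helper name `build_fill_mask_payload` is hypothetical, and the check that every sentence contains `<mask>` is an assumption based on the note in the overview, not a documented validation rule of the endpoint.

```python
import json

MASK_TOKEN = "<mask>"  # mask format stated in the overview note


def build_fill_mask_payload(sentences):
    """Serialize sentences into the JSON shape shown in the sample input.

    Hypothetical helper: rejects sentences without a mask token, since
    there would be nothing for the model to fill in.
    """
    sentences = list(sentences)
    for sentence in sentences:
        if MASK_TOKEN not in sentence:
            raise ValueError(f"no {MASK_TOKEN} token in: {sentence!r}")
    return json.dumps({"input_data": {"input_string": sentences}})


if __name__ == "__main__":
    body = build_fill_mask_payload(
        ["Paris is the <mask> of France.", "Today is a <mask> day!"]
    )
    print(body)
```

The resulting string can be sent as the body of a request to a deployed real-time endpoint; the endpoint URL and authentication are deployment-specific and are not covered here.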

Sample output

[
    {
        "0": "capital"
    },
    {
        "0": "beautiful"
    }
]
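To turn the response back into complete sentences, each prediction can be substituted into its input. This is a minimal sketch under two assumptions inferred from the sample rather than documented: the output list is in the same order as the input strings, and the key "0" holds the top prediction for that input. The helper name `fill_masks` is hypothetical.

```python
def fill_masks(sentences, predictions, mask_token="<mask>"):
    """Replace the first mask in each sentence with its predicted token.

    Assumes predictions[i] corresponds to sentences[i] and that the
    "0" key carries the predicted token, as in the sample output.
    """
    if len(sentences) != len(predictions):
        raise ValueError("expected one prediction per input sentence")
    filled = []
    for sentence, prediction in zip(sentences, predictions):
        token = prediction["0"]
        filled.append(sentence.replace(mask_token, token, 1))
    return filled


if __name__ == "__main__":
    inputs = ["Paris is the <mask> of France.", "Today is a <mask> day!"]
    outputs = [{"0": "capital"}, {"0": "beautiful"}]
    for line in fill_masks(inputs, outputs):
        print(line)
    # Paris is the capital of France.
    # Today is a beautiful day!
```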

Version: 12

Tags

Preview

computes_allow_list : ['Standard_NV12s_v3', 'Standard_NV24s_v3', 'Standard_NV48s_v3', 'Standard_NC6s_v3', 'Standard_NC12s_v3', 'Standard_NC24s_v3', 'Standard_NC24rs_v3', 'Standard_NC6s_v2', 'Standard_NC12s_v2', 'Standard_NC24s_v2', 'Standard_NC24rs_v2', 'Standard_NC4as_T4_v3', 'Standard_NC8as_T4_v3', 'Standard_NC16as_T4_v3', 'Standard_NC64as_T4_v3', 'Standard_ND6s', 'Standard_ND12s', 'Standard_ND24s', 'Standard_ND24rs', 'Standard_ND40rs_v2', 'Standard_ND96asr_v4']

license : mit

model_specific_defaults : ordereddict([('apply_deepspeed', 'true'), ('apply_lora', 'true'), ('apply_ort', 'true')])

task : fill-mask

View in Studio: https://ml.azure.com/registries/azureml/models/roberta-large/version/12

License: mit

Properties

SHA: 716877d372b884cad6d419d828bac6c85b3b18d9

datasets: bookcorpus, wikipedia

evaluation-min-sku-spec: 8|0|28|56

evaluation-recommended-sku: Standard_DS4_v2

finetune-min-sku-spec: 4|1|28|176

finetune-recommended-sku: Standard_NC24rs_v3

finetuning-tasks: text-classification, token-classification, question-answering

inference-min-sku-spec: 4|0|14|28

inference-recommended-sku: Standard_DS3_v2, Standard_D4a_v4, Standard_D4as_v4, Standard_DS4_v2, Standard_D8a_v4, Standard_D8as_v4, Standard_DS5_v2, Standard_D16a_v4, Standard_D16as_v4, Standard_D32a_v4, Standard_D32as_v4, Standard_D48a_v4, Standard_D48as_v4, Standard_D64a_v4, Standard_D64as_v4, Standard_D96a_v4, Standard_D96as_v4, Standard_FX4mds, Standard_F8s_v2, Standard_FX12mds, Standard_F16s_v2, Standard_F32s_v2, Standard_F48s_v2, Standard_F64s_v2, Standard_F72s_v2, Standard_FX24mds, Standard_FX36mds, Standard_FX48mds, Standard_E4s_v3, Standard_E8s_v3, Standard_E16s_v3, Standard_E32s_v3, Standard_E48s_v3, Standard_E64s_v3, Standard_NC4as_T4_v3, Standard_NC6s_v3, Standard_NC8as_T4_v3, Standard_NC12s_v3, Standard_NC16as_T4_v3, Standard_NC24s_v3, Standard_NC64as_T4_v3, Standard_NC24ads_A100_v4, Standard_NC48ads_A100_v4, Standard_NC96ads_A100_v4, Standard_ND96asr_v4, Standard_ND96amsr_A100_v4, Standard_ND40rs_v2

languages: en
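The three `*-min-sku-spec` properties above are pipe-separated integers. The sketch below splits such a value into named fields. The field meanings (CPU cores, GPUs, memory in GB, storage in GB) are an assumption, since this page does not define them, and the names `SkuSpec` and `parse_min_sku_spec` are hypothetical.

```python
from typing import NamedTuple


class SkuSpec(NamedTuple):
    # Field meanings are assumed; the page lists only the raw values.
    cpu_cores: int
    gpus: int
    memory_gb: int
    storage_gb: int


def parse_min_sku_spec(spec: str) -> SkuSpec:
    """Parse a value such as "4|1|28|176" into a SkuSpec."""
    parts = spec.split("|")
    if len(parts) != 4:
        raise ValueError(f"expected 4 pipe-separated fields, got {len(parts)}")
    return SkuSpec(*(int(part) for part in parts))


if __name__ == "__main__":
    for name, value in [
        ("evaluation", "8|0|28|56"),
        ("finetune", "4|1|28|176"),
        ("inference", "4|0|14|28"),
    ]:
        print(name, parse_min_sku_spec(value))
```

Read this way, only the fine-tuning minimum has a non-zero second field, which would be consistent with the GPU SKU recommended for fine-tuning and the CPU SKUs listed for evaluation and inference.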
