components text_generation_finetune
github-actions[bot] edited this page May 4, 2024 · 50 revisions
Component to finetune a model for the text generation task.
Version: 0.0.44
View in Studio: https://ml.azure.com/registries/azureml/components/text_generation_finetune/version/0.0.44
LoRA parameters
Name | Description | Type | Default | Optional | Enum |
---|---|---|---|---|---|
apply_lora | Enable LoRA | string | false | True | ['true', 'false'] |
merge_lora_weights | If set to true, the LoRA-trained weights will be merged into the base model before saving | string | true | True | ['true', 'false'] |
lora_alpha | LoRA attention alpha (scaling factor for the LoRA update) | integer | 128 | True | |
lora_r | LoRA rank (dimension of the low-rank update matrices) | integer | 8 | True | |
lora_dropout | Dropout probability applied in the LoRA layers | number | 0.0 | True | |
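For intuition about how lora_alpha, lora_r and merge_lora_weights interact, here is a minimal sketch assuming the standard LoRA formulation used by Hugging Face PEFT, in which the low-rank update B·A is scaled by lora_alpha / lora_r. It uses the table's defaults (128 / 8 = 16) and plain Python lists; the layer size is arbitrary and the component's own internals are not shown on this page. It checks that merging the scaled update into the base weight produces the same output as running the base and LoRA paths separately.

```python
import random

# Assumption: standard LoRA, as in Hugging Face PEFT. The update B @ A is
# scaled by lora_alpha / lora_r before being added to the frozen weight.


def matmul(x, y):
    """(n x k) @ (k x m) -> (n x m) for nested lists."""
    return [[sum(x[i][t] * y[t][j] for t in range(len(y))) for j in range(len(y[0]))]
            for i in range(len(x))]


def add(x, y, scale=1.0):
    """Element-wise x + scale * y."""
    return [[a + scale * b for a, b in zip(rx, ry)] for rx, ry in zip(x, y)]


def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


lora_r, lora_alpha = 8, 128          # defaults from the table above
scaling = lora_alpha / lora_r        # 128 / 8 = 16.0

d_out, d_in = 16, 12                 # arbitrary layer size for the example
rng = random.Random(0)
W = rand_matrix(d_out, d_in, rng)    # frozen base weight
A = rand_matrix(lora_r, d_in, rng)   # trainable, lora_r x d_in
B = rand_matrix(d_out, lora_r, rng)  # trainable, d_out x lora_r; zero at init in real
                                     # LoRA, random here so the update is non-trivial
x = rand_matrix(d_in, 1, rng)        # one input column vector

# Unmerged forward pass: base output plus the scaled low-rank path.
unmerged = add(matmul(W, x), matmul(B, matmul(A, x)), scale=scaling)

# merge_lora_weights='true': fold the scaled update into W once, before saving.
W_merged = add(W, matmul(B, A), scale=scaling)
merged = matmul(W_merged, x)

max_diff = max(abs(u[0] - m[0]) for u, m in zip(unmerged, merged))
print(f"scaling={scaling}, max |unmerged - merged| = {max_diff:.1e}")
```

Because both paths give the same result up to floating-point rounding, merging mainly trades a separable adapter for a plain dense model with no extra LoRA matmuls at inference time.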
Training parameters
Name | Description | Type | Default | Optional | Enum |
---|---|---|---|---|---|
num_train_epochs | Number of training epochs | integer | 1 | True | |
max_steps | If set to a positive number, the total number of training steps to perform. Overrides num_train_epochs. When using a finite iterable dataset, training may stop before reaching the set number of steps if all data is exhausted. | integer | -1 | True | |
per_device_train_batch_size | Train batch size | integer | 1 | True | |
per_device_eval_batch_size | Validation batch size | integer | 1 | True | |
auto_find_batch_size | Flag to enable automatic batch size search. If the provided 'per_device_train_batch_size' causes an out-of-memory (OOM) error, enabling auto_find_batch_size finds a workable batch size by repeatedly halving 'per_device_train_batch_size' until the OOM error is resolved | string | false | True | ['true', 'false'] |
optim | Optimizer to be used while training | string | adamw_hf | True | ['adamw_hf', 'adamw_torch', 'adafactor'] |
learning_rate | Initial learning rate. Uses the linear scheduler by default (see lr_scheduler_type). | number | 2e-05 | True | |
warmup_steps | Number of steps used for a linear warmup from 0 to learning_rate | integer | 0 | True | |
weight_decay | The weight decay to apply (if not zero) to all layers except bias and LayerNorm weights in the AdamW optimizer | number | 0.0 | True | |
adam_beta1 | The beta1 hyperparameter for the AdamW optimizer | number | 0.9 | True | |
adam_beta2 | The beta2 hyperparameter for the AdamW optimizer | number | 0.999 | True | |
adam_epsilon | The epsilon hyperparameter for the AdamW optimizer | number | 1e-08 | True | |
gradient_accumulation_steps | Number of steps over which to accumulate gradients before performing an optimizer update | integer | 1 | True | |
eval_accumulation_steps | Number of prediction steps to accumulate before moving the tensors to the CPU. Passed as None if set to -1 | integer | -1 | True | |
lr_scheduler_type | Learning rate scheduler to use | string | linear | True | ['linear', 'cosine', 'cosine_with_restarts', 'polynomial', 'constant', 'constant_with_warmup'] |
precision | Training precision. Setting it to 16 applies mixed precision training, which can reduce the memory footprint by performing operations in half precision. | string | 32 | True | ['32', '16'] |
seed | Random seed that will be set at the beginning of training | integer | 42 | True | |
enable_full_determinism | Ensure reproducible behavior during distributed training | string | false | True | ['true', 'false'] |
dataloader_num_workers | Number of subprocesses to use for data loading. 0 means that the data will be loaded in the main process. | integer | 0 | True | |
ignore_mismatched_sizes | Whether or not to raise an error if some of the weights from the checkpoint do not have the same size as the weights of the model | string | true | True | ['true', 'false'] |
max_grad_norm | Maximum gradient norm (for gradient clipping) | number | 1.0 | True | |
evaluation_strategy | The evaluation strategy to adopt during training | string | epoch | True | ['epoch', 'steps'] |
evaluation_steps_interval | Evaluation interval expressed as a fraction of the number of steps in one epoch. Overrides eval_steps if not 0. | number | 0.0 | True | |
eval_steps | Number of update steps between two evaluations if evaluation_strategy='steps' | integer | 500 | True | |
logging_strategy | The logging strategy to adopt during training. | string | steps | True | ['epoch', 'steps'] |
logging_steps | Number of update steps between two logs if logging_strategy='steps' | integer | 10 | True | |
metric_for_best_model | Specify the metric to use to compare two different models | string | loss | True | ['loss'] |
resume_from_checkpoint | If true, loads the optimizer, scheduler and trainer state for finetuning | string | false | True | ['true', 'false'] |
save_total_limit | If a value is passed, limits the total number of checkpoints, deleting the older checkpoints in output_dir. If the value is -1, all checkpoints are saved | integer | -1 | True | |
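The interplay between batch size, gradient accumulation, max_steps, warmup_steps and the linear scheduler can be sketched as follows. This assumes Hugging Face Trainer-style accounting, which the parameter names suggest but this page does not confirm. num_examples and num_devices are hypothetical inputs, and the conversion of evaluation_steps_interval into eval_steps (rounded down, at least 1) is an assumption about how the fraction is applied.

```python
import math


def plan_steps(num_examples, num_devices, per_device_train_batch_size=1,
               gradient_accumulation_steps=1, num_train_epochs=1, max_steps=-1,
               evaluation_steps_interval=0.0, eval_steps=500):
    # Examples consumed per optimizer update across all devices.
    effective_batch = per_device_train_batch_size * num_devices * gradient_accumulation_steps
    # Approximate batches per device per epoch (last partial batch kept).
    batches_per_epoch = math.ceil(num_examples / (per_device_train_batch_size * num_devices))
    # Optimizer updates per epoch, at least one.
    updates_per_epoch = max(batches_per_epoch // gradient_accumulation_steps, 1)
    # A positive max_steps overrides num_train_epochs.
    total_steps = max_steps if max_steps > 0 else num_train_epochs * updates_per_epoch
    # Assumed conversion: a non-zero fraction of an epoch replaces eval_steps.
    if evaluation_steps_interval > 0:
        eval_steps = max(int(updates_per_epoch * evaluation_steps_interval), 1)
    return {"effective_batch": effective_batch, "updates_per_epoch": updates_per_epoch,
            "total_steps": total_steps, "eval_steps": eval_steps}


def linear_lr(step, learning_rate=2e-5, warmup_steps=0, total_steps=1):
    # lr_scheduler_type='linear': ramp from 0 to learning_rate over warmup_steps,
    # then decay linearly to 0 at total_steps.
    if step < warmup_steps:
        return learning_rate * step / max(1, warmup_steps)
    return learning_rate * max(0.0, (total_steps - step) / max(1, total_steps - warmup_steps))


plan = plan_steps(num_examples=10_000, num_devices=4, per_device_train_batch_size=2,
                  gradient_accumulation_steps=8, num_train_epochs=3,
                  evaluation_steps_interval=0.25)
print(plan)
for step in (0, 50, 100, plan["total_steps"]):
    print(step, linear_lr(step, warmup_steps=100, total_steps=plan["total_steps"]))
```

With these illustrative numbers, each update sees 2 × 4 × 8 = 64 examples, and the learning rate peaks at learning_rate exactly when warmup ends.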
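The auto_find_batch_size behaviour described in the table (halve per_device_train_batch_size until the OOM error disappears) amounts to a simple retry loop. The sketch below only illustrates that description and is not the component's code: OutOfMemory, fake_train_step and its memory_limit are hypothetical stand-ins for a real CUDA out-of-memory error and training step.

```python
class OutOfMemory(RuntimeError):
    """Hypothetical stand-in for a CUDA out-of-memory error."""


def fake_train_step(batch_size, memory_limit=12):
    # Hypothetical: pretend each example needs one unit of memory.
    if batch_size > memory_limit:
        raise OutOfMemory(f"batch size {batch_size} does not fit")
    return f"trained with batch size {batch_size}"


def find_batch_size(start, train_step):
    batch_size = start
    tried = []
    while batch_size >= 1:
        tried.append(batch_size)
        try:
            return batch_size, tried, train_step(batch_size)
        except OutOfMemory:
            batch_size //= 2  # reduce by a factor of 2 and retry
    raise RuntimeError("no batch size fits in memory")


size, tried, result = find_batch_size(64, fake_train_step)
print(tried, result)
```

Halving converges quickly (at most log2 of the starting size retries) but can undershoot the largest batch that would actually fit.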
Early Stopping Parameters
Name | Description | Type | Default | Optional | Enum |
---|---|---|---|---|---|
apply_early_stopping | Enable early stopping | string | false | True | ['true', 'false'] |
early_stopping_patience | Stop training when the specified metric worsens for early_stopping_patience evaluation calls | integer | 1 | True | |
early_stopping_threshold | How much the specified metric must improve for an evaluation to count as an improvement under the early stopping conditions | number | 0.0 | True | |
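To make early_stopping_patience and early_stopping_threshold concrete, here is a simplified sketch for metric_for_best_model='loss' (lower is better), loosely modelled on Hugging Face's EarlyStoppingCallback. It is not the component's code: the function name evals_until_stop is hypothetical, and it tracks the best loss itself rather than relying on trainer state.

```python
def evals_until_stop(eval_losses, early_stopping_patience=1, early_stopping_threshold=0.0):
    """Return (stopped, number_of_evaluations_run) for a sequence of eval losses."""
    best = None
    patience_counter = 0
    for i, loss in enumerate(eval_losses, start=1):
        # Only an improvement larger than the threshold resets the counter.
        if best is None or (loss < best and best - loss > early_stopping_threshold):
            patience_counter = 0
        else:
            patience_counter += 1
        # Track the best loss seen so far, even for sub-threshold improvements.
        if best is None or loss < best:
            best = loss
        if patience_counter >= early_stopping_patience:
            return True, i  # training stops after this evaluation
    return False, len(eval_losses)


print(evals_until_stop([1.0, 0.8, 0.79, 0.85], early_stopping_patience=1,
                       early_stopping_threshold=0.05))
```

In this example the third evaluation improves by only 0.01, which is below the 0.05 threshold, so with a patience of 1 training stops there.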
DeepSpeed Parameters
Name | Description | Type | Default | Optional | Enum |
---|---|---|---|---|---|
apply_deepspeed | If set to true, enables DeepSpeed for training | string | false | True | ['true', 'false'] |
deepspeed | DeepSpeed config to be used for finetuning | uri_file |  | True | |
deepspeed_stage | Selects which default DeepSpeed config to use: stage 2 or stage 3. The default is stage 2. This parameter applies ONLY when no config is passed via the deepspeed port. | string | 2 | True | ['2', '3'] |
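When apply_deepspeed is 'true' and you supply your own config through the deepspeed input, the file is a standard DeepSpeed JSON config. The sketch below writes an illustrative ZeRO stage 2 config. The "auto" values assume the Hugging Face Trainer DeepSpeed integration fills them from the corresponding training arguments; nothing here reflects the component's built-in default configs, and the file name is arbitrary.

```python
import json
import tempfile
from pathlib import Path

# Illustrative config only; not the component's default stage-2 config.
ds_config = {
    "zero_optimization": {
        "stage": 2,                 # comparable to deepspeed_stage='2'
        "overlap_comm": True,
        "contiguous_gradients": True,
    },
    # "auto" assumes the Hugging Face Trainer integration fills these in.
    "train_batch_size": "auto",
    "train_micro_batch_size_per_gpu": "auto",  # from per_device_train_batch_size
    "gradient_accumulation_steps": "auto",     # from gradient_accumulation_steps
    "gradient_clipping": "auto",               # from max_grad_norm
    "fp16": {"enabled": "auto"},               # assumed to follow the fp16 training flag
}

path = Path(tempfile.gettempdir()) / "ds_config_zero2.json"
path.write_text(json.dumps(ds_config, indent=2))
print(path.read_text())
```

Leaving the batch-size and clipping fields as "auto" keeps them consistent with the component's training parameters instead of duplicating values in two places.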
ORT Parameters
Name | Description | Type | Default | Optional | Enum |
---|---|---|---|---|---|
apply_ort | If set to true, uses ONNX Runtime (ORT) training | string | false | True | ['true', 'false'] |
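Most on/off switches in these tables (apply_lora, apply_deepspeed, apply_ort and others) are typed string with the enum ['true', 'false'] rather than boolean, and precision and deepspeed_stage are string enums as well. When building component inputs from Python values, a small helper like the one below can normalise and check them. The helper and its name are hypothetical; the enum lists are copied from this page and cover only a subset of the parameters.

```python
# Subset of string-enum inputs, copied from the tables on this page.
STRING_ENUMS = {
    "apply_lora": ["true", "false"],
    "merge_lora_weights": ["true", "false"],
    "auto_find_batch_size": ["true", "false"],
    "precision": ["32", "16"],
    "apply_early_stopping": ["true", "false"],
    "apply_deepspeed": ["true", "false"],
    "deepspeed_stage": ["2", "3"],
    "apply_ort": ["true", "false"],
}


def to_component_inputs(**overrides):
    """Hypothetical helper: convert Python values to the documented string enums."""
    inputs = {}
    for name, value in overrides.items():
        if isinstance(value, bool):
            value = "true" if value else "false"  # bool -> 'true' / 'false'
        elif name in STRING_ENUMS:
            value = str(value)                    # e.g. 16 -> '16'
        allowed = STRING_ENUMS.get(name)
        if allowed is not None and value not in allowed:
            raise ValueError(f"{name}={value!r}; expected one of {allowed}")
        inputs[name] = value                      # other inputs pass through unchanged
    return inputs


print(to_component_inputs(apply_lora=True, lora_r=16, precision=16, apply_deepspeed=False))
```

Numeric inputs such as lora_r are left as-is, since the tables type them as integer or number.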
Dataset parameters
Name | Description | Type | Default | Optional | Enum |
---|---|---|---|---|---|
preprocess_output | Output folder of the preprocessor, containing the encoded train.jsonl and valid.jsonl files and the pretrained model info | uri_folder |  | False | |
model_selector_output | Output folder of the model selector, containing model metadata such as the config, checkpoints and tokenizer config | uri_folder |  | False | |
Validation parameters
Name | Description | Type | Default | Optional | Enum |
---|---|---|---|---|---|
system_properties | Validation parameters propagated from pipeline. | string |  | True | |
Outputs
Name | Description | Type |
---|---|---|
pytorch_model_folder | Output directory for the finetuned model and other metadata | uri_folder |
Environment
azureml://registries/azureml/environments/acft-hf-nlp-gpu/versions/51