components oss_text_generation_finetune
github-actions[bot] edited this page Nov 10, 2023
FTaaS component to finetune a model for the text generation task.
Version: 0.0.2
View in Studio: https://ml.azure.com/registries/azureml/components/oss_text_generation_finetune/version/0.0.2
Name | Description | Type | Default | Optional | Enum |
---|---|---|---|---|---|
mlflow_model_path | Input folder path containing the MLflow model to finetune further. A valid model/Hugging Face id must be passed. | mlflow_model | | True | |
dataset_input | Output of the data import component. The folder contains train and validation data. | uri_folder | | False | |
text_key | Key for the text in an example. Format your data keeping in mind that text is concatenated with ground_truth during finetuning, in the form text + ground_truth. For example, "text"="knock knock\n", "ground_truth"="who's there" is treated as "knock knock\nwho's there". | string | | False | |
ground_truth_key | Key for the ground_truth in an example. A separate column is used for ground_truth to enable use cases such as summarization, translation, and question answering, which can be recast as text generation where both text and ground_truth are needed. This separation is also useful for calculating metrics. For example, "text"="Summarize this dialog:\n{input_dialogue}\nSummary:\n", "ground_truth"="{summary of the dialogue}". | string | | True | |
batch_size | Number of examples to batch before calling the tokenization function | integer | 1000 | True | |
pad_to_max_length | If set to True, the returned sequences will be padded according to the model's padding side and padding index, up to their max_seq_length. If no max_seq_length is specified, the padding is done up to the model's max length. | string | false | True | ['true', 'false'] |
max_seq_length | Default is -1, which means the padding is done up to the model's max length. Otherwise sequences are padded to max_seq_length. | integer | -1 | True | |
number_of_gpu_to_use_finetuning | Number of GPUs to use per node for finetuning. Should equal the number of GPUs per node in the compute SKU used for finetuning. | integer | 1 | False | |
apply_lora | Whether to enable LoRA | string | false | True | ['true', 'false'] |
merge_lora_weights | If set to true, the LoRA-trained weights are merged into the base model before saving | string | true | True | ['true', 'false'] |
lora_alpha | LoRA attention alpha | integer | 128 | True | |
lora_r | LoRA dimension | integer | 8 | True | |
lora_dropout | LoRA dropout value | number | 0.0 | True | |
num_train_epochs | Number of training epochs | integer | 1 | True | |
max_steps | If set to a positive number, the total number of training steps to perform. Overrides num_train_epochs. With a finite iterable dataset, training may stop before reaching the set number of steps once all data is exhausted. | integer | -1 | True | |
per_device_train_batch_size | Train batch size | integer | 1 | True | |
per_device_eval_batch_size | Validation batch size | integer | 1 | True | |
auto_find_batch_size | Flag to enable automatic batch size search. If the provided per_device_train_batch_size runs out of memory (OOM), enabling auto_find_batch_size finds a workable batch size by iteratively reducing per_device_train_batch_size by a factor of 2 until the OOM no longer occurs. | string | false | True | ['true', 'false'] |
optim | Optimizer to be used while training | string | adamw_hf | True | ['adamw_hf', 'adamw_torch', 'adafactor'] |
learning_rate | Initial learning rate. The scheduler defaults to linear. | number | 2e-05 | True | |
warmup_steps | Number of steps used for a linear warmup from 0 to learning_rate | integer | 0 | True | |
weight_decay | The weight decay to apply (if not zero) to all layers except all bias and LayerNorm weights in AdamW optimizer | number | 0.0 | True | |
adam_beta1 | The beta1 hyperparameter for the AdamW optimizer | number | 0.9 | True | |
adam_beta2 | The beta2 hyperparameter for the AdamW optimizer | number | 0.999 | True | |
adam_epsilon | The epsilon hyperparameter for the AdamW optimizer | number | 1e-08 | True | |
gradient_accumulation_steps | Number of update steps to accumulate the gradients for, before performing a backward/update pass | integer | 1 | True | |
eval_accumulation_steps | Number of prediction steps to accumulate before moving the tensors to the CPU. Passed as None if set to -1. | integer | -1 | True | |
lr_scheduler_type | Learning rate scheduler to use | string | linear | True | ['linear', 'cosine', 'cosine_with_restarts', 'polynomial', 'constant', 'constant_with_warmup'] |
precision | Apply mixed precision training. This can reduce memory footprint by performing operations in half-precision. | string | 32 | True | ['32', '16'] |
seed | Random seed that will be set at the beginning of training | integer | 42 | True | |
enable_full_determinism | Ensure reproducible behavior during distributed training | string | false | True | ['true', 'false'] |
dataloader_num_workers | Number of subprocesses to use for data loading. 0 means that the data will be loaded in the main process. | integer | 0 | True | |
ignore_mismatched_sizes | Whether or not to raise an error if some of the weights from the checkpoint do not have the same size as the weights of the model | string | true | True | ['true', 'false'] |
max_grad_norm | Maximum gradient norm (for gradient clipping) | number | 1.0 | True | |
evaluation_strategy | The evaluation strategy to adopt during training | string | epoch | True | ['epoch', 'steps'] |
evaluation_steps_interval | Evaluation interval expressed as a fraction of the steps in one epoch. Overrides eval_steps if not 0. | number | 0.0 | True | |
eval_steps | Number of update steps between two evals if evaluation_strategy='steps' | integer | 500 | True | |
logging_strategy | The logging strategy to adopt during training. | string | epoch | True | ['epoch', 'steps'] |
logging_steps | Number of update steps between two logs if logging_strategy='steps' | integer | 500 | True | |
metric_for_best_model | Specify the metric to use to compare two different models | string | loss | True | ['loss'] |
resume_from_checkpoint | Loads Optimizer, Scheduler and Trainer state for finetuning if true | string | false | True | ['true', 'false'] |
save_total_limit | If a value is passed, limits the total number of checkpoints by deleting the older checkpoints in output_dir. If the value is -1, all checkpoints are saved. | integer | 1 | True | |
apply_early_stopping | Enable early stopping | string | false | True | ['true', 'false'] |
early_stopping_patience | Stop training when the specified metric worsens for early_stopping_patience evaluation calls | integer | 1 | True | |
early_stopping_threshold | Denotes how much the specified metric must improve to satisfy early stopping conditions | number | 0.0 | True | |
apply_deepspeed | If set to true, will enable deepspeed for training | string | false | True | ['true', 'false'] |
deepspeed_stage | Selects which default DeepSpeed config is used: stage 2 or stage 3. The default is stage 2. This parameter applies only when the user does not pass any config via the deepspeed port. | string | 2 | True | ['2', '3'] |
apply_ort | If set to true, ONNX Runtime training is used | string | false | True | ['true', 'false'] |
system_properties | Validation parameters propagated from pipeline. | string | | True | |
registered_model_name | Name of the registered model | string | | True | |
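
The `text_key` and `ground_truth_key` descriptions above say that the two fields are joined as text + ground_truth during finetuning. The following is a minimal sketch of that concatenation, not the component's actual preprocessing code. The helper name `build_training_string`, the JSON Lines serialization, and the treatment of a missing ground truth as an empty string are assumptions made for illustration.

```python
import json

# Column names as they would be passed to the component
# via text_key and ground_truth_key (values here are illustrative).
TEXT_KEY = "text"
GROUND_TRUTH_KEY = "ground_truth"


def build_training_string(example, text_key=TEXT_KEY, ground_truth_key=GROUND_TRUTH_KEY):
    """Hypothetical helper: join prompt and target as text + ground_truth.

    Assumption: a missing ground truth is treated as an empty string,
    since ground_truth_key is marked optional in the table.
    """
    return example[text_key] + example.get(ground_truth_key, "")


examples = [
    {"text": "knock knock\n", "ground_truth": "who's there"},
    {
        "text": "Summarize this dialog:\nA: Hi\nB: Hello\nSummary:\n",
        "ground_truth": "Two people greet each other.",
    },
]

# Assumed on-disk layout: one JSON object per line (JSON Lines).
jsonl = "\n".join(json.dumps(example) for example in examples)

# What the model would see for each example under the text + ground_truth rule.
joined = [build_training_string(example) for example in examples]
print(repr(joined[0]))
```

Because the prompt and target are joined with no separator, any delimiter you want between them (a newline, "Summary:\n", and so on) has to be part of the text field itself.
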
Name | Description | Type |
---|---|---|
output_model | Output dir to save the finetuned lora weights | uri_folder |
azureml://registries/azureml/environments/acft-hf-nlp-gpu/versions/28
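
The batch-related inputs (`per_device_train_batch_size`, `number_of_gpu_to_use_finetuning`, `gradient_accumulation_steps`, `auto_find_batch_size`) interact, and a little arithmetic makes that concrete. The sketch below assumes the usual data-parallel relation for the effective batch size, which this page does not state explicitly, and models the halving behaviour described for `auto_find_batch_size`. Both function names and the `fits_in_memory` callback are hypothetical and stand in for a real training attempt.

```python
def effective_batch_size(per_device_train_batch_size, gpus_per_node,
                         gradient_accumulation_steps, nodes=1):
    """Examples consumed per optimizer update.

    Assumption: plain data parallelism, so the per-device batch is
    multiplied by the device count and the accumulation steps.
    """
    return (per_device_train_batch_size * gpus_per_node
            * nodes * gradient_accumulation_steps)


def find_batch_size(start, fits_in_memory):
    """Halve the batch size until fits_in_memory(batch_size) is True.

    fits_in_memory is a hypothetical callback that returns False
    where a real run would hit an out-of-memory error.
    """
    batch_size = start
    while batch_size >= 1:
        if fits_in_memory(batch_size):
            return batch_size
        batch_size //= 2  # reduce by a factor of 2, as the description states
    raise RuntimeError("no batch size fits in memory, even 1")


# Defaults from the table on a hypothetical 4-GPU node with 8 accumulation steps.
print(effective_batch_size(1, 4, 8))

# Pretend anything above 5 examples per device runs out of memory.
print(find_batch_size(16, lambda batch_size: batch_size <= 5))
```

If the search lowers the per-device batch size, raising `gradient_accumulation_steps` by the same factor keeps the effective batch size unchanged under the assumption above.
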