
github-actions[bot] edited this page Dec 12, 2023 · 13 revisions

# Batch Score Large Language Models

`batch_score_llm`

## Overview

**Version:** 0.0.1

**View in Studio:** https://ml.azure.com/registries/azureml/components/batch_score_llm/version/0.0.1

## Inputs

### Predefined arguments for parallel job

See the predefined arguments for parallel jobs: https://learn.microsoft.com/en-us/azure/machine-learning/reference-yaml-job-parallel?source=recommendations#predefined-arguments-for-parallel-job

| Name | Description | Type | Default | Optional | Enum |
| ---- | ----------- | ---- | ------- | -------- | ---- |
| resume_from | The pipeline run id to resume from. | string | | True | |
| append_row_safe_output | Enable PRS safe append row configuration, which is needed when dealing with large outputs that contain Unicode characters. | boolean | | True | |

### Custom arguments

| Name | Description | Type | Default | Optional | Enum |
| ---- | ----------- | ---- | ------- | -------- | ---- |
| data_input_table | The data to be split and scored in parallel. | mltable | | False | |
| api_type | Specifies the API type used for scoring. | string | completion | False | ['completion', 'chat_completion', 'embeddings', 'vesta', 'vesta_chat_completion'] |
| scoring_url | The URL used for scoring input data. | string | | False | |
| authentication_type | Specifies the authentication type to use for scoring. | string | managed_identity | True | ['azureml_workspace_connection', 'managed_identity'] |
| connection_name | Specifies the connection name containing the api-key for scoring. Required for authentication type "azureml_workspace_connection". | string | | True | |
| debug_mode | | boolean | False | | |
| additional_properties | A stringified JSON expressing additional properties to be added to each request body at the top level. | string | | True | |
| additional_headers | A stringified JSON expressing additional headers to be added to each request. | string | | True | |
| configuration_file | A JSON file containing configuration values for the batch score component. | uri_file | | True | |
| tally_failed_requests | Determines whether failed requests are included in the output. Enabling this counts failed requests toward error_threshold. | boolean | False | | |
| tally_exclusions | Configures which failed requests are excluded from tallying. Only applicable when tally_failed_requests is enabled. Delimit with " " when specifying multiple values. "none": no failed requests are excluded from tallying; "bad_request_to_model": requests that received a 400 status code from the model are excluded from tallying. | string | none | | |
| segment_large_requests | | string | | True | ['disabled', 'enabled'] |
| segment_max_token_size | | integer | 600 | | |
| app_insights_connection_string | An Application Insights connection string. If provided, the batch score component emits metrics and logs to this Application Insights instance. | string | | True | |
| ensure_ascii | If ensure_ascii is True, the output is guaranteed to have all incoming non-ASCII characters escaped. If ensure_ascii is False, these characters are output as-is. More detailed information can be found at https://docs.python.org/3/library/json.html. | boolean | False | | |
| output_behavior | | string | append_row | False | ['append_row', 'summary_only'] |
| max_retry_time_interval | The maximum time (in seconds) spent retrying a payload. If unspecified, payloads are retried an unlimited number of times. | integer | | True | |
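The additional_properties and additional_headers inputs each take a stringified JSON object. A minimal sketch of building them with Python's `json` module (the property names and header names below are hypothetical, not required by the component), which also illustrates the ensure_ascii behavior described above:

```python
import json

# additional_properties / additional_headers must each be a stringified
# JSON object. The keys and values here are hypothetical examples only.
additional_properties = json.dumps({"max_tokens": 256, "temperature": 0.7})
additional_headers = json.dumps({"x-client-request-id": "batch-run-001"})

# ensure_ascii=True escapes incoming non-ASCII characters in the output;
# ensure_ascii=False writes them as-is (same semantics as json.dumps).
record = {"text": "café"}
print(json.dumps(record, ensure_ascii=True))   # {"text": "caf\u00e9"}
print(json.dumps(record, ensure_ascii=False))  # {"text": "café"}
```

The resulting strings are passed directly as the component's input values.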

### Parallel configuration

| Name | Description | Type | Default | Optional | Enum |
| ---- | ----------- | ---- | ------- | -------- | ---- |
| initial_worker_count | | integer | 5 | | |
| max_worker_count | Overrides initial_worker_count if necessary. | integer | 200 | | |

### Partial results configuration

| Name | Description | Type | Default | Optional | Enum |
| ---- | ----------- | ---- | ------- | -------- | ---- |
| save_mini_batch_results | | string | disabled | False | ['disabled', 'enabled'] |
| async_mode | Whether to use the PRS mini-batch streaming feature, which allows each PRS processor to process multiple mini-batches at a time. | boolean | False | | |

## Outputs

| Name | Description | Type |
| ---- | ----------- | ---- |
| job_out_path | | uri_file |
| mini_batch_results_out_directory | | uri_folder |
| metrics_out_directory | | uri_folder |
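With output_behavior set to append_row, job_out_path points at a single results file. Assuming each scored row is written as one JSON object per line (the per-row schema depends on the api_type used and is not fixed here), the output can be loaded with a sketch like:

```python
import json
from pathlib import Path

def load_results(job_out_path: str) -> list:
    """Parse an append_row output file, assuming one JSON object per line.

    The JSON-lines layout and the per-row schema are assumptions here;
    inspect your own job_out_path to confirm its exact shape.
    """
    results = []
    for line in Path(job_out_path).read_text(encoding="utf-8").splitlines():
        if line.strip():  # skip blank lines
            results.append(json.loads(line))
    return results
```

Parsing line by line, rather than as one JSON document, keeps memory use bounded for large batch outputs.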