components batch_benchmark_inference_with_inference_compute
github-actions[bot] edited this page Apr 16, 2024 · 7 revisions
Components for batch endpoint inference with inference compute support.
Version: 0.0.2
View in Studio: https://ml.azure.com/registries/azureml/components/batch_benchmark_inference_with_inference_compute/version/0.0.2
Inputs

Name | Description | Type | Default | Optional | Enum |
---|---|---|---|---|---|
input_dataset | Input JSONL dataset that contains the prompts. Ignored when a performance test is run. | uri_folder | | True | |
model_type | Type of the model's input and output contract. One of 'oai', 'oss', or 'vision_oss'. | string | | False | ['oai', 'oss', 'vision_oss'] |
inference_compute | Compute to be used for inferencing. | string | | False | |
batch_input_pattern | The batch input pattern, written in the payload format. Each `###<key>` placeholder is replaced with the value stored under that key in the input dataset. For example, for a Llama text-generation model whose input dataset has prompt for the payload and _batch_request_metadata storing the corresponding ground truth, one can use: `{ "input_data": { "input_string": ["###<prompt>"], "parameters": { "temperature": 0.6, "max_new_tokens": 100, "do_sample": true } }, "_batch_request_metadata": ###<_batch_request_metadata> }` For an AOAI chat completion model, the following pattern can be used: `{ "messages": ###<prompt>, "temperature": 0.7, "top_p": 0.95, "frequency_penalty": 0, "presence_penalty": 0, "max_tokens": 800, "stop": null }` | string | | False | |
endpoint_url | The URL of the endpoint. | string | | False | |
is_performance_test | If true, the performance test is run and the input dataset is ignored. | boolean | False | ||
use_tiktoken | If true, the cl100k_base encoder from tiktoken is used to calculate token count; overrides any other token count calculation. | boolean | False | True | |
authentication_type | Authentication type for the endpoint: azureml_workspace_connection or managed_identity. | string | azureml_workspace_connection | False | ['azureml_workspace_connection', 'managed_identity'] |
deployment_name | The deployment name. Only needed for managed OSS deployment. | string | | True | |
connections_name | Connections name for the endpoint. Only required if authentication_type is "azureml_workspace_connection". | string | | True | |
label_column_name | The label column name. | string | | True | |
additional_columns | The name(s) of additional columns that could be helpful for calculating some metrics, separated by commas (","). | string | | True | |
n_samples | The number of top samples sent to the endpoint. When the performance test is enabled, this is the number of repeated samples sent to the endpoint. | integer | | True | |
handle_response_failure | How the formatter handles failed responses. 'use_fallback' replaces them with fallback_value and 'neglect' drops those rows. | string | use_fallback | False | ['use_fallback', 'neglect'] |
fallback_value | The fallback value used when a request fails. If not provided, the fallback value is an empty string. | string | | True | |
min_endpoint_success_ratio | The minimum value of (successful_requests / total_requests) required for classifying inference as successful. If (successful_requests / total_requests) < min_endpoint_success_ratio, the experiment will be marked as failed. By default it is 0. (0 means all requests are allowed to fail while 1 means no request should fail.) | number | 0 | False | |
additional_headers | A stringified JSON object expressing additional headers to be added to each request. | string | | True | |
ensure_ascii | If ensure_ascii is true, the output is guaranteed to have all incoming non-ASCII characters escaped. If ensure_ascii is false, these characters will be output as-is. More detailed information can be found at https://docs.python.org/3/library/json.html | boolean | False | False | |
max_retry_time_interval | The maximum time (in seconds) spent retrying a payload. If unspecified, payloads are retried indefinitely. | integer | | True | |
mini_batch_size | The mini batch size for parallel run. | string | 100KB | True | |
endpoint_config_file | The endpoint config file. | uri_file | | True | |
initial_worker_count | The initial number of workers to use for scoring. | integer | 5 | False | |
max_worker_count | Overrides initial_worker_count if necessary | integer | 200 | False | |
instance_count | Number of nodes in a compute cluster we will run the train step on. | integer | 1 | ||
max_concurrency_per_instance | Number of processes that will be run concurrently on any given node. This number should not be larger than 1/2 of the number of cores in an individual node in the specified cluster. | integer | 1 | ||
debug_mode | Enabling debug mode prints all debug logs in the score step. | boolean | False | False | |
app_insights_connection_string | Application insights connection string where the batch score component will log metrics and logs. | string | | True | |
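
The placeholder substitution described for batch_input_pattern can be illustrated with a short sketch. This is not the component's actual implementation; the helper name `fill_pattern`, the regular expression, and the choice to JSON-encode every substituted value are assumptions made for illustration, as is the dataset column named `prompt`.

```python
import json
import re

# Hypothetical helper, not part of the component.
# Matches a placeholder wrapped in double quotes ("###<key>") or a bare one (###<key>).
_PLACEHOLDER = re.compile(r'"###<([^<>]+)>"|###<([^<>]+)>')


def fill_pattern(pattern: str, row: dict) -> str:
    """Replace every ###<key> in `pattern` with the JSON-encoded value of row[key]."""

    def _replace(match: re.Match) -> str:
        key = match.group(1) or match.group(2)
        # json.dumps quotes and escapes strings and serializes lists/dicts,
        # so the filled pattern remains a valid JSON document.
        return json.dumps(row[key])

    return _PLACEHOLDER.sub(_replace, pattern)


pattern = (
    '{"input_data": {"input_string": ["###<prompt>"], '
    '"parameters": {"temperature": 0.6, "max_new_tokens": 100, "do_sample": true}}, '
    '"_batch_request_metadata": ###<_batch_request_metadata>}'
)
# One row of a hypothetical input dataset.
row = {
    "prompt": 'Say "hi"',
    "_batch_request_metadata": {"ground_truth": "hi"},
}
payload = json.loads(fill_pattern(pattern, row))
```

In this sketch a quoted placeholder is replaced together with its quotes, so a prompt containing double quotes is escaped correctly, while a bare placeholder can carry a whole JSON object or list such as a chat `messages` array.
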

Outputs

Name | Description | Type |
---|---|---|
predictions | The prediction data. | uri_file |
performance_metadata | The performance data. | uri_file |
ground_truth | The ground truth data that has a one-to-one mapping with the prediction data. | uri_file |
successful_requests | The successful requests. | uri_file |
failed_requests | The failed requests. | uri_file |
unsafe_content_blocked_requests | The unsafe requests that were blocked due to Responsible AI concerns. | uri_file |
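
The min_endpoint_success_ratio rule can be sketched as a small check over the counts of successful and failed requests. The function name `meets_success_ratio` is hypothetical, and two points are assumptions rather than documented behavior: that total_requests equals successful plus failed requests, and that a run with no requests passes only when the threshold is 0.

```python
def meets_success_ratio(successful: int, failed: int, min_ratio: float = 0.0) -> bool:
    """Return True when successful / total is at least min_ratio."""
    total = successful + failed  # assumption: total counts only these two groups
    if total == 0:
        # Assumption: with no requests at all, only a threshold of 0 can be met.
        return min_ratio <= 0
    # The run is marked as failed when the ratio is strictly below the threshold.
    return successful / total >= min_ratio


# With the default threshold of 0, a run in which every request fails still passes.
all_failed_ok = meets_success_ratio(successful=0, failed=5)
# With a threshold of 1, a single failed request fails the run.
strict_ok = meets_success_ratio(successful=99, failed=1, min_ratio=1.0)
```
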