
batch score for oss #3387

Merged · 50 commits · Sep 25, 2024

Commits
63b5ce5
batch score for oss
novaturient95 Sep 17, 2024
30b965b
update src
novaturient95 Sep 17, 2024
016376c
update file structure
novaturient95 Sep 17, 2024
5dd1e8f
Merge branch 'main' of https://github.com/Azure/azureml-assets into a…
novaturient95 Sep 17, 2024
abb1b75
update
novaturient95 Sep 17, 2024
0a5089f
fix tests
novaturient95 Sep 17, 2024
532ff5b
update tests
novaturient95 Sep 17, 2024
82959ea
update for e2e test
novaturient95 Sep 18, 2024
d1ad19e
Merge branch 'main' into ayushmishra/batch-score-oss
novaturient95 Sep 18, 2024
d2aa8db
fix entry script
novaturient95 Sep 18, 2024
b3b8542
Merge branch 'ayushmishra/batch-score-oss' of https://github.com/Azur…
novaturient95 Sep 18, 2024
8c29298
uncomment default API type
novaturient95 Sep 18, 2024
c3b6ebf
Merge branch 'main' into ayushmishra/batch-score-oss
novaturient95 Sep 18, 2024
afe2d9b
address comments
novaturient95 Sep 18, 2024
9267c87
return None by default
novaturient95 Sep 18, 2024
b9d4d51
fix flakes
novaturient95 Sep 18, 2024
6d52f46
Merge branch 'main' into ayushmishra/batch-score-oss
novaturient95 Sep 19, 2024
da27c6c
Merge branch 'main' into ayushmishra/batch-score-oss
novaturient95 Sep 19, 2024
ab1e290
Merge branch 'main' into ayushmishra/batch-score-oss
novaturient95 Sep 19, 2024
7835706
Merge branch 'main' into ayushmishra/batch-score-oss
novaturient95 Sep 20, 2024
cc90308
Update assets/batch_score/components/driver/src/batch_score_oss/aoai/…
novaturient95 Sep 20, 2024
ccb01ae
Merge branch 'main' into ayushmishra/batch-score-oss
novaturient95 Sep 20, 2024
96b968c
Update assets/batch_score/components/driver/src/batch_score_oss/aoai/…
novaturient95 Sep 20, 2024
9aaa468
Merge branch 'main' into ayushmishra/batch-score-oss
novaturient95 Sep 20, 2024
ef4ad01
Update docs
novaturient95 Sep 20, 2024
8479fd2
Merge branch 'ayushmishra/batch-score-oss' of https://github.com/Azur…
novaturient95 Sep 20, 2024
be096b6
fix flake
novaturient95 Sep 20, 2024
da93346
e2e .version file does not exists, fix
novaturient95 Sep 20, 2024
eda18af
Merge branch 'main' into ayushmishra/batch-score-oss
novaturient95 Sep 20, 2024
ea5a2e5
keep batch_score intanct. Move changes to batch_score_oss
novaturient95 Sep 20, 2024
91a3342
Merge branch 'main' of https://github.com/Azure/azureml-assets into a…
novaturient95 Sep 20, 2024
15362b6
revert
novaturient95 Sep 20, 2024
2961646
add oss wf
novaturient95 Sep 20, 2024
3743042
update wf name
novaturient95 Sep 20, 2024
8d858d4
Merge branch 'main' into ayushmishra/batch-score-oss
novaturient95 Sep 23, 2024
dceff97
Merge branch 'main' into ayushmishra/batch-score-oss
novaturient95 Sep 24, 2024
a3adeaf
Increase lock timeout
novaturient95 Sep 24, 2024
1a78f68
fix asset version issue
novaturient95 Sep 24, 2024
53ece9c
Merge branch 'main' into ayushmishra/batch-score-oss
novaturient95 Sep 24, 2024
12538ac
fix path
novaturient95 Sep 24, 2024
19ce4c9
Merge branch 'ayushmishra/batch-score-oss' of https://github.com/Azur…
novaturient95 Sep 24, 2024
3fc2ddb
Merge branch 'main' into ayushmishra/batch-score-oss
novaturient95 Sep 25, 2024
3c16ff0
Merge branch 'main' into ayushmishra/batch-score-oss
novaturient95 Sep 25, 2024
bd514f3
fix multi-threaded e2e pytest issues
novaturient95 Sep 25, 2024
de4944e
Merge branch 'ayushmishra/batch-score-oss' of https://github.com/Azur…
novaturient95 Sep 25, 2024
7b65dd2
flake
novaturient95 Sep 25, 2024
a57015a
update e2e tests
novaturient95 Sep 25, 2024
aadd25e
Merge branch 'main' into ayushmishra/batch-score-oss
novaturient95 Sep 25, 2024
5411257
fix flake
novaturient95 Sep 25, 2024
c9885d7
Merge branch 'ayushmishra/batch-score-oss' of https://github.com/Azur…
novaturient95 Sep 25, 2024
105 changes: 105 additions & 0 deletions .github/workflows/batch-score-oss-ci.yaml
@@ -0,0 +1,105 @@
name: batch-score-oss-ci

on:
  pull_request:
    branches:
      - main
    paths:
      - assets/batch_score_oss/**
      - .github/workflows/batch-score-oss-ci.yml
  workflow_dispatch:

concurrency:
  group: ${{ github.workflow }}-${{ github.ref }}
  cancel-in-progress: true

env:
  testsRootPath: assets/batch_score_oss/components/driver/tests
  pytest_report_folder: results
  pytest_report_file: junit3.xml
  scripts_setup_dir: scripts/setup

permissions:
  # Required to clone repo
  contents: read
  # Required for OIDC login to Azure
  id-token: write

defaults:
  run:
    shell: bash

jobs:
  check-execution-context:
    uses: Azure/azureml-assets/.github/workflows/check-execution-context.yaml@main

  run-batch-score-oss-tests:
    name: Run Batch Score Component Tests
    runs-on: ubuntu-latest
    needs: check-execution-context
    environment: Testing
    steps:
      - name: Clone branch
        uses: Azure/azureml-assets/.github/actions/clone-repo@main
        with:
          forked-pr: ${{ needs.check-execution-context.outputs.forked_pr }}
      - name: Use Python 3.10 or newer
        uses: actions/setup-python@v4
        with:
          python-version: '>=3.10'
      - name: Log in to Azure and create resources
        uses: ./.github/actions/create-azure-resources
        with:
          client-id: ${{ secrets.AZURE_CLIENT_ID }}
          tenant-id: ${{ secrets.AZURE_TENANT_ID }}
          subscription-id: ${{ secrets.AZURE_SUBSCRIPTION_ID }}
          scripts-setup-dir: ${{ env.scripts_setup_dir }}

      - name: Install dependencies
        run: pip install -r ${{ env.testsRootPath }}/requirements.txt

      - name: Run unit tests
        run: python -m pytest --junitxml=${{ env.pytest_report_folder }}/${{ env.pytest_report_file }} ${{ env.testsRootPath }} --strict-markers -v -s -m "unit" -o log_level=DEBUG -n 8
        env:
          SUBSCRIPTION_ID: ${{ secrets.AZURE_SUBSCRIPTION_ID }}
          TENANT_ID: ${{ secrets.AZURE_TENANT_ID }}
          RESOURCE_GROUP: ${{ env.resource_group }}
          WORKSPACE_NAME: ${{ env.workspace }}
      - name: Run e2e tests
        run: python -m pytest --junitxml=${{ env.pytest_report_folder }}/${{ env.pytest_report_file }} ${{ env.testsRootPath }} --strict-markers -v -s -m "smoke" -o log_level=DEBUG -n 8
        env:
          SUBSCRIPTION_ID: ${{ secrets.AZURE_SUBSCRIPTION_ID }}
          TENANT_ID: ${{ secrets.AZURE_TENANT_ID }}
          RESOURCE_GROUP: ${{ env.resource_group }}
          WORKSPACE_NAME: ${{ env.workspace }}
      - name: Upload test results
        uses: actions/upload-artifact@v3
        if: always()
        with:
          name: ${{ env.pytest_report_folder }}
          path: ${{ env.pytest_report_folder }}

  report:
    name: Publish test results
    if: always()
    runs-on: ubuntu-latest
    needs: run-batch-score-oss-tests

    permissions:
      # Required for EnricoMi/publish-unit-test-result-action
      checks: write
      issues: read
      pull-requests: write

    steps:
      - name: Download test results
        id: download-artifact
        uses: actions/download-artifact@v3
        with:
          name: ${{ env.pytest_report_folder }}
          path: ${{ env.pytest_report_folder }}
        continue-on-error: true

      - name: Publish test results
        if: steps.download-artifact.outputs.download-path != ''
        uses: EnricoMi/publish-unit-test-result-action@v2
        with:
          check_name: Test Results for ${{ github.workflow }}
          junit_files: ${{ env.pytest_report_folder }}/**/*.xml
@@ -0,0 +1,3 @@
type: component
spec: spec.yaml
categories: ["Batch Score"]
@@ -0,0 +1,65 @@
$schema: http://azureml/sdk-2-0/ParallelComponent.json
type: parallel

name: batch_score_oss
version: 0.0.1
display_name: Batch Score Large Language Models
is_deterministic: False

inputs:
  # Predefined arguments for parallel job: https://learn.microsoft.com/en-us/azure/machine-learning/reference-yaml-job-parallel?source=recommendations#predefined-arguments-for-parallel-job
  resume_from:
    type: string
    optional: True
    description: The pipeline run id to resume from

  # PRS preview feature
  async_mode:
    type: boolean
    optional: True
    default: False
    description: Whether to use PRS mini-batch streaming feature, which allows each PRS processor to process multiple mini-batches at a time.

  # Custom arguments
  configuration_file:
    type: uri_file
    optional: False
    description: Configures the behavior of batch scoring.
  data_input_table:
    type: mltable
    optional: False
    description: The data to be split and scored in parallel.

outputs:
  job_output_path:
    type: uri_file
  mini_batch_results_output_directory:
    type: uri_folder

max_concurrency_per_instance: 1
resources:
  instance_count: 1
mini_batch_size: 3kb
mini_batch_error_threshold: 5
logging_level: "DEBUG"
retry_settings:
  max_retries: 2
  timeout: 60

input_data: ${{inputs.data_input_table}}

task:
  code: ../src
  type: run_function
  entry_script: batch_score_oss.main
  # Enable PRS safe append row configuration that is needed when dealing with large outputs with Unicode characters.
  # Using --append_row_safe_output true
  program_arguments: >-
    $[[--amlbi_async_mode ${{inputs.async_mode}}]]
    --amlbi_dataframe_mixed_types true
    --append_row_safe_output true
    --configuration_file ${{inputs.configuration_file}}
    --partitioned_scoring_results ${{outputs.mini_batch_results_output_directory}}
    $[[--resume_from ${{inputs.resume_from}}]]
  environment: azureml://registries/azureml/environments/model-evaluation/versions/36
  append_row_to: ${{outputs.job_output_path}}
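For context, a component with this interface is typically consumed from a pipeline. The sketch below is illustrative only and is not part of this PR; it assumes the component has been registered (for example as `batch_score_oss`, version `0.0.1`) in the target workspace, and uses placeholder paths for the configuration file and input MLTable.

```python
# Illustrative sketch (not part of this PR): submit a pipeline that runs a
# registered parallel batch-score component via the azure-ai-ml SDK.
from azure.ai.ml import Input, MLClient
from azure.ai.ml.dsl import pipeline
from azure.identity import DefaultAzureCredential

ml_client = MLClient(
    credential=DefaultAzureCredential(),
    subscription_id="<subscription-id>",
    resource_group_name="<resource-group>",
    workspace_name="<workspace>",
)

# Assumes the component from this PR has already been registered.
batch_score = ml_client.components.get(name="batch_score_oss", version="0.0.1")


@pipeline()
def batch_score_pipeline(config_file, input_table):
    """Run the parallel batch score step over the input MLTable."""
    step = batch_score(
        configuration_file=config_file,
        data_input_table=input_table,
    )
    return {"scores": step.outputs.job_output_path}


job = batch_score_pipeline(
    config_file=Input(type="uri_file", path="<path-to-configuration.json>"),
    input_table=Input(type="mltable", path="<path-to-mltable-folder>"),
)
submitted = ml_client.jobs.create_or_update(job, experiment_name="batch-score-oss")
print(submitted.studio_url)
```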
135 changes: 135 additions & 0 deletions assets/batch_score_oss/components/driver/dev/README.md
@@ -0,0 +1,135 @@
# Testing Locally

The [batch_score_simulator.py](https://msdata.visualstudio.com/Vienna/_git/batch-score?path=/driver/dev/batch_score_simulator.py) script can be used to test the driver script locally. It simulates a 1-instance, 1-process PRS scenario by partitioning the [MLTable](https://msdata.visualstudio.com/Vienna/_git/batch-score?path=/driver/dev/training) data into mini-batches, executing the driver's `init()`, `run()`, and `shutdown()` functions appropriately, and writing the results of `run()` to a file.

Note that the simulator does not validate any YAML configurations.

## Create a virtual environment
- Download and install [Miniconda](https://docs.conda.io/en/latest/miniconda.html).
- Open an Anaconda Prompt terminal (search for Anaconda in the Start menu).
- Create a new environment: `conda create -n batch_score_test_env python=3.8`
- Activate the environment: `conda activate batch_score_test_env`
- Confirm the Python version is 3.8.16: `python --version`

## Install dependencies
- Navigate to the root enlistment folder of the batch-score repo.
- Install the dependencies: `pip install -r driver\tests\requirements.txt`

## Create launch configuration file
- Navigate to the enlistment root.
- Create a new folder `.vscode`.
- Create a new file `launch.json` with the following content:
```json
{
    "version": "0.2.0",
    "configurations": [
        {
            "name": "Python: Attach",
            "type": "python",
            "request": "launch",
            "console": "integratedTerminal",
            "cwd": "${workspaceFolder}/driver/dev",
            "program": "batch_score_simulator.py",
            "justMyCode": true,
            "env": {
                "PYTHONPATH": "${workspaceFolder}/driver"
            },
            "args": [
                "--debug_mode", "true",
                "--online_endpoint_url", "https://real-dv3-stable.centralus.inference.ml.azure.com/v1/engines/davinci/completions",
            ]
        }
    ]
}
```
## Running simulator.py
- Open Visual Studio Code from the conda environment terminal: `code .`
- Open command palette (Ctrl+Shift+P) and search for `Python: Select Interpreter`.
- Set the python interpreter to the python version of your conda environment.
- Hit `F5` to start a debugging session.

### Optional simulator features

In addition to simulating the PRS runtime environment, the simulator script can also provide a simulated endpoint to score against, a simulated batch pool routing service, and a simulated quota/rate limiter service. Each of these can be used independently from the others, so you can pick and choose which dependencies are real and which are fake.

The simulated services are lenient with what inputs they accept. E.g., the quota simulator doesn't care what audience you request.

#### Endpoint simulator

To enable the simulated ML endpoint, provide the scoring URL `{{ENDPOINT_SIMULATOR_HOST}}/v1/engines/davinci/completions` either as the `--online_endpoint_url` command-line argument or as the routing simulator's endpoint when simulating a batch pool.

```json
"env": {
"ENDPOINT_SIMULATOR_WORK_SECONDS": 10,
// Set this if using the batch pool routing simulator (see next section).
"ROUTING_SIMULATOR_ENDPOINT_URI": "{{ENDPOINT_SIMULATOR_HOST}}/v1/engines/davinci/completions",
},
"args": [
// Set this if not using the batch pool feature.
"--online_endpoint_url", "{{ENDPOINT_SIMULATOR_HOST}}/v1/engines/davinci/completions",
]
```

#### Routing simulator

To enable the simulated routing service, provide two values in your `launch.json` environment:

```json
"env": {
"BATCH_SCORE_ROUTING_BASE_URL": "{{ROUTING_SIMULATOR_HOST}}/api",
"ROUTING_SIMULATOR_ENDPOINT_URI": "{{ENDPOINT_SIMULATOR_HOST}}/v1/engines/ada/completions",
}
```

The `BATCH_SCORE_ROUTING_BASE_URL` variable tells the routing code in the client where to find the fake routing service, and the `ROUTING_SIMULATOR_ENDPOINT_URI` variable tells the routing simulator itself what endpoint to return. You can set it to the endpoint simulator as in this example, or to a real endpoint scoring URI. (The routing simulator always returns a single endpoint for any batch pool requested.)

#### Quota simulator

To enable the quota simulator, add the `BATCH_SCORE_QUOTA_BASE_URL` environment variable in your `launch.json`, and optionally also set `QUOTA_SIMULATOR_CAPACITY` to configure a specific amount of simulated total quota:

```json
"env": {
"BATCH_SCORE_QUOTA_BASE_URL": "{{QUOTA_SIMULATOR_HOST}}/ratelimiter",
"QUOTA_SIMULATOR_CAPACITY": "2048",
}
```

### Optional auth configuration

By default, running the batch score component locally uses your `az` login to fetch the access tokens it needs. If you want to override that token with a different one, you can either run `az login` to authenticate under a different account, or manually write a token to a local file and pass it to the component via the command line:

```json
"args": [
"--token_file_path", "<Your Path Here>\\batch-score\\driver\\dev\\secrets\\token.txt"
]
```
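If you go the token-file route, the token itself can come from any source you already have locally. One possible approach (a sketch, not something this repo provides) is to pull a token from your existing `az` login with `azure-identity` and write it to the file referenced above:

```python
# Hypothetical helper (not part of this repo): write an access token obtained
# from the local `az` login to a file, so it can be passed via --token_file_path.
from pathlib import Path

from azure.identity import AzureCliCredential

# The scope/audience below is an assumption; use whatever audience your
# scoring endpoint actually expects.
token = AzureCliCredential().get_token("https://ml.azure.com/.default")

token_path = Path("secrets/token.txt")
token_path.parent.mkdir(parents=True, exist_ok=True)
token_path.write_text(token.token, encoding="utf-8")
print(f"Wrote token to {token_path} (expires at {token.expires_on})")
```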

### Command Line

Running simulator.py from the command line is also possible. Navigate to the `dev/` folder and run the script, passing in flags as needed. Again, ensure the appropriate Python version is used.

```bash
python simulator.py --debug_mode=True --online_endpoint_url=https://pr-wenbinmeng.eastus.inference.ml.azure.com/v1/engines/ada/completions --azureml_model_deployment=api-ci-ea27f087 --token_file_path=./secrets/token.txt
```

## Defining Data to Test against

Update the files in the `training/` folder with the data you would like to test against.

Alternatively, create new datasets to use. An example of how to create an MLTable from Hugging Face's CNN DailyMail dataset is documented in [create_dataset.py](./datasets/create_dataset.py); a rough sketch of the same idea is shown below.
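The following is a minimal sketch only, not the contents of `create_dataset.py`; it assumes the `datasets`, `pandas`, and `mltable` packages are installed, and the column names are illustrative rather than what the driver actually expects.

```python
# Rough sketch (assumptions noted above): build a small MLTable folder from the
# Hugging Face CNN/DailyMail dataset that the simulator can load with mltable.load().
import mltable
import pandas as pd
from datasets import load_dataset

# Pull a small slice of the dataset to keep local test runs quick.
rows = load_dataset("cnn_dailymail", "3.0.0", split="train[:100]")
df = pd.DataFrame({"article": rows["article"], "highlights": rows["highlights"]})

# Persist the data and describe it with an MLTable definition.
df.to_parquet("training/data.parquet")
tbl = mltable.from_parquet_files(paths=[{"file": "training/data.parquet"}])
tbl.save("training")  # writes training/MLTable alongside the parquet file
```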

## Full E2E Testing

First, create the component. Navigate to the `yamls/components/` folder and run:

```bash
az ml component create --file dynamic_parallel_batch_score.yml
```

Then, create the job. The `quickstart/` directory is a good starting point; refer to the quickstart [Pipeline Job Creation Step](../../quickstart/README.md#4-create-the-pipeline-job). The same README includes instructions for configuring your job through the CLI and viewing its output.

You can monitor the progress of the job through either the ML Studio UI or the following CLI command:

```bash
az ml job show --name=<job name, a GUID>
```
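As an alternative to the CLI, the same check can be done from Python with the `azure-ai-ml` SDK. This is a sketch; the `MLClient` setup below is assumed to match your workspace.

```python
# Sketch: poll or stream a submitted job from Python instead of the CLI.
from azure.ai.ml import MLClient
from azure.identity import DefaultAzureCredential

ml_client = MLClient(
    credential=DefaultAzureCredential(),
    subscription_id="<subscription-id>",
    resource_group_name="<resource-group>",
    workspace_name="<workspace>",
)

job_name = "<job name, a GUID>"
print(ml_client.jobs.get(job_name).status)  # one-off status check
ml_client.jobs.stream(job_name)             # or block and stream logs until completion
```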
@@ -0,0 +1,85 @@
# Copyright (c) Microsoft Corporation.
# Licensed under the MIT License.

"""Batch score simulator."""

import mltable
import os

import src.batch_score.main as main

from pathlib import Path

from endpoint_simulator import EndpointSimulator
from quota_simulator import QuotaSimulator
from routing_simulator import RoutingSimulator


class MiniBatchContext(object):
    """This is a context class containing partition and dataset info of mini-batches partitioned by keys."""

    def __init__(self, partition_key_value=None, dataset=None, minibatch_index=None):
        """Init the instance."""
        self._partition_key_value = partition_key_value
        self._dataset = dataset
        self._minibatch_index = minibatch_index

    @property
    def partition_key_value(self):
        """Return the dict of partition-key-value corresponding to the mini-batch."""
        return self._partition_key_value

    @property
    def dataset(self):
        """Return the sub dataset corresponding to the mini-batch."""
        return self._dataset

    @property
    def minibatch_index(self):
        """Return the minibatch identity."""
        return self._minibatch_index


# Simulate PRS with a single Processor on a single Node
class Simulator:
    """PRS Simulator."""

    def __init__(self, data_input_folder_path):
        """Initialize PRS Simulator."""
        os.getcwd()

        self.__mltable_data: mltable = mltable.load(data_input_folder_path)
        self.__df_data = self.__mltable_data.to_pandas_dataframe()
        self.__minibatch_size = 500  # lines
        self.__cur_index = 0

    def start(self):
        """Start the simulator."""
        main.init()
        results: list[str] = []

        while self.__cur_index < self.__df_data.shape[0]:
            end_index = self.__cur_index + self.__minibatch_size
            if end_index > self.__df_data.shape[0]:
                end_index = self.__df_data.shape[0]
            df_subset = self.__df_data.iloc[self.__cur_index:end_index]
            self.__cur_index = end_index

            results.extend(main.run(df_subset, MiniBatchContext(minibatch_index=10)))

        main.shutdown()

        out_dir = "./out"
        Path(out_dir).mkdir(parents=True, exist_ok=True)
        with open(
            os.path.join(out_dir, "prs-sim.txt"), "wt", encoding="utf-8"
        ) as txt_file:
            print("\n".join(results), file=txt_file)


EndpointSimulator.initialize()
QuotaSimulator.initialize()
RoutingSimulator.initialize()

sim = Simulator("./training/")
sim.start()