Commit
Merge pull request #71 from skytin1004/improve-doc
E2E Samples markdown improvements
leestott authored Jul 2, 2024
2 parents a6c18d8 + 74717ed commit 67e4e5a
Showing 4 changed files with 220 additions and 178 deletions.
26 changes: 13 additions & 13 deletions md/06.E2ESamples/E2E_Phi-3-FineTuning_PromptFlow_Integration.md
@@ -1,6 +1,6 @@
# Fine-tune and Integrate custom Phi-3 models with Prompt flow

This end-to-end (E2E) sample is based on the guide "[Fine-Tune and Integrate Custom Phi-3 Models with Prompt Flow: A Step-by-Step Guide](https://techcommunity.microsoft.com/t5/educator-developer-blog/fine-tune-and-integrate-custom-phi-3-models-with-prompt-flow/ba-p/4178612)" from the Microsoft Tech Community. It introduces the processes of fine-tuning, deploying, and integrating custom Phi-3 models with Prompt flow.
This end-to-end (E2E) sample is based on the guide "[Fine-Tune and Integrate Custom Phi-3 Models with Prompt Flow: A Step-by-Step Guide](https://techcommunity.microsoft.com/t5/educator-developer-blog/fine-tune-and-integrate-custom-phi-3-models-with-prompt-flow/ba-p/4178612?wt.mc_id=studentamb_279723)" from the Microsoft Tech Community. It introduces the processes of fine-tuning, deploying, and integrating custom Phi-3 models with Prompt flow.

## Overview

@@ -25,11 +25,11 @@ Here is an overview of this E2E sample.
- [Set up project](#set-up-project)
- [Prepare dataset for fine-tuning](#prepare-dataset-for-fine-tuning)

2. **[Scenario 2: Fine-tune Phi-3 model and Deploy in Azure Machine Learning Studio](#scenario-2-fine-tune-phi-3-model-and-deploy-in-azure-machine-learning-studio)**
1. **[Scenario 2: Fine-tune Phi-3 model and Deploy in Azure Machine Learning Studio](#scenario-2-fine-tune-phi-3-model-and-deploy-in-azure-machine-learning-studio)**
- [Set up Azure CLI](#set-up-azure-cli)
- [Deploy the fine-tuned model](#deploy-the-fine-tuned-model)

3. **[Scenario 3: Integrate with Prompt flow and Chat with your custom model](#scenario-3-integrate-with-prompt-flow-and-chat-with-your-custom-model)**
1. **[Scenario 3: Integrate with Prompt flow and Chat with your custom model](#scenario-3-integrate-with-prompt-flow-and-chat-with-your-custom-model)**
- [Integrate the custom Phi-3 model with Prompt flow](#integrate-the-custom-phi-3-model-with-prompt-flow)
- [Chat with your custom model](#chat-with-your-custom-model)

@@ -64,13 +64,13 @@ In this E2E sample, you will use the *Standard_NC6s_v3 GPU* for fine-tuning, whi

![Type azure machine learning](../../imgs/03/FineTuning-PromptFlow/01-03-type-azml.png)

2. Select **+ Create** from the navigation menu.
1. Select **+ Create** from the navigation menu.

3. Select **New workspace** from the navigation menu.
1. Select **New workspace** from the navigation menu.

![Select new workspace](../../imgs/03/FineTuning-PromptFlow/01-04-select-new-workspace.png)

4. Perform the following tasks:
1. Perform the following tasks:

- Select your Azure **Subscription**.
- Select the **Resource group** to use (create a new one if needed).
@@ -81,9 +81,9 @@ In this E2E sample, you will use the *Standard_NC6s_v3 GPU* for fine-tuning, whi
- Select the **Application insights** to use (create a new one if needed).
- Set the **Container registry** to **None**.

5. Select **Review + Create**.
1. Select **Review + Create**.

6. Select **Create**.
1. Select **Create**.

### Add role assignment

@@ -198,7 +198,7 @@ Now, you will create a folder to work in and set up a virtual environment to dev
mkdir finetune-phi
```

2. Type the following command inside your terminal to navigate to the *finetune-phi* folder you created.
1. Type the following command inside your terminal to navigate to the *finetune-phi* folder you created.

```console
cd finetune-phi
@@ -212,7 +212,7 @@ Now, you will create a folder to work in and set up a virtual environment to dev
python -m venv .venv
```

2. Type the following command inside your terminal to activate the virtual environment.
1. Type the following command inside your terminal to activate the virtual environment.

```console
.venv\Scripts\activate.bat
@@ -324,7 +324,7 @@ In this exercise, you will:

1. In the left pane of Visual Studio Code, right-click and select **New File** to create a new file named *config.py*.

2. Add the following code to the *config.py* file to include your Azure information.
1. Add the following code to the *config.py* file to include your Azure information.

```python
# Azure settings
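# The guide's full settings are elided in this diff; the identifier names
# below are an illustrative sketch only. Substitute the values from your
# own Azure resources.
AZURE_SUBSCRIPTION_ID = "your_subscription_id"
AZURE_RESOURCE_GROUP_NAME = "your_resource_group_name"
AZURE_ML_WORKSPACE_NAME = "your_workspace_name"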
@@ -491,9 +491,9 @@ You need to set up Azure CLI to authenticate your environment. Azure CLI allows
az login
```

2. Select your Azure account to use.
1. Select your Azure account to use.

3. Select your Azure subscription to use.
1. Select your Azure subscription to use.

![Find resource group name.](../../imgs/03/FineTuning-PromptFlow/02-01-login-using-azure-cli.png)
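If you work with multiple Azure subscriptions, you can also set the active one non-interactively. A minimal sketch, assuming you substitute your own subscription ID:

```console
az account set --subscription "your-subscription-id"
```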

248 changes: 136 additions & 112 deletions md/06.E2ESamples/E2E_Phi-3-MLflow.md
@@ -1,13 +1,13 @@

# **MLflow**
# MLflow

[MLflow]*https://mlflow.org/) is an open-source platform designed to manage the end-to-end machine learning lifecycle.
[MLflow](https://mlflow.org/) is an open-source platform designed to manage the end-to-end machine learning lifecycle.

![MLFlow](../../imgs/03/MLflow/MlFlowmlops.png)

MLFlow is used to manage the ML lifecycle, including experimentation, reproducibility, deployment, and a central model registry. MLFlow currently offers four components.

- **MLflow Tracking:** Record and query experiments, code, data, config and results.
- **MLflow Projects:** Package data science code in a format to reproduce runs on any platform.
- **MLflow Models:** Deploy machine learning models in diverse serving environments.
- **Model Registry:** Store, annotate and manage models in a central repository.
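As a quick illustration of the first of these components, here is a minimal MLflow Tracking sketch; the parameter and metric names are illustrative rather than part of this sample:

``` Python
import mlflow

# Record a toy run whose parameters and metrics can be queried later
with mlflow.start_run():
    mlflow.log_param("model", "phi-3-mini-4k-instruct")
    mlflow.log_metric("eval_accuracy", 0.93)
```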
@@ -24,9 +24,9 @@ Key features of MLFlow include:
- **Projects:** Package ML code for sharing or production use.
MLFlow also supports the MLOps loop, which includes preparing data, registering and managing models, packaging models for execution, deploying services, and monitoring models. It aims to simplify the process of moving from a prototype to a production workflow, especially in cloud and edge environments.

## **E2E Scenario - Building a wrapper and using Phi-3 as an MLFlow model**
## E2E Scenario - Building a wrapper and using Phi-3 as an MLFlow model

In this E2E sample, we will demonstrate two different approaches to building a wrapper around the Phi-3 small language model (SLM) and then running it as an MLFlow model, either locally or in the cloud, e.g., in an Azure Machine Learning workspace.

![MLFlow](../../imgs/03/MLflow/MlFlow1.png)

@@ -36,120 +36,144 @@ In this E2E sample we will demonstrate two different approaches to building a wr
| Custom Python Wrapper | At the time of writing, the transformer pipeline did not support MLFlow wrapper generation for HuggingFace models in ONNX format, even with the experimental optimum Python package. For cases like this, you can build a custom Python wrapper for your MLFlow model | [**CustomPythonWrapper.ipynb**](E2E_Phi-3-MLflow_CustomPythonWrapper.ipynb) |

## Project: Transformer Pipeline

1. You will need the relevant Python packages from MLFlow and HuggingFace:

``` Python
import mlflow
import transformers
```

2. Next, you should initiate a transformer pipeline by referring to the target Phi-3 model in the HuggingFace registry. As can be seen from the _Phi-3-mini-4k-instruct_’s model card, its task is of the “Text Generation” type:

``` Python
pipeline = transformers.pipeline(
    task = "text-generation",
    model = "microsoft/Phi-3-mini-4k-instruct"
)
```

3. You can now save your Phi-3 model’s transformer pipeline into MLFlow format and provide additional details such as the target artifacts path, specific model configuration settings and inference API type:

``` Python
# model_config holds the generation settings baked into the wrapper;
# the values below are illustrative and can be adjusted
model_config = {
    "max_length": 300,
    "temperature": 0.2,
}

model_info = mlflow.transformers.log_model(
    transformers_model = pipeline,
    artifact_path = "phi3-mlflow-model",
    model_config = model_config,
    task = "llm/v1/chat"
)
```
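Once logged, the model can be smoke-tested locally. Below is a minimal sketch, assuming it runs in the same session where `model_info` was produced by the `log_model()` call above:

``` Python
import mlflow

# Load the logged pipeline back as a generic PyFunc model
chat_model = mlflow.pyfunc.load_model(model_info.model_uri)

# The "llm/v1/chat" task exposes an OpenAI-style chat interface
response = chat_model.predict(
    {"messages": [{"role": "user", "content": "What is MLflow?"}]}
)
print(response[0]["choices"][0]["message"]["content"])
```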

## Project: Custom Python Wrapper

1. Here we can utilise Microsoft's [ONNX Runtime generate() API](https://github.com/microsoft/onnxruntime-genai) for the ONNX model's inference and token encoding / decoding. You have to choose the _onnxruntime_genai_ package for your target compute; the example below targets the CPU:

``` Python
import mlflow
from mlflow.models import infer_signature
import onnxruntime_genai as og
```

1. Our custom class implements two methods: _load_context()_ to initialise the **ONNX model** of Phi-3 Mini 4K Instruct, **generator parameters** and **tokenizer**; and _predict()_ to generate output tokens for the provided prompt:

``` Python
class Phi3Model(mlflow.pyfunc.PythonModel):
    def load_context(self, context):
        # Retrieving model from the artifacts
        model_path = context.artifacts["phi3-mini-onnx"]
        model_options = {
            "max_length": 300,
            "temperature": 0.2,
        }

        # Defining the model
        self.phi3_model = og.Model(model_path)
        self.params = og.GeneratorParams(self.phi3_model)
        self.params.set_search_options(**model_options)

        # Defining the tokenizer
        self.tokenizer = og.Tokenizer(self.phi3_model)

    def predict(self, context, model_input):
        # Retrieving prompt from the input
        prompt = model_input["prompt"][0]
        self.params.input_ids = self.tokenizer.encode(prompt)

        # Generating the model's response
        response = self.phi3_model.generate(self.params)

        return self.tokenizer.decode(response[0][len(self.params.input_ids):])
```

1. You can now use the _mlflow.pyfunc.log_model()_ function to generate a custom Python wrapper (in pickle format) for the Phi-3 model, along with the original ONNX model and required dependencies:

``` Python
# The artifact path and input example are illustrative; adjust as needed
artifact_path = "phi3-mlflow-ctm-python-model"
input_example = {"prompt": "<|system|>You are a stand-up comedian.<|end|><|user|>Tell me a joke about atom<|end|><|assistant|>"}

model_info = mlflow.pyfunc.log_model(
    artifact_path = artifact_path,
    python_model = Phi3Model(),
    artifacts = {
        "phi3-mini-onnx": "cpu_and_mobile/cpu-int4-rtn-block-32-acc-level-4",
    },
    input_example = input_example,
    signature = infer_signature(input_example, ["Run"]),
    extra_pip_requirements = ["torch", "onnxruntime_genai", "numpy"],
)
```
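As with the pipeline variant, you can smoke-test the wrapper before deploying it. A minimal sketch, assuming `model_info` comes from the `log_model()` call above and the ONNX model files exist at the listed artifacts path:

``` Python
# Load the custom wrapper back as a PyFunc model and run a test prompt
onnx_model = mlflow.pyfunc.load_model(model_info.model_uri)

output = onnx_model.predict(
    {"prompt": ["<|user|>What is ONNX?<|end|><|assistant|>"]}
)
print(output)
```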

## Signatures of generated MLFlow models

1. In step 3 of the Transformer Pipeline project above, we set the MLFlow model’s task to “_llm/v1/chat_”. This instruction generates an API wrapper for the model, compatible with OpenAI’s Chat API, as shown below:

``` Python
{inputs:
['messages': Array({content: string (required), name: string (optional), role: string (required)}) (required), 'temperature': double (optional), 'max_tokens': long (optional), 'stop': Array(string) (optional), 'n': long (optional), 'stream': boolean (optional)],
outputs:
['id': string (required), 'object': string (required), 'created': long (required), 'model': string (required), 'choices': Array({finish_reason: string (required), index: long (required), message: {content: string (required), name: string (optional), role: string (required)} (required)}) (required), 'usage': {completion_tokens: long (required), prompt_tokens: long (required), total_tokens: long (required)} (required)],
params:
None}
```

1. As a result, you can submit your prompt in the following format:

``` Python
messages = [{"role": "user", "content": "What is the capital of Spain?"}]
```

1. Then, use OpenAI API-compatible post-processing, e.g., _response[0][‘choices’][0][‘message’][‘content’]_, to beautify your output to something like this (a sketch of this post-processing follows the list):

``` JSON
Question: What is the capital of Spain?

Answer: The capital of Spain is Madrid. It is the largest city in Spain and serves as the political, economic, and cultural center of the country. Madrid is located in the center of the Iberian Peninsula and is known for its rich history, art, and architecture, including the Royal Palace, the Prado Museum, and the Plaza Mayor.

Usage: {'prompt_tokens': 11, 'completion_tokens': 73, 'total_tokens': 84}
```

1. In step 3 of the Custom Python Wrapper project above, we allow the MLFlow package to generate the model’s signature from a given input example. Our MLFlow wrapper's signature will look like this:

``` Python
{inputs:
['prompt': string (required)],
outputs:
[string (required)],
params:
None}
```

1. So, our prompt would need to contain a "prompt" dictionary key, similar to this:

``` Python
{"prompt": "<|system|>You are a stand-up comedian.<|end|><|user|>Tell me a joke about atom<|end|><|assistant|>",}
```

1. The model's output will then be provided in string format:

``` JSON
Alright, here's a little atom-related joke for you!

Why don't electrons ever play hide and seek with protons?

Because good luck finding them when they're always "sharing" their electrons!

Remember, this is all in good fun, and we're just having a little atomic-level humor!
```
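For completeness, here is a minimal sketch of the post-processing referenced above, assuming the `chat_model` loaded in the Transformer Pipeline section and the `messages` payload defined earlier:

``` Python
# Query the chat-style model and format its OpenAI-compatible response
response = chat_model.predict({"messages": messages})

answer = response[0]["choices"][0]["message"]["content"]
usage = response[0]["usage"]

print(f"Question: {messages[0]['content']}\n")
print(f"Answer: {answer}\n")
print(f"Usage: {usage}")
```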
