This repository has been archived by the owner on May 28, 2024. It is now read-only.

[docs] Update changes to RayLLM
richardliaw authored Oct 4, 2023
2 parents fa6f017 + fadd923 commit 5d220c6
Showing 3 changed files with 211 additions and 316 deletions.
189 changes: 115 additions & 74 deletions docs/kuberay/deploy-on-eks.md
@@ -1,4 +1,4 @@
-# Deploy Aviary on Amazon EKS using KubeRay
+# Deploy RayLLM on Amazon EKS using KubeRay
* Note that this document will be extended to include Ray autoscaling and the deployment of multiple models in the near future.

# Part 1: Set up a Kubernetes cluster on Amazon EKS
@@ -84,15 +84,15 @@ helm install kuberay-operator kuberay/kuberay-operator --version 0.6.0

At this point, you have two options:

-1. You can deploy Aviary manually on a `RayCluster` (Part 3), or
-2. You can deploy Aviary using a [`RayService` custom resource](https://ray-project.github.io/kuberay/guidance/rayservice/) (Part 4).
+1. You can deploy RayLLM manually on a `RayCluster` (Part 3), or
+2. You can deploy RayLLM using a [`RayService` custom resource](https://ray-project.github.io/kuberay/guidance/rayservice/) (Part 4).

The first option is more flexible for conducting experiments.
The second option is recommended for production use due to the additional high availability features provided by the `RayService` custom resource, which will manage the underlying `RayCluster`s for you.

-# Part 3: Deploy Aviary on a RayCluster (recommended for experiments)
+# Part 3: Deploy RayLLM on a RayCluster (recommended for experiments)

-## Step 1: Create a RayCluster with Aviary
+## Step 1: Create a RayCluster with RayLLM

```sh
# path: docs/kuberay
@@ -122,7 +122,7 @@ Something is worth noticing:
resources: '"{\"accelerator_type_cpu\": 48, \"accelerator_type_a10\": 2, \"accelerator_type_a100_80g\": 2}"'
```
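The `resources` value above is a JSON object wrapped in two layers of quoting (YAML single quotes, then shell-level double quotes). A minimal sketch, run anywhere with plain Python, of what it decodes to once those layers are stripped:

```python
import json

# The `resources` value from the RayCluster YAML with the quoting layers removed.
raw = '{"accelerator_type_cpu": 48, "accelerator_type_a10": 2, "accelerator_type_a100_80g": 2}'

resources = json.loads(raw)
print(sorted(resources))
print(resources["accelerator_type_a10"])  # -> 2
```

Ray treats these keys as purely logical scheduling tokens: a task or actor that requests, say, one `accelerator_type_a10` is only placed on a node advertising that resource.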
-## Step 2: Deploy a LLM model with Aviary
+## Step 2: Deploy an LLM model with RayLLM
```sh
# Step 7.1: Log in to the head Pod
@@ -133,67 +133,96 @@ kubectl exec -it $HEAD_POD -- bash
# If you don't have one, you can skip this step and deploy other models in Step 7.3.
export HUGGING_FACE_HUB_TOKEN=${YOUR_HUGGING_FACE_HUB_TOKEN}

-# Step 7.3: Deploy a LLM model. You can deploy Falcon-7B if you don't have a Hugging Face Hub token.
+# Step 7.3: Deploy an LLM model. You can deploy Falcon-7B if you don't have a Hugging Face Hub token.
-# (1) Llama 2 7B
-aviary run --model ~/models/continuous_batching/meta-llama--Llama-2-7b-chat-hf.yaml
-# (2) Falcon 7B
-aviary run --model ./models/continuous_batching/OpenAssistant--falcon-7b-sft-top1-696.yaml
+serve run serve/meta-llama--Llama-2-7b-chat-hf.yaml

# Step 7.3: Check the Serve application status
serve status

# [Example output]
-# name: OpenAssistant--falcon-7b-sft-top1-696
-# app_status:
-#   status: RUNNING
-#   message: ''
-#   deployment_timestamp: 1691109255.5476327
-# deployment_statuses:
-# - name: OpenAssistant--falcon-7b-sft-top1-696_OpenAssistant--falcon-7b-sft-top1-696
-#   status: HEALTHY
-#   message: ''
-# ---
-# name: router
-# app_status:
-#   status: RUNNING
-#   message: ''
-#   deployment_timestamp: 1691109255.6641886
-# deployment_statuses:
-# - name: router_Router
-#   status: HEALTHY
-#   message: ''

-# Step 7.4: List all models
-export AVIARY_URL="http://localhost:8000"
-aviary models
+# proxies:
+#   e4dc8d29f19e3900c0b93dabb76ce9bcc6f42e36bdf5484ca57ec774: HEALTHY
+#   4f4edf80bf644846175eec0a4daabb3f3775e64738720b6b2ae5c139: HEALTHY
+# applications:
+#   router:
+#     status: RUNNING
+#     message: ''
+#     last_deployed_time_s: 1694808658.0861287
+#     deployments:
+#       Router:
+#         status: HEALTHY
+#         replica_states:
+#           RUNNING: 2
+#         message: ''
+#   meta-llama--Llama-2-7b-chat-hf:
+#     status: RUNNING
+#     message: ''
+#     last_deployed_time_s: 1694808658.0861287
+#     deployments:
+#       meta-llama--Llama-2-7b-chat-hf:
+#         status: HEALTHY
+#         replica_states:
+#           RUNNING: 1
+#         message: ''

+# Step 7.4: Check the live Serve app's config
+serve config

# [Example output]
-# Connecting to Aviary backend at: http://localhost:8000/
-# OpenAssistant/falcon-7b-sft-top1-696

-# Step 7.5: Send a query to `OpenAssistant/falcon-7b-sft-top1-696`.
-aviary query --model OpenAssistant/falcon-7b-sft-top1-696 --prompt "What are the top 5 most popular programming languages?"
+# name: router
+# route_prefix: /
+# import_path: aviary.backend:router_application
+# args:
+#   models:
+#     meta-llama/Llama-2-7b-chat-hf: ./models/continuous_batching/meta-llama--Llama-2-7b-chat-hf.yaml

-# [Example output for `OpenAssistant/falcon-7b-sft-top1-696`]
-# Connecting to Aviary backend at: http://localhost:8000/v1
-# OpenAssistant/falcon-7b-sft-top1-696:
-# The top five most popular programming languages globally, according to TIOBE, are Java, Python, C, C++, and JavaScript. However, popularity can vary by region, industry, and
-# other factors. Additionally, the definition of a programming language can vary, leading to different rankings depending on the methodology used. Some rankings may include or
-# exclude specific scripting languages or high-level language variants, for example.
-# ---

-# Here are some additional rankings of the most popular programming languages:
-# * **Top 10 programming languages in 2023**: Python, JavaScript, C#, Java, PHP, TypeScript, Swift, Golang, Ruby, and Kotlin.
-#   [Source](https://www.toptal.com/software/programming-languages/2023-best-programming-languages/)
-# * **Top 10 programming languages in 2022**: Python, JavaScript, Java, C++, C#, PHP, Swift, Kotlin, R, and TypeScript.
-#   [Source](https://www.toptal.com/software/programming-languages/2022-best-programming-languages/)
-# * **Top 10 programming languages in 2021**: Python, JavaScript, Java, C++, C#, PHP, Swift, Go, Kotlin, and TypeScript.
-# .....
-# These rankings can change frequently, so it's important to keep up to date with the latest trends.
+# name: meta-llama--Llama-2-7b-chat-hf
+# route_prefix: /meta-llama--Llama-2-7b-chat-hf
+# import_path: aviary.backend:llm_application
+# args:
+#   model: ./models/continuous_batching/meta-llama--Llama-2-7b-chat-hf.yaml

+# Step 7.5: Send a query to `meta-llama/Llama-2-7b-chat-hf`.
+curl http://localhost:8000/v1/chat/completions \
+  -H "Content-Type: application/json" \
+  -d '{
+    "model": "meta-llama/Llama-2-7b-chat-hf",
+    "messages": [
+      {"role": "system", "content": "You are a helpful assistant."},
+      {"role": "user", "content": "What are the top 5 most popular programming languages?"}
+    ],
+    "temperature": 0.7
+  }'

+# [Example output for `meta-llama/Llama-2-7b-chat-hf`]
+{
+  "id":"meta-llama/Llama-2-7b-chat-hf-95239f0b-4601-4557-8a33-3977e9b6b779",
+  "object":"text_completion","created":1694814804,"model":"meta-llama/Llama-2-7b-chat-hf",
+  "choices":[
+    {
+      "message":
+      {
+        "role":"assistant",
+        "content":"As a helpful assistant, I'm glad to provide you with the top 5 most popular programming languages based on various sources and metrics:\n\n1. Java: Java is a popular language used for developing enterprise-level applications, Android apps, and web applications. It's known for its platform independence, which allows Java developers to create applications that can run on any device supporting the Java Virtual Machine (JVM).\n\n2. Python: Python is a versatile language that's widely used in various industries, including web development, data science, artificial intelligence, and machine learning. Its simplicity, readability, and ease of use make it a favorite among developers.\n\n3. JavaScript: JavaScript is the language of the web and is used for creating interactive client-side functionality for web applications. It's also used in mobile app development, game development, and server-side programming.\n\n4. C++: C++ is a high-performance language used for developing operating systems, games, and other high-performance applications. It's known for its efficiency, speed, and flexibility, making it a popular choice among developers.\n\n5. PHP: PHP is a server-side scripting language used for web development, especially for building dynamic websites and web applications. It's known for its ease of use and is widely used in the web development community.\n\nThese are the top 5 most popular programming languages based on various sources, but it's worth noting that programming language popularity can vary depending on the source and the time frame considered."
+      },
+      "index":0,
+      "finish_reason":"stop"
+    }
+  ],
+  "usage":{
+    "prompt_tokens":39,
+    "completion_tokens":330,
+    "total_tokens":369
+  }
+}
```
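The same chat-completions endpoint can be called from Python instead of `curl`. A sketch using the `requests` library (the HTTP call is left commented out so the snippet runs without a live cluster; it assumes the port-forward from earlier and `pip install requests`):

```python
import json

# OpenAI-compatible chat-completions payload, mirroring the curl example above.
payload = {
    "model": "meta-llama/Llama-2-7b-chat-hf",
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What are the top 5 most popular programming languages?"},
    ],
    "temperature": 0.7,
}

# With the port-forward running, send the request like this:
#
#   import requests
#   resp = requests.post("http://localhost:8000/v1/chat/completions",
#                        json=payload, timeout=300)
#   print(resp.json()["choices"][0]["message"]["content"])

print(json.dumps(payload, indent=2))
```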

-# Part 4: Deploy Aviary on a RayService (recommended for production)
+# Part 4: Deploy RayLLM on a RayService (recommended for production)

-## Step 1: Create a RayService with Aviary
+## Step 1: Create a RayService with RayLLM

```sh
# path: docs/kuberay
@@ -229,43 +258,56 @@ serveConfigV2: |
OpenAssistant/falcon-7b-sft-top1-696: ./models/continuous_batching/OpenAssistant--falcon-7b-sft-top1-696.yaml
```
-In the YAML file, we use the `serveConfigV2` field to configure two LLM serve applications, one for LightGPT and one for Falcon-7B.
+In the YAML file, we use the `serveConfigV2` field to configure two LLM Serve applications, one for LightGPT and one for Falcon-7B.
It's important to note that the `model` argument refers to the path of the LLM model's YAML file, located in the Ray head Pod.
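The per-model YAML file names throughout this document follow a simple convention: the Hugging Face model ID with `/` replaced by `--`. A small sanity-check sketch (plain Python; the convention is inferred from the paths shown in this document, not an API guarantee):

```python
# Model IDs mapped to their YAML paths on the Ray head Pod, as in serveConfigV2.
models = {
    "meta-llama/Llama-2-7b-chat-hf":
        "./models/continuous_batching/meta-llama--Llama-2-7b-chat-hf.yaml",
    "OpenAssistant/falcon-7b-sft-top1-696":
        "./models/continuous_batching/OpenAssistant--falcon-7b-sft-top1-696.yaml",
}

for model_id, path in models.items():
    expected = model_id.replace("/", "--") + ".yaml"
    actual = path.rsplit("/", 1)[-1]
    assert actual == expected, (actual, expected)
print("all model paths follow the ID naming convention")
```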

-## Step 2: Send a query to both `amazon/LightGPT` and `OpenAssistant/falcon-7b-sft-top1-696`
+## Step 2: Send a query to both `amazon/LightGPT` and `OpenAssistant/falcon-7b-sft-top1-696`.

```sh
-# Step 2.1: Port forward the Kubernetes serve service.
-# Note that the service will be created only when all serve applications are ready.
+# Step 2.1: Port forward the Kubernetes Serve service.
+# Note that the service will be created only when all Serve applications are ready.
kubectl get svc # Check if `aviary-serve-svc` is created.
kubectl port-forward service/aviary-serve-svc 8000:8000

-# Step 2.2: Install the Aviary client if not already installed.
-pip install "aviary @ git+https://github.com/ray-project/aviary.git"

-# Step 2.3: List models via the Aviary CLI outside the Kubernetes cluster.
-export AVIARY_URL="http://localhost:8000"
-aviary models
+# Step 2.2: Check that the models have started running using `serve status`
+serve status

# [Example output]
-# Connecting to Aviary backend at: http://localhost:8000/v1
-# OpenAssistant/falcon-7b-sft-top1-696
-# amazon/LightGPT

-# Step 2.4: Send a query to `amazon/LightGPT`.
-aviary query --model amazon/LightGPT --prompt "What are the top 5 most popular programming languages?"
+# Step 2.3: Send a query to `amazon/LightGPT`.
+curl http://localhost:8000/v1/chat/completions \
+  -H "Content-Type: application/json" \
+  -d '{
+    "model": "amazon/LightGPT",
+    "messages": [
+      {"role": "system", "content": "You are a helpful assistant."},
+      {"role": "user", "content": "What are the top 5 most popular programming languages?"}
+    ],
+    "temperature": 0.7
+  }'

# [Example output]
-# Connecting to Aviary backend at: http://localhost:8000/v1
# amazon/LightGPT:
# 1. Java
# 2. C++
# 3. JavaScript
# 4. Python
# 5. SQL

-# Step 2.5: Send a query to `OpenAssistant/falcon-7b-sft-top1-696`.
-aviary query --model OpenAssistant/falcon-7b-sft-top1-696 --prompt "What are the top 5 most popular programming languages?"
+# Step 2.4: Send a query to `OpenAssistant/falcon-7b-sft-top1-696`.
+curl http://localhost:8000/v1/chat/completions \
+  -H "Content-Type: application/json" \
+  -d '{
+    "model": "OpenAssistant/falcon-7b-sft-top1-696",
+    "messages": [
+      {"role": "system", "content": "You are a helpful assistant."},
+      {"role": "user", "content": "What are the top 5 most popular programming languages?"}
+    ],
+    "temperature": 0.7
+  }'

# [Example output for `OpenAssistant/falcon-7b-sft-top1-696`]
-# Connecting to Aviary backend at: http://localhost:8000/v1
@@ -282,20 +324,19 @@ aviary query --model OpenAssistant/falcon-7b-sft-top1-696 --prompt "What are the
# * **Top 10 programming languages in 2021**: Python, JavaScript, Java, C++, C#, PHP, Swift, Go, Kotlin, and TypeScript.
# .....
# These rankings can change frequently, so it's important to keep up to date with the latest trends.

-# Step 2.6: Send a query to `OpenAssistant/falcon-7b-sft-top1-696` and get streaming response.
-aviary stream --model OpenAssistant/falcon-7b-sft-top1-696 --prompt "What are the top 5 most popular programming languages?"
```

Check out the RayLLM README to learn more ways to query models, such as with the Python `requests` library or the OpenAI package. Use these techniques to stream responses from the models.
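One way to consume a streamed response, sketched in Python: OpenAI-compatible endpoints typically stream server-sent events of the form `data: {...}` ending with `data: [DONE]` (the exact wire format may vary across RayLLM versions, so treat this as an assumption). The live call is commented out so the snippet runs offline:

```python
import json

def parse_sse_chunks(lines):
    """Collect streamed message content from OpenAI-style SSE lines.

    Each line looks like `data: {...}`; the stream ends with `data: [DONE]`.
    """
    content = []
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break
        chunk = json.loads(data)
        delta = chunk["choices"][0]["delta"].get("content")
        if delta:
            content.append(delta)
    return "".join(content)

# Against the live endpoint (assumes `requests` and the port-forward above):
#
#   import requests
#   resp = requests.post("http://localhost:8000/v1/chat/completions",
#                        json={"model": "amazon/LightGPT", "stream": True,
#                              "messages": [{"role": "user", "content": "Hello!"}]},
#                        stream=True, timeout=300)
#   print(parse_sse_chunks(resp.iter_lines(decode_unicode=True)))

# Offline demo with synthetic chunks:
demo = [
    'data: {"choices": [{"delta": {"content": "Py"}}]}',
    'data: {"choices": [{"delta": {"content": "thon"}}]}',
    'data: [DONE]',
]
print(parse_sse_chunks(demo))  # -> Python
```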

# Part 5: Clean up resources

**Warning: GPU nodes are extremely expensive. Please remember to delete the cluster if you no longer need it.**

```sh
# path: docs/kuberay
-# Case 1: Aviary was deployed on a RayCluster
+# Case 1: RayLLM was deployed on a RayCluster
kubectl delete -f ray-cluster.aviary-eks.yaml
-# Case 2: Aviary was deployed as a RayService
+# Case 2: RayLLM was deployed as a RayService
kubectl delete -f ray-service.aviary-eks.yaml

# Uninstall the KubeRay operator chart
```