This repository has been archived by the owner on May 28, 2024. It is now read-only.

Merge pull request #79 from avnishn/0.4.0
0.4.0 release

The following changes are introduced:

Renaming aviary to rayllm.
Support for reading models from GCS in addition to AWS S3.
Increased testing for prompting.
New model configs for Falcon 7B and 40B.
Make the frontend compatible with Ray Serve 2.7.
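
For illustration, a model's weights location is typically specified in its model YAML through a mirror config; the sketch below shows the new GCS option alongside the existing S3 one. The field names and paths are assumptions for illustration, not taken from this diff.

```yaml
# Hypothetical excerpt from a model config YAML; exact field names may differ.
engine_config:
  model_id: amazon/LightGPT
  s3_mirror_config:
    bucket_uri: s3://my-bucket/models/amazon--LightGPT/
  # To read weights from Google Cloud Storage instead:
  # gcs_mirror_config:
  #   bucket_uri: gs://my-bucket/models/amazon--LightGPT/
```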


Co-authored-by: Avnish Narayan <[email protected]>
Co-authored-by: Chris Sivanich <[email protected]>
Co-authored-by: Tanmay Chordia <[email protected]>
Co-authored-by: Sihan Wang <[email protected]>
Co-authored-by: Shreyas Krishnaswamy <[email protected]>
Co-authored-by: Richard Liaw <[email protected]>
7 people authored Oct 28, 2023
2 parents b3560aa + 83a54a1 commit c2a22af
Showing 133 changed files with 1,159 additions and 403 deletions.
5 changes: 5 additions & 0 deletions .gitignore
@@ -232,6 +232,7 @@ tag-mapping.json
*.tmp
deploy/anyscale/service.yaml
out
temp.py

# build output
build/
@@ -248,3 +249,7 @@ prompts.txt
site/

*.orig

__pycache__

.secretenv.yml
2 changes: 1 addition & 1 deletion .pre-commit-config.yaml
@@ -19,7 +19,7 @@ repos:
hooks:
- id: mypy
# NOTE: Exclusions are handled in pyproject.toml
files: aviary
files: rayllm
exclude: tests
additional_dependencies:
- mypy-extensions
4 changes: 2 additions & 2 deletions MANIFEST.in
@@ -1,5 +1,5 @@
include README.md README.ipynb LICENSE *.sh
include README.md LICENSE *.sh
recursive-include tests *.py
recursive-include models *.yaml
recursive-include examples *.*
recursive-include aviary/frontend *.js
recursive-include rayllm/frontend *.js
24 changes: 12 additions & 12 deletions README.md
@@ -32,14 +32,14 @@ The guide below walks you through the steps required for deployment of RayLLM on

### Locally

We highly recommend using the official `anyscale/aviary` Docker image to run RayLLM. Manually installing RayLLM is currently not a supported use case due to the specific dependencies required, some of which are not available on pip.
We highly recommend using the official `anyscale/rayllm` Docker image to run RayLLM. Manually installing RayLLM is currently not a supported use case due to the specific dependencies required, some of which are not available on pip.

```shell
cache_dir=${XDG_CACHE_HOME:-$HOME/.cache}

docker run -it --gpus all --shm-size 1g -p 8000:8000 -e HF_HOME=~/data -v $cache_dir:~/data anyscale/aviary:latest bash
docker run -it --gpus all --shm-size 1g -p 8000:8000 -e HF_HOME=~/data -v $cache_dir:~/data anyscale/rayllm:latest bash
# Inside docker container
aviary run --model ~/models/continuous_batching/amazon--LightGPT.yaml
serve run ~/serve_config/amazon--LightGPT.yaml
```

### On a Ray Cluster
@@ -57,7 +57,7 @@ export AWS_SESSION_TOKEN=...

Start by cloning this repo to your local machine.

You may need to specify your AWS private key in the `deploy/ray/aviary-cluster.yaml` file.
You may need to specify your AWS private key in the `deploy/ray/rayllm-cluster.yaml` file.
See the [Ray on Cloud VMs](https://docs.ray.io/en/latest/cluster/vms/index.html) page in the
Ray documentation for more details.

@@ -66,14 +66,14 @@ git clone https://github.com/ray-project/ray-llm.git
cd ray-llm

# Start a Ray Cluster (this will take a few minutes to start up)
ray up deploy/ray/aviary-cluster.yaml
ray up deploy/ray/rayllm-cluster.yaml
```

#### Connect to your Cluster

```shell
# Connect to the Head node of your Ray Cluster (This will take several minutes to autoscale)
ray attach deploy/ray/aviary-cluster.yaml
ray attach deploy/ray/rayllm-cluster.yaml

# Deploy the LightGPT model.
serve run serve_configs/amazon--LightGPT.yaml
@@ -91,7 +91,7 @@ For Kubernetes deployments, please see our documentation for [deploying on KubeR
Once the models are deployed, you can install a client outside of the Docker container to query the backend.

```shell
pip install "aviary @ git+https://github.com/ray-project/ray-llm.git"
pip install "rayllm @ git+https://github.com/ray-project/ray-llm.git"
```

You can query your RayLLM deployment in many ways.
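
For example, because RayLLM exposes an OpenAI-compatible REST API, a deployed model can be queried with a plain `curl` request. In the sketch below, the model ID and prompt are illustrative:

```shell
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "amazon/LightGPT",
        "messages": [{"role": "user", "content": "What is Ray Serve?"}],
        "temperature": 0.7
      }'
```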
@@ -219,19 +219,19 @@ print(chat_completion)
To install RayLLM and its dependencies, run the following command:

```shell
pip install "aviary @ git+https://github.com/ray-project/ray-llm.git"
pip install "rayllm @ git+https://github.com/ray-project/ray-llm.git"
```

RayLLM consists of a set of configurations and utilities for deploying LLMs on Ray Serve,
in addition to a frontend (Aviary Explorer), both of which come with additional
dependencies. To install the dependencies for the frontend, run the following command:

```shell
pip install "aviary[frontend] @ git+https://github.com/ray-project/ray-llm.git"
pip install "rayllm[frontend] @ git+https://github.com/ray-project/ray-llm.git"
```

The backend dependencies are heavyweight and quite large. We recommend using the official
`anyscale/aviary` image. Installing the backend manually is not a supported use case.
`anyscale/rayllm` image. Installing the backend manually is not a supported use case.

## Running Aviary Explorer locally

@@ -307,7 +307,7 @@ Run multiple models at once by aggregating the Serve configs for different model

applications:
- name: router
import_path: aviary.backend:router_application
import_path: rayllm.backend:router_application
route_prefix: /
args:
models:
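
For reference, a fully populated router entry aggregating two models might look like the sketch below; the model IDs and config paths are illustrative, not taken from this repository. Each key under `models` maps a model ID to its model configuration YAML.

```yaml
# Hypothetical aggregated Serve config: one router application serving two models.
applications:
  - name: router
    import_path: rayllm.backend:router_application
    route_prefix: /
    args:
      models:
        amazon/LightGPT: ./models/continuous_batching/amazon--LightGPT.yaml
        meta-llama/Llama-2-7b-chat-hf: ./models/continuous_batching/meta-llama--Llama-2-7b-chat-hf.yaml
```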
@@ -368,4 +368,4 @@ Feel free to post an issue first to get our feedback on a proposal first, or jus

We use `pre-commit` hooks to ensure that all code is formatted correctly.
Make sure to `pip install pre-commit` and then run `pre-commit install`.
You can also run `./format` to run the hooks manually.
6 changes: 0 additions & 6 deletions aviary/__init__.py

This file was deleted.

4 changes: 0 additions & 4 deletions aviary/backend/__init__.py

This file was deleted.

3 changes: 0 additions & 3 deletions aviary/backend/observability/tracing/__init__.py

This file was deleted.

File renamed without changes.
2 changes: 0 additions & 2 deletions deploy.sh

This file was deleted.

4 changes: 2 additions & 2 deletions deploy/ray/backend.yaml
@@ -1,10 +1,10 @@
import_path: aviary.backend:llm_application
import_path: rayllm.backend:llm_application
runtime_env:
# This working dir is relative to the working dir when we run this file
working_dir: "."
excludes:
- "deploy"
- "aviary/frontend"
- "rayllm/frontend"
args:
models:
# This can be a path to a model configuration directory or yaml
@@ -1,14 +1,14 @@
# A unique identifier for the head node and workers of this cluster.
cluster_name: aviary-deploy
cluster_name: rayllm-deploy

# Cloud-provider specific configuration.
provider:
type: aws
region: us-west-2
cache_stopped_nodes: False
docker:
image: "anyscale/aviary:test"
container_name: "aviary"
image: "anyscale/rayllm:latest"
container_name: "rayllm"
run_options:
- --entrypoint ""

Expand Down
51 changes: 51 additions & 0 deletions docs/DOCKERHUB.md
@@ -0,0 +1,51 @@
<!---
Docker Hub Description File
-->

# Overview

This is the publicly available set of Docker images for Anyscale/Ray's RayLLM (formerly Aviary) project.

RayLLM is an LLM serving solution that makes it easy to deploy and manage a variety of open source LLMs. It does this by:

- Providing an extensive suite of pre-configured open source LLMs, with defaults that work out of the box.
- Supporting Transformer models hosted on Hugging Face Hub or present on local disk.
- Simplifying the deployment of multiple LLMs within a single unified framework.
- Simplifying the addition of new LLMs, which in most cases takes only minutes.
- Offering unique autoscaling support, including scale-to-zero.
- Fully supporting multi-GPU & multi-node model deployments.
- Offering high performance features like continuous batching, quantization and streaming.
- Providing a REST API that is similar to OpenAI's, making it easy to migrate existing applications and cross-test against it.

[Read more here](https://github.com/ray-project/ray-llm)

## Tags

| Name | Notes |
|----|----|
| [`:0.3.1`](https://hub.docker.com/layers/anyscale/ray-llm/0.3.1/images/sha256-0dad10786076e18530fbd8016929ab9b240c8fe12163d5e74d8784ff1cbf5fb4) | Release v0.3.1 |
| [`:0.3.0`](https://hub.docker.com/layers/anyscale/ray-llm/0.3.0/images/sha256-310df8d6bfcce49fa00c0040f090099b7d376ed9535df85fa4147e7c159e7e90) | Release v0.3.0 |
| `:latest` | Most recently pushed version release image |

## Usage

See [ray-project/ray-llm "Deploying RayLLM"](https://github.com/ray-project/ray-llm#deploying-rayllm) for full instructions.

### Example

Running the [Amazon LightGPT model](https://huggingface.co/amazon/LightGPT) requires a machine with a compatible NVIDIA A10 GPU and a valid `HUGGING_FACE_HUB_TOKEN`:

```sh
docker run \
  --gpus all \
  -e HUGGING_FACE_HUB_TOKEN=<your_token> \
  --shm-size 1g \
  -p 8000:8000 \
  --entrypoint rayllm \
  anyscale/rayllm:latest run --model models/continuous_batching/amazon--LightGPT.yaml
```

# Source

Source is available at https://github.com/ray-project/ray-llm

18 changes: 9 additions & 9 deletions docs/kuberay/deploy-on-eks.md
@@ -98,7 +98,7 @@ The second option is recommended for production use due to the additional high a

```sh
# path: docs/kuberay
kubectl apply -f ray-cluster.aviary-eks.yaml
kubectl apply -f ray-cluster.rayllm-eks.yaml
```

A few things are worth noting:
@@ -178,7 +178,7 @@ serve config
# [Example output]
# name: router
# route_prefix: /
# import_path: aviary.backend:router_application
# import_path: rayllm.backend:router_application
# args:
# models:
# meta-llama/Llama-2-7b-chat-hf: ./models/continuous_batching/meta-llama--Llama-2-7b-chat-hf.yaml
@@ -187,7 +187,7 @@

# name: meta-llama--Llama-2-7b-chat-hf
# route_prefix: /meta-llama--Llama-2-7b-chat-hf
# import_path: aviary.backend:llm_application
# import_path: rayllm.backend:llm_application
# args:
# model: ./models/continuous_batching/meta-llama--Llama-2-7b-chat-hf.yaml

@@ -232,10 +232,10 @@ curl http://localhost:8000/v1/chat/completions \

```sh
# path: docs/kuberay
kubectl apply -f ray-service.aviary-eks.yaml
kubectl apply -f ray-service.rayllm-eks.yaml
```

The `spec.rayClusterConfig` in `ray-service.aviary-eks.yaml` is the same as the `spec` in `ray-cluster.aviary-eks.yaml`.
The `spec.rayClusterConfig` in `ray-service.rayllm-eks.yaml` is the same as the `spec` in `ray-cluster.rayllm-eks.yaml`.
The only difference lies in the `serve` port, which is required for both the Ray head and Ray worker Pods in the case of RayService.
Hence, you can refer to Part 3 for more details about how to configure the RayCluster.
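
As a rough sketch (the port name and placement follow common RayService manifests and are assumptions here, not copied from the manifest), the addition amounts to exposing the Serve port on the container spec:

```yaml
# Hypothetical excerpt: relative to ray-cluster.rayllm-eks.yaml, the RayService
# manifest additionally exposes Ray Serve's default HTTP port on the containers.
ports:
  - containerPort: 8000
    name: serve
```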

@@ -246,7 +246,7 @@ If this process takes longer, follow the instructions in [the RayService trouble
serveConfigV2: |
applications:
- name: router
import_path: aviary.backend:router_application
import_path: rayllm.backend:router_application
route_prefix: /
args:
models:
@@ -331,12 +331,12 @@ Check out the RayLLM README to learn more ways to query models, such as with the
```sh
# path: docs/kuberay
# Case 1: RayLLM was deployed on a RayCluster
kubectl delete -f ray-cluster.aviary-eks.yaml
kubectl delete -f ray-cluster.rayllm-eks.yaml
# Case 2: RayLLM was deployed as a RayService
kubectl delete -f ray-service.aviary-eks.yaml
kubectl delete -f ray-service.rayllm-eks.yaml

# Uninstall the KubeRay operator chart
helm uninstall kuberay-operator

# Delete the Amazon EKS cluster via AWS Web UI
```
16 changes: 8 additions & 8 deletions docs/kuberay/deploy-on-gke.md
@@ -57,7 +57,7 @@ tolerations:
effect: NoSchedule
```
This toleration has already been added to the RayCluster YAML manifest `ray-cluster.aviary-gke.yaml` used in Step 6.
This toleration has already been added to the RayCluster YAML manifest `ray-cluster.rayllm-gke.yaml` used in Step 6.

For more on taints and tolerations, see the [Kubernetes documentation](https://kubernetes.io/docs/concepts/scheduling-eviction/taint-and-toleration/).
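
For reference, a complete toleration of this form typically looks like the sketch below; the taint key shown is the common default for GPU nodes and is an assumption, not copied from the manifest.

```yaml
# Hypothetical full toleration for GPU node taints; the key may differ per cluster.
tolerations:
  - key: "nvidia.com/gpu"
    operator: "Exists"
    effect: "NoSchedule"
```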

@@ -100,13 +100,13 @@ helm repo update

## Step 6: Create a RayCluster with RayLLM

If you are running this tutorial on the Google Cloud Shell, please copy the file `docs/kuberay/ray-cluster.aviary-gke.yaml` to the Google Cloud Shell. You may find it useful to use the [Cloud Shell Editor](https://cloud.google.com/shell/docs/editor-overview) to edit the file.
If you are running this tutorial on the Google Cloud Shell, please copy the file `docs/kuberay/ray-cluster.rayllm-gke.yaml` to the Google Cloud Shell. You may find it useful to use the [Cloud Shell Editor](https://cloud.google.com/shell/docs/editor-overview) to edit the file.

Now you can create a RayCluster with RayLLM. RayLLM is included in the image `anyscale/aviary:latest`, which is specified in the RayCluster YAML manifest `ray-cluster.aviary-gke.yaml`.
Now you can create a RayCluster with RayLLM. RayLLM is included in the image `anyscale/rayllm:latest`, which is specified in the RayCluster YAML manifest `ray-cluster.rayllm-gke.yaml`.

```sh
# path: docs/kuberay
kubectl apply -f ray-cluster.aviary-gke.yaml
kubectl apply -f ray-cluster.rayllm-gke.yaml
```

Note the following aspects of the YAML file:
@@ -179,7 +179,7 @@ serve config
# [Example output]
# name: router
# route_prefix: /
# import_path: aviary.backend:router_application
# import_path: rayllm.backend:router_application
# args:
# models:
# meta-llama/Llama-2-7b-chat-hf: ./models/continuous_batching/meta-llama--Llama-2-7b-chat-hf.yaml
@@ -188,7 +188,7 @@

# name: meta-llama--Llama-2-7b-chat-hf
# route_prefix: /meta-llama--Llama-2-7b-chat-hf
# import_path: aviary.backend:llm_application
# import_path: rayllm.backend:llm_application
# args:
# model: ./models/continuous_batching/meta-llama--Llama-2-7b-chat-hf.yaml

@@ -234,7 +234,7 @@ curl http://localhost:8000/v1/chat/completions \
```sh
# Step 8.1: Delete the RayCluster
# path: docs/kuberay
kubectl delete -f ray-cluster.aviary-gke.yaml
kubectl delete -f ray-cluster.rayllm-gke.yaml

# Step 8.2: Uninstall the KubeRay operator chart
helm uninstall kuberay-operator
@@ -243,4 +243,4 @@ helm uninstall kuberay-operator
gcloud container clusters delete rayllm-gpu-cluster
```

See the [GKE documentation](https://cloud.google.com/kubernetes-engine/docs/how-to/deleting-a-cluster) for more details on deleting a GKE cluster.
@@ -1,7 +1,7 @@
apiVersion: ray.io/v1alpha1
kind: RayCluster
metadata:
name: aviary
name: rayllm
spec:
# Ray head pod template
headGroupSpec:
@@ -16,7 +16,7 @@ spec:
spec:
containers:
- name: ray-head
image: anyscale/aviary:latest
image: anyscale/rayllm:latest
resources:
limits:
cpu: 2
@@ -45,7 +45,7 @@ spec:
spec:
containers:
- name: llm
image: anyscale/aviary:latest
image: anyscale/rayllm:latest
lifecycle:
preStop:
exec:
