This repository has been archived by the owner on May 28, 2024. It is now read-only.

Merge pull request #79 from avnishn/0.4.0
0.4.0 release

The following changes are introduced:

Renaming aviary to rayllm.
Support for reading models from GCS in addition to AWS S3.
Increased testing for prompting.
New model configs for Falcon 7B and 40B.
Make the frontend compatible with Ray Serve 2.7.
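
For illustration, a model's weights location is typically specified in its model YAML through a mirror config; the sketch below shows the new GCS option alongside the existing S3 one. The field names and paths are assumptions for illustration, not taken from this diff.

```yaml
# Hypothetical excerpt from a model config YAML; exact field names may differ.
engine_config:
  model_id: amazon/LightGPT
  s3_mirror_config:
    bucket_uri: s3://my-bucket/models/amazon--LightGPT/
  # To read weights from Google Cloud Storage instead:
  # gcs_mirror_config:
  #   bucket_uri: gs://my-bucket/models/amazon--LightGPT/
```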


Co-authored-by: Avnish Narayan <[email protected]>
Co-authored-by: Chris Sivanich <[email protected]>
Co-authored-by: Tanmay Chordia <[email protected]>
Co-authored-by: Sihan Wang <[email protected]>
Co-authored-by: Shreyas Krishnaswamy <[email protected]>
Co-authored-by: Richard Liaw <[email protected]>
7 people authored Oct 28, 2023
2 parents b3560aa + 83a54a1 commit c2a22af
Showing 133 changed files with 1,159 additions and 403 deletions.
5 changes: 5 additions & 0 deletions .gitignore
@@ -232,6 +232,7 @@ tag-mapping.json
*.tmp
deploy/anyscale/service.yaml
out
temp.py

# build output
build/
@@ -248,3 +249,7 @@ prompts.txt
site/

*.orig

__pycache__

.secretenv.yml
2 changes: 1 addition & 1 deletion .pre-commit-config.yaml
@@ -19,7 +19,7 @@ repos:
hooks:
- id: mypy
# NOTE: Exclusions are handled in pyproject.toml
files: aviary
files: rayllm
exclude: tests
additional_dependencies:
- mypy-extensions
4 changes: 2 additions & 2 deletions MANIFEST.in
@@ -1,5 +1,5 @@
include README.md README.ipynb LICENSE *.sh
include README.md LICENSE *.sh
recursive-include tests *.py
recursive-include models *.yaml
recursive-include examples *.*
recursive-include aviary/frontend *.js
recursive-include rayllm/frontend *.js
24 changes: 12 additions & 12 deletions README.md
@@ -32,14 +32,14 @@ The guide below walks you through the steps required for deployment of RayLLM on

### Locally

We highly recommend using the official `anyscale/aviary` Docker image to run RayLLM. Manually installing RayLLM is currently not a supported use case due to the specific dependencies required, some of which are not available on pip.
We highly recommend using the official `anyscale/rayllm` Docker image to run RayLLM. Manually installing RayLLM is currently not a supported use case due to the specific dependencies required, some of which are not available on pip.

```shell
cache_dir=${XDG_CACHE_HOME:-$HOME/.cache}

docker run -it --gpus all --shm-size 1g -p 8000:8000 -e HF_HOME=~/data -v $cache_dir:~/data anyscale/aviary:latest bash
docker run -it --gpus all --shm-size 1g -p 8000:8000 -e HF_HOME=~/data -v $cache_dir:~/data anyscale/rayllm:latest bash
# Inside docker container
aviary run --model ~/models/continuous_batching/amazon--LightGPT.yaml
serve run ~/serve_config/amazon--LightGPT.yaml
```

### On a Ray Cluster
@@ -57,7 +57,7 @@ export AWS_SESSION_TOKEN=...

Start by cloning this repo to your local machine.

You may need to specify your AWS private key in the `deploy/ray/aviary-cluster.yaml` file.
You may need to specify your AWS private key in the `deploy/ray/rayllm-cluster.yaml` file.
See the [Ray on Cloud VMs](https://docs.ray.io/en/latest/cluster/vms/index.html) page in the
Ray documentation for more details.

@@ -66,14 +66,14 @@ git clone https://github.com/ray-project/ray-llm.git
cd ray-llm

# Start a Ray Cluster (this will take a few minutes to start up)
ray up deploy/ray/aviary-cluster.yaml
ray up deploy/ray/rayllm-cluster.yaml
```

#### Connect to your Cluster

```shell
# Connect to the Head node of your Ray Cluster (This will take several minutes to autoscale)
ray attach deploy/ray/aviary-cluster.yaml
ray attach deploy/ray/rayllm-cluster.yaml

# Deploy the LightGPT model.
serve run serve_configs/amazon--LightGPT.yaml
@@ -91,7 +91,7 @@ For Kubernetes deployments, please see our documentation for [deploying on KubeR
Once the models are deployed, you can install a client outside of the Docker container to query the backend.

```shell
pip install "aviary @ git+https://github.com/ray-project/ray-llm.git"
pip install "rayllm @ git+https://github.com/ray-project/ray-llm.git"
```

You can query your RayLLM deployment in many ways.
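
For example, because RayLLM exposes an OpenAI-compatible REST API, a deployed model can be queried with a plain `curl` request. In the sketch below, the model ID and prompt are illustrative:

```shell
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "amazon/LightGPT",
        "messages": [{"role": "user", "content": "What is Ray Serve?"}],
        "temperature": 0.7
      }'
```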
@@ -219,19 +219,19 @@ print(chat_completion)
To install RayLLM and its dependencies, run the following command:

```shell
pip install "aviary @ git+https://github.com/ray-project/ray-llm.git"
pip install "rayllm @ git+https://github.com/ray-project/ray-llm.git"
```

RayLLM consists of a set of configurations and utilities for deploying LLMs on Ray Serve,
in addition to a frontend (Aviary Explorer), both of which come with additional
dependencies. To install the dependencies for the frontend, run the following command:

```shell
pip install "aviary[frontend] @ git+https://github.com/ray-project/ray-llm.git"
pip install "rayllm[frontend] @ git+https://github.com/ray-project/ray-llm.git"
```

The backend dependencies are heavyweight and quite large. We recommend using the official
`anyscale/aviary` image. Installing the backend manually is not a supported use case.
`anyscale/rayllm` image. Installing the backend manually is not a supported use case.

## Running Aviary Explorer locally

@@ -307,7 +307,7 @@ Run multiple models at once by aggregating the Serve configs for different model

applications:
- name: router
import_path: aviary.backend:router_application
import_path: rayllm.backend:router_application
route_prefix: /
args:
models:
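
For reference, a fully populated router entry aggregating two models might look like the sketch below; the model IDs and config paths are illustrative, not taken from this repository. Each key under `models` maps a model ID to its model configuration YAML.

```yaml
# Hypothetical aggregated Serve config: one router application serving two models.
applications:
  - name: router
    import_path: rayllm.backend:router_application
    route_prefix: /
    args:
      models:
        amazon/LightGPT: ./models/continuous_batching/amazon--LightGPT.yaml
        meta-llama/Llama-2-7b-chat-hf: ./models/continuous_batching/meta-llama--Llama-2-7b-chat-hf.yaml
```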
@@ -368,4 +368,4 @@ Feel free to post an issue first to get our feedback on a proposal first, or jus

We use `pre-commit` hooks to ensure that all code is formatted correctly.
Make sure to `pip install pre-commit` and then run `pre-commit install`.
You can also run `./format` to run the hooks manually.
6 changes: 0 additions & 6 deletions aviary/__init__.py

This file was deleted.

4 changes: 0 additions & 4 deletions aviary/backend/__init__.py

This file was deleted.

3 changes: 0 additions & 3 deletions aviary/backend/observability/tracing/__init__.py

This file was deleted.

File renamed without changes.
2 changes: 0 additions & 2 deletions deploy.sh

This file was deleted.

4 changes: 2 additions & 2 deletions deploy/ray/backend.yaml
@@ -1,10 +1,10 @@
import_path: aviary.backend:llm_application
import_path: rayllm.backend:llm_application
runtime_env:
# This working dir is relative to the working dir when we run this file
working_dir: "."
excludes:
- "deploy"
- "aviary/frontend"
- "rayllm/frontend"
args:
models:
# This can be a path to a model configuration directory or yaml
@@ -1,14 +1,14 @@
# A unique identifier for the head node and workers of this cluster.
cluster_name: aviary-deploy
cluster_name: rayllm-deploy

# Cloud-provider specific configuration.
provider:
type: aws
region: us-west-2
cache_stopped_nodes: False
docker:
image: "anyscale/aviary:test"
container_name: "aviary"
image: "anyscale/rayllm:latest"
container_name: "rayllm"
run_options:
- --entrypoint ""

Expand Down
51 changes: 51 additions & 0 deletions docs/DOCKERHUB.md
@@ -0,0 +1,51 @@
<!---
Docker Hub Description File
-->

# Overview

This is the publicly available set of Docker images for Anyscale/Ray's RayLLM (formerly Aviary) project.

RayLLM is an LLM serving solution that makes it easy to deploy and manage a variety of open source LLMs. It does this by:

- Providing an extensive suite of pre-configured open source LLMs, with defaults that work out of the box.
- Supporting Transformer models hosted on Hugging Face Hub or present on local disk.
- Simplifying the deployment of multiple LLMs within a single unified framework.
- Simplifying the addition of new LLMs, which in most cases takes only minutes.
- Offering unique autoscaling support, including scale-to-zero.
- Fully supporting multi-GPU & multi-node model deployments.
- Offering high performance features like continuous batching, quantization and streaming.
- Providing a REST API that is similar to OpenAI's, making it easy to migrate existing applications and cross-test against it.

[Read more here](https://github.com/ray-project/ray-llm)

## Tags

| Name | Notes |
|----|----|
| [`:0.3.1`](https://hub.docker.com/layers/anyscale/ray-llm/0.3.1/images/sha256-0dad10786076e18530fbd8016929ab9b240c8fe12163d5e74d8784ff1cbf5fb4) | Release v0.3.1 |
| [`:0.3.0`](https://hub.docker.com/layers/anyscale/ray-llm/0.3.0/images/sha256-310df8d6bfcce49fa00c0040f090099b7d376ed9535df85fa4147e7c159e7e90) | Release v0.3.0 |
| `:latest` | Most recently pushed version release image |

## Usage

See [ray-project/ray-llm "Deploying RayLLM"](https://github.com/ray-project/ray-llm#deploying-rayllm) for full instructions.

### Example

Running the [Amazon LightGPT model](https://huggingface.co/amazon/LightGPT) requires a machine with a compatible NVIDIA A10 GPU and a valid `HUGGING_FACE_HUB_TOKEN`:

```sh
docker run \
  --gpus all \
  -e HUGGING_FACE_HUB_TOKEN=<your_token> \
  --shm-size 1g \
  -p 8000:8000 \
  --entrypoint rayllm \
  anyscale/rayllm:latest run --model models/continuous_batching/amazon--LightGPT.yaml
```

# Source

Source is available at https://github.com/ray-project/ray-llm

18 changes: 9 additions & 9 deletions docs/kuberay/deploy-on-eks.md
@@ -98,7 +98,7 @@ The second option is recommended for production use due to the additional high a

```sh
# path: docs/kuberay
kubectl apply -f ray-cluster.aviary-eks.yaml
kubectl apply -f ray-cluster.rayllm-eks.yaml
```

A few things are worth noting:
@@ -178,7 +178,7 @@ serve config
# [Example output]
# name: router
# route_prefix: /
# import_path: aviary.backend:router_application
# import_path: rayllm.backend:router_application
# args:
# models:
# meta-llama/Llama-2-7b-chat-hf: ./models/continuous_batching/meta-llama--Llama-2-7b-chat-hf.yaml
@@ -187,7 +187,7 @@

# name: meta-llama--Llama-2-7b-chat-hf
# route_prefix: /meta-llama--Llama-2-7b-chat-hf
# import_path: aviary.backend:llm_application
# import_path: rayllm.backend:llm_application
# args:
# model: ./models/continuous_batching/meta-llama--Llama-2-7b-chat-hf.yaml

@@ -232,10 +232,10 @@ curl http://localhost:8000/v1/chat/completions \

```sh
# path: docs/kuberay
kubectl apply -f ray-service.aviary-eks.yaml
kubectl apply -f ray-service.rayllm-eks.yaml
```

The `spec.rayClusterConfig` in `ray-service.aviary-eks.yaml` is the same as the `spec` in `ray-cluster.aviary-eks.yaml`.
The `spec.rayClusterConfig` in `ray-service.rayllm-eks.yaml` is the same as the `spec` in `ray-cluster.rayllm-eks.yaml`.
The only difference lies in the `serve` port, which is required for both the Ray head and Ray worker Pods in the case of RayService.
Hence, you can refer to Part 3 for more details about how to configure the RayCluster.
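
As a rough sketch (the port name and placement follow common RayService manifests and are assumptions here, not copied from the manifest), the addition amounts to exposing the Serve port on the container spec:

```yaml
# Hypothetical excerpt: relative to ray-cluster.rayllm-eks.yaml, the RayService
# manifest additionally exposes Ray Serve's default HTTP port on the containers.
ports:
  - containerPort: 8000
    name: serve
```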

@@ -246,7 +246,7 @@ If this process takes longer, follow the instructions in [the RayService trouble
serveConfigV2: |
applications:
- name: router
import_path: aviary.backend:router_application
import_path: rayllm.backend:router_application
route_prefix: /
args:
models:
@@ -331,12 +331,12 @@ Check out the RayLLM README to learn more ways to query models, such as with the
```sh
# path: docs/kuberay
# Case 1: RayLLM was deployed on a RayCluster
kubectl delete -f ray-cluster.aviary-eks.yaml
kubectl delete -f ray-cluster.rayllm-eks.yaml
# Case 2: RayLLM was deployed as a RayService
kubectl delete -f ray-service.aviary-eks.yaml
kubectl delete -f ray-service.rayllm-eks.yaml

# Uninstall the KubeRay operator chart
helm uninstall kuberay-operator

# Delete the Amazon EKS cluster via AWS Web UI
```
16 changes: 8 additions & 8 deletions docs/kuberay/deploy-on-gke.md
@@ -57,7 +57,7 @@ tolerations:
effect: NoSchedule
```
This toleration has already been added to the RayCluster YAML manifest `ray-cluster.aviary-gke.yaml` used in Step 6.
This toleration has already been added to the RayCluster YAML manifest `ray-cluster.rayllm-gke.yaml` used in Step 6.

For more on taints and tolerations, see the [Kubernetes documentation](https://kubernetes.io/docs/concepts/scheduling-eviction/taint-and-toleration/).
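
For reference, a complete toleration of this form typically looks like the sketch below; the taint key shown is the common default for GPU nodes and is an assumption, not copied from the manifest.

```yaml
# Hypothetical full toleration for GPU node taints; the key may differ per cluster.
tolerations:
  - key: "nvidia.com/gpu"
    operator: "Exists"
    effect: "NoSchedule"
```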

@@ -100,13 +100,13 @@ helm repo update

## Step 6: Create a RayCluster with RayLLM

If you are running this tutorial on the Google Cloud Shell, please copy the file `docs/kuberay/ray-cluster.aviary-gke.yaml` to the Google Cloud Shell. You may find it useful to use the [Cloud Shell Editor](https://cloud.google.com/shell/docs/editor-overview) to edit the file.
If you are running this tutorial on the Google Cloud Shell, please copy the file `docs/kuberay/ray-cluster.rayllm-gke.yaml` to the Google Cloud Shell. You may find it useful to use the [Cloud Shell Editor](https://cloud.google.com/shell/docs/editor-overview) to edit the file.

Now you can create a RayCluster with RayLLM. RayLLM is included in the image `anyscale/aviary:latest`, which is specified in the RayCluster YAML manifest `ray-cluster.aviary-gke.yaml`.
Now you can create a RayCluster with RayLLM. RayLLM is included in the image `anyscale/rayllm:latest`, which is specified in the RayCluster YAML manifest `ray-cluster.rayllm-gke.yaml`.

```sh
# path: docs/kuberay
kubectl apply -f ray-cluster.aviary-gke.yaml
kubectl apply -f ray-cluster.rayllm-gke.yaml
```

Note the following aspects of the YAML file:
@@ -179,7 +179,7 @@ serve config
# [Example output]
# name: router
# route_prefix: /
# import_path: aviary.backend:router_application
# import_path: rayllm.backend:router_application
# args:
# models:
# meta-llama/Llama-2-7b-chat-hf: ./models/continuous_batching/meta-llama--Llama-2-7b-chat-hf.yaml
@@ -188,7 +188,7 @@

# name: meta-llama--Llama-2-7b-chat-hf
# route_prefix: /meta-llama--Llama-2-7b-chat-hf
# import_path: aviary.backend:llm_application
# import_path: rayllm.backend:llm_application
# args:
# model: ./models/continuous_batching/meta-llama--Llama-2-7b-chat-hf.yaml

@@ -234,7 +234,7 @@ curl http://localhost:8000/v1/chat/completions \
```sh
# Step 8.1: Delete the RayCluster
# path: docs/kuberay
kubectl delete -f ray-cluster.aviary-gke.yaml
kubectl delete -f ray-cluster.rayllm-gke.yaml

# Step 8.2: Uninstall the KubeRay operator chart
helm uninstall kuberay-operator
@@ -243,4 +243,4 @@ helm uninstall kuberay-operator
gcloud container clusters delete rayllm-gpu-cluster
```

See the [GKE documentation](https://cloud.google.com/kubernetes-engine/docs/how-to/deleting-a-cluster) for more details on deleting a GKE cluster.
@@ -1,7 +1,7 @@
apiVersion: ray.io/v1alpha1
kind: RayCluster
metadata:
name: aviary
name: rayllm
spec:
# Ray head pod template
headGroupSpec:
@@ -16,7 +16,7 @@ spec:
spec:
containers:
- name: ray-head
image: anyscale/aviary:latest
image: anyscale/rayllm:latest
resources:
limits:
cpu: 2
@@ -45,7 +45,7 @@ spec:
spec:
containers:
- name: llm
image: anyscale/aviary:latest
image: anyscale/rayllm:latest
lifecycle:
preStop:
exec:
