Here we describe the main caveats of using KubeRay for deploying and running Ray Serve. It is mostly based on this documentation.
Install the KubeRay operator following the documentation:
kubectl create -k "github.com/ray-project/kuberay/ray-operator/config/default?ref=v0.6.0&timeout=90s"
Unfortunately, the usage of Ray Serve requires a bit of specific cluster configuration. An example of such a configuration is here. The most important Serve-specific things there are:
- Line 20 defines dashboard-agent-listen-port, which determines the port for the Serve management APIs
- Lines 51-52 define the ports for the dashboard agent
With this in place, we can use the dashboard-agent-listen-port for accessing the Serve APIs. We can either port-forward or create an additional route for accessing it.
We have here two Serve examples - hello and fruit - borrowed from the Ray documentation.
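For orientation, here is a minimal sketch of what the hello deployment graph can look like, loosely based on the Ray documentation example; the exact code packaged in our image may differ, but the deployment names (Doubler, HelloDeployment, DAGDriver) match the ones that show up in the deployment config later.

# hello.py (sketch)
from ray import serve
from ray.serve.deployment_graph import InputNode
from ray.serve.drivers import DAGDriver


@serve.deployment
class Doubler:
    def double(self, s: str) -> str:
        # Repeat the incoming string twice.
        return s + " " + s


@serve.deployment
class HelloDeployment:
    def say_hello(self, name: str) -> str:
        return f"Hello, {name}!"


# Build the deployment graph: HelloDeployment greets, Doubler repeats the greeting.
with InputNode() as name:
    greeting = HelloDeployment.bind().say_hello.bind(name)
    doubled = Doubler.bind().double.bind(greeting)

# `graph` is the import target referenced by `serve build hello:graph`.
graph = DAGDriver.bind(doubled)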
Once the code is created, we need to:
- Package it into the Docker image
- Create a deployment config file using the serve build command
For our example, the commands look as follows:
serve build hello:graph -o hello.yaml
serve build fruit:deployment_graph -o fruit.yaml
These two commands will produce the YAML files here and here. The base YAML files presented here can be further enhanced based on the documentation. The most common overrides include the number of replicas and deployment parameters.
Once the YAML files are in place, we can use serve deploy to deploy them. The serve deploy command is a thin wrapper over the HTTP APIs, which can also be used directly. Definitions of the REST APIs can be found here.
For our example, we first do a port-forward:
kubectl port-forward svc/raycluster-heterogeneous-head-svc 52365 -n max
And then use the following commands:
serve deploy hello.yaml
serve deploy fruit.yaml
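After serve deploy completes, the same forwarded port can be used to read back the current Serve configuration and application statuses. A minimal sketch, assuming the requests library and the /api/serve/applications/ endpoint that we also use for deployment later in this post:

# serve_status.py (sketch)
import json

import requests

# The dashboard agent port forwarded above; 52365 is the value configured
# via dashboard-agent-listen-port in the cluster config.
SERVE_API = "http://localhost:52365/api/serve/applications/"

resp = requests.get(SERVE_API, timeout=10)
resp.raise_for_status()
# Pretty-print the current Serve config and application statuses.
print(json.dumps(resp.json(), indent=2))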
The newer REST APIs support multiple Serve applications and allow deploying both of our Serve applications at once.
Once the application is installed, you can also see its configuration in the Ray dashboard.
Following this, do a port-forward:
kubectl port-forward svc/raycluster-heterogeneous-head-svc 8000 -n max
And then use this command:
curl -H "Content-Type: application/json" -d '["PEAR", 2]' "http://localhost:8000/"
curl "http://localhost:8000/?name=Ray"
To shut everything down, the only command needed is:
serve shutdown
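Since serve shutdown is also just a wrapper over the REST API, the same effect can be achieved with an HTTP DELETE against the applications endpoint (a sketch, assuming the requests library and an active port-forward to 52365):

# serve_shutdown.py (sketch)
import requests

# DELETE on the applications endpoint shuts Serve down, mirroring `serve shutdown`.
resp = requests.delete("http://localhost:52365/api/serve/applications/", timeout=30)
resp.raise_for_status()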
As described in the documentation, Ray now supports deploying multiple independent Serve applications.
To try this, we first need to modify fruit and hello so that they listen on different URLs; the modified modules are fruit_url and hello_url.
Once this is done, the following command generates deployment yaml:
serve build --multi-app fruit_url:graph hello_url:graph -o multi_app.yaml
The auto-generated application names default to app1 and app2, so I changed them in the generated YAML.
Finally, we need to add the newly created Python files fruit_url and hello_url to the Dockerfile and rebuild our image.
When this is done and the cluster is restarted, we can deploy our applications as follows:
serve deploy multi_app.yaml
Alternatively we can deploy using HTTP:
curl -X PUT http://localhost:52365/api/serve/applications/ -H 'Content-Type: application/json' -d '{"proxy_location": "EveryNode", "http_options": {"host": "0.0.0.0", "port": 8000},
"applications": [{"name": "fruit", "route_prefix": "/fruit", "import_path": "fruit_url:graph",
"runtime_env": {},
"deployments": [{"name": "MangoStand", "user_config": {"price": 3}},
{"name": "OrangeStand", "user_config": {"price": 2}},
{"name": "PearStand", "user_config": {"price": 4}},
{"name": "FruitMarket", "num_replicas": 2},
{"name": "DAGDriver"}]},
{"name": "greet", "route_prefix": "/greet", "import_path": "hello_url:graph",
"runtime_env": {},
"deployments": [{"name": "Doubler"},
{"name": "HelloDeployment"},
{"name": "DAGDriver"}]}]}'
Once the deployment is completed, you can port-forward:
kubectl port-forward svc/raycluster-heterogeneous-head-svc 8000 -n max
and run:
curl "http://localhost:8000/greet/?name=Ray"
curl -H "Content-Type: application/json" -d '["PEAR", 2]' "http://localhost:8000/fruit/"
Alternatively, you can use POST. Also, for curl, note this tip.
In addition to port-forwarding, you can create a route exposing port 8000 and use it for invocation.