This is a more detailed walkthrough covering how to create the multi-node Ray cluster and run the code for our blog post.
The full Ray cluster launcher documentation can be found here.
Install Ray locally: pip install 'ray[default]'
Clone the repository with git clone https://github.com/ray-project/langchain-ray/ and switch into the directory with cd langchain-ray.
You can edit the cluster YAML file, llm-batch-inference.yaml, if you need to make changes; an illustrative excerpt of its general shape is shown below.
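A Ray cluster launcher config generally looks like the following. This is a sketch of the format, not the actual contents of llm-batch-inference.yaml; the region, instance types, and worker counts here are assumptions:

```yaml
# Illustrative Ray cluster launcher config; the real llm-batch-inference.yaml
# in the repo may use different names, instance types, and counts.
cluster_name: llm-batch-inference
max_workers: 4

provider:
  type: aws
  region: us-west-2   # assumption; pick the region your credentials can use

auth:
  ssh_user: ubuntu

available_node_types:
  head_node:
    node_config:
      InstanceType: m5.xlarge     # assumption
  gpu_worker:
    min_workers: 4                # assumption; number of worker nodes to launch
    node_config:
      InstanceType: g4dn.xlarge   # assumption; a GPU instance for inference

head_node_type: head_node
```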
Set up the necessary AWS credentials by setting the AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, and AWS_SESSION_TOKEN environment variables.
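In a shell this might look like the following (the values are placeholders, not real credentials):

```bash
export AWS_ACCESS_KEY_ID="AKIA..."        # placeholder
export AWS_SECRET_ACCESS_KEY="wJalr..."   # placeholder
export AWS_SESSION_TOKEN="FwoGZ..."       # placeholder; only needed for temporary credentials
```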
Then, you can start a Ray cluster via this YAML file: ray up -y llm-batch-inference.yaml
You can connect to the remote Ray dashboard with ray dashboard llm-batch-inference.yaml. This will set up the necessary port forwarding; the dashboard can then be viewed by visiting http://localhost:8265.
You can follow the progress of the worker node startup by checking the autoscaler status on the Ray dashboard.
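If you prefer the command line, a similar autoscaler summary is available by running ray status on the head node through ray exec (the exact output format varies by Ray version):

```bash
# Runs `ray status` on the head node from your laptop;
# shows active nodes, pending nodes, and resource usage.
ray exec llm-batch-inference.yaml 'ray status'
```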
Copy the requirements.txt file and the Ray batch inference code to the Ray cluster:
ray rsync_up llm-batch-inference.yaml 'embedding_ray.py' 'embedding_ray.py'
ray rsync_up llm-batch-inference.yaml 'requirements.txt' 'requirements.txt'
In a separate window, SSH into the Ray cluster via ray attach llm-batch-inference.yaml.
Install the requirements on the head node of the cluster:
pip install -r requirements.txt
Once all the worker nodes have started, run the Ray batch inference code on the cluster!
python embedding_ray.py
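For orientation, a Ray batch-embedding script typically has the following shape. This is a minimal sketch under assumptions (a recent Ray version whose map_batches accepts a concurrency argument, and a sentence-transformers model), not the actual contents of embedding_ray.py:

```python
# Minimal sketch of a Ray batch-embedding job. NOT the actual embedding_ray.py:
# the model name, data, batch size, and actor count are illustrative assumptions.
import ray
from sentence_transformers import SentenceTransformer

ray.init(address="auto")  # connect to the running cluster started by `ray up`

# Toy in-memory dataset; the real script would load documents from storage.
ds = ray.data.from_items([{"text": f"example sentence {i}"} for i in range(1000)])

class Embedder:
    def __init__(self):
        # Each actor loads the model once and reuses it across batches.
        self.model = SentenceTransformer("all-MiniLM-L6-v2")

    def __call__(self, batch):
        # `batch` is a dict of column arrays; add an embeddings column.
        batch["embedding"] = self.model.encode(list(batch["text"]))
        return batch

# Fan the work out over a pool of actors, one batch per call.
embedded = ds.map_batches(Embedder, batch_size=64, concurrency=4)
print(embedded.take(2))
```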
After the workload finishes, tear down the cluster! This needs to be run from your laptop, so if you are still in the cluster shell, make sure to exit it first (for example by typing exit).
ray down llm-batch-inference.yaml