
RayCraft

A drop-in replacement for FastAPI that enables scalable and fault-tolerant deployments with Ray Serve.

Motivation

FastAPI + Ray = <3

Let's take a FastAPI app and supercharge it with raycraft:

from fastapi import FastAPI

simple_service = FastAPI()

@simple_service.post("/")
async def read_root() -> dict[str, str]:
    return {"Hello": "World"}

You can now run it with raycraft: swap FastAPI for RayCraftAPI, and only two lines of code change

- from fastapi import FastAPI
+ from raycraft import RayCraftAPI

- simple_service = FastAPI()
+ simple_service = RayCraftAPI()

@simple_service.post("/")
async def read_root() -> dict[str, str]:
    return {"Hello": "World"}

How to use

Basic example

Ok, an endpoint returning {"Hello": "World"} isn't going to cut it as a basic example, so let's try something more interesting and more relevant to why you might want to use raycraft!

Let's say you build a translation service using the following FastAPI code:

from fastapi import FastAPI
from transformers import pipeline

app = FastAPI()

def load_model():
    return pipeline("translation_en_to_fr", model="t5-small")

@app.post("/")
async def translate(text: str):
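    # note: the model pipeline is re-created on every request; we address this below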
    model = load_model()
    translated = model(text)[0]["translation_text"]
    return {"translation": translated}
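This is a standard FastAPI app; assuming the file is saved as demo.py, you'd serve it locally with uvicorn:

uvicorn demo:app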

We can now build the same app with raycraft, again with only a two-line change:

from raycraft import RayCraftAPI
from transformers import pipeline

app = RayCraftAPI()

def load_model():
    return pipeline("translation_en_to_fr", model="t5-small")

def run_translation(text: str) -> str:
    model = load_model()
    translated = model(text)[0]["translation_text"]
    return translated

@app.post("/")
async def translate(text: str):
    return run_translation(text)

We then call the following command to run the app:

raycraft run demo:app
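Once the app is up, you can call it like any HTTP service. A minimal client sketch, assuming requests are served at Ray Serve's default address of http://127.0.0.1:8000 and that text is passed as a query parameter (as plain str parameters are in FastAPI):

import requests

# assumes Ray Serve's default HTTP address; adjust if configured differently
response = requests.post("http://127.0.0.1:8000/", params={"text": "Hello, world!"})
print(response.json())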

Ok, now for the distributed part. Say we want to run this app on 2 replicas, each replica taking half a GPU, with requests properly load-balanced between them. We can do this by configuring the app as follows:

from raycraft import RayCraftAPI
from transformers import pipeline

app = RayCraftAPI(ray_actor_options={"num_gpus": 0.5}, num_replicas=2)

def load_model():
    return pipeline("translation_en_to_fr", model="t5-small")

def run_translation(text: str) -> str:
    model = load_model()
    translated = model(text)[0]["translation_text"]
    return translated

@app.post("/")
async def translate(text: str):
    return run_translation(text)
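With num_gpus set to 0.5 and num_replicas set to 2, the two replicas together fit on a single GPU, and incoming requests are balanced across them. A quick way to see this in action is to fire concurrent requests at the app; this sketch assumes the same default address as above:

from concurrent.futures import ThreadPoolExecutor

import requests

def call(text: str) -> str:
    # each request may land on either replica
    response = requests.post("http://127.0.0.1:8000/", params={"text": text})
    return response.json()

with ThreadPoolExecutor(max_workers=8) as pool:
    print(list(pool.map(call, ["Hello, world!"] * 16)))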

To avoid loading the model on every request, we can load it once in the app's constructor, using @app.init:

from raycraft import RayCraftAPI, App
from transformers import pipeline

app = RayCraftAPI(ray_actor_options={"num_gpus": 0.5}, num_replicas=2)

@app.init
def model():
    return pipeline("translation_en_to_fr", model="t5-small")

def run_translation(app: App, text: str) -> str:
    translated = app.model(text)[0]["translation_text"]
    return translated

@app.post("/")
async def translate(app: App, text: str):
    return run_translation(app, text)
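Judging from the example, a function decorated with @app.init appears to run once per replica at startup, and its return value is exposed on the injected App handle as an attribute named after the function (here, app.model). The pipeline is therefore loaded once per replica rather than once per request.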

RayCraft is a thin layer built on top of Ray Serve that adopts a functional interface to ease migration from FastAPI apps.

With Ray Serve, you can now:

  • Scale your app deployment to multiple replicas running on different machines
  • Define the resources allocated to each replica including fractional GPUs
  • Batch requests together to improve throughput (see the batching sketch below)
  • Get fault tolerance and automatic retries
  • Stream responses using websockets
  • Compose different services together using RPC calls that are strictly typed and faster than HTTP requests (see Composing models below)
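raycraft's own batching interface isn't shown in this README, but underneath it builds on Ray Serve, which provides request batching through its @serve.batch decorator. A minimal sketch of the underlying mechanism, using Ray Serve directly:

from ray import serve

@serve.deployment
class Translator:
    @serve.batch(max_batch_size=8, batch_wait_timeout_s=0.1)
    async def translate(self, texts: list[str]) -> list[str]:
        # Ray Serve collects up to 8 concurrent requests into a single call,
        # so the model can process them as one batch
        return [f"translated: {t}" for t in texts]

    async def __call__(self, request) -> str:
        # callers pass a single item; the decorator handles the batching
        return await self.translate(request.query_params["text"])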

Composing models
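raycraft composes services through typed RPC calls (see the feature list above); its exact interface for this isn't documented here. The underlying Ray Serve primitive is deployment handles: one deployment receives a handle to another and awaits .remote() calls on it. A sketch using Ray Serve directly:

from ray import serve
from ray.serve.handle import DeploymentHandle

@serve.deployment
class Translator:
    def translate(self, text: str) -> str:
        return f"translated: {text}"

@serve.deployment
class Pipeline:
    def __init__(self, translator: DeploymentHandle):
        self.translator = translator

    async def __call__(self, request) -> str:
        text = request.query_params["text"]
        # an RPC call to the Translator deployment, without going through HTTP
        return await self.translator.translate.remote(text)

app = Pipeline.bind(Translator.bind())
serve.run(app)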

How to set up

Using poetry:

poetry add raycraft

Using pip:

pip install raycraft

Roadmap

  • Streaming support using websockets
  • Deployment guide
