hf-fastapi

This repo provides a FastAPI server for hosting Hugging Face models on a local machine or on a cluster.

Installation

git clone https://github.com/maharshi95/hf-fastapi.git
cd hf-fastapi
bash setup_env.sh

Usage

Running the server from the host machine
conda activate hf-fastapi
python -m hf_fastapi.serve --model-name {MODEL_NAME} --port {PORT}
Submitting a SLURM job to run the server on a cluster
conda activate hf-fastapi
slaunch --exp-name="hf-serve" --config="slurm_configs/med_gpu_nexus.json" \
    hf_fastapi/serve.py -m "mistral-7b-inst" -p 8000
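After launching the server (locally or via SLURM), it may take a while before the model finishes loading. A minimal readiness-poll sketch, using only the get_heartbeat call shown in client/example.py — the helper itself and its retry parameters are illustrative, not part of this repo:

```python
import time

def wait_until_alive(client, timeout_s=600.0, poll_interval_s=5.0):
    """Poll the server's heartbeat until it reports alive, or time out.

    `client` is expected to expose get_heartbeat() returning an object
    with an `is_alive` attribute, as in client/example.py.
    """
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            if client.get_heartbeat().is_alive:
                return True
        except Exception:
            pass  # server not reachable yet (e.g. model still loading)
        time.sleep(poll_interval_s)
    return False

# Usage (assuming the server was launched as above):
# from hf_client.client import HFClient
# client = HFClient(host=HOST, port=8000)
# if not wait_until_alive(client):
#     raise RuntimeError("server did not come up in time")
```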

You can add a custom SLURM config file to the slurm_configs directory and use it to submit the job. An example of a SLURM config file is given below:

{
    "account": "$SLURM_ACCOUNT",
    "partition": "$SLURM_PARTITION",
    "qos": "default",
    "gres": "gpu:rtxa5000:1",
    "time": "10:00:00",
    "mem": "30G",
    "ntasks-per-node": 1,
    "cpus-per-task": 4
}
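As a sketch of customization, a config requesting more GPUs and memory might look like the following — the field names mirror the example above, while the GPU type, counts, and limits are placeholders to adapt to your cluster:

```json
{
    "account": "$SLURM_ACCOUNT",
    "partition": "$SLURM_PARTITION",
    "qos": "default",
    "gres": "gpu:rtxa6000:2",
    "time": "24:00:00",
    "mem": "64G",
    "ntasks-per-node": 1,
    "cpus-per-task": 8
}
```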

Client API

client/example.py contains an example of how to use the API.

from hf_client.client import HFClient
client = HFClient(host=HOST, port=PORT)

# Health check
resp = client.get_heartbeat()
print("Is alive?", resp.is_alive)

# Generate API
prompt = "Question: What is the meaning of life, the universe, and everything? Answer:"
resp = client.generate(prompt=prompt, max_new_tokens=50)
print(f'Input: "{resp.input_text}"')
print("Model:", resp.model_name)
print(f'Output: "{resp.generated_text.strip()}"')
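Building on the example above, here is a sketch of a small batching helper that sends a list of prompts one at a time and collects the outputs. The helper is not part of the repo; it relies only on the generate call shown above:

```python
def generate_batch(client, prompts, max_new_tokens=50):
    """Run client.generate() over a list of prompts and collect outputs.

    `client` is expected to expose generate(prompt=..., max_new_tokens=...)
    returning an object with a `generated_text` attribute, as shown above.
    """
    outputs = []
    for prompt in prompts:
        resp = client.generate(prompt=prompt, max_new_tokens=max_new_tokens)
        outputs.append(resp.generated_text.strip())
    return outputs

# Usage (assuming a running server):
# client = HFClient(host=HOST, port=PORT)
# answers = generate_batch(client, ["Question: ... Answer:", "Question: ... Answer:"])
```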
