This repo provides the FastAPI server code for hosting Hugging Face models on a local machine or on a cluster. To get started, clone the repo and set up the environment:
git clone https://github.com/maharshi95/hf-fastapi.git
cd hf-fastapi
bash setup_env.sh
conda activate hf-fastapi
To serve a model locally:

python -m hf_fastapi.serve --model-name {MODEL_NAME} --port {PORT}
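For example, to serve a Mistral-7B-Instruct model on port 8000 (the model alias here is borrowed from the cluster example below; substitute whatever identifier your setup expects):

python -m hf_fastapi.serve --model-name "mistral-7b-inst" --port 8000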
To launch the server on a SLURM cluster with slaunch:

conda activate hf-fastapi
slaunch --exp-name="hf-serve" --config="slurm_configs/med_gpu_nexus.json" \
    hf_fastapi/serve.py -m "mistral-7b-inst" -p 8000
You can add a custom SLURM config file to the slurm_configs directory and use it to submit the job. An example SLURM config file is given below:
{
  "account": "$SLURM_ACCOUNT",
  "partition": "$SLURM_PARTITION",
  "qos": "default",
  "gres": "gpu:rtxa5000:1",
  "time": "10:00:00",
  "mem": "30G",
  "ntasks-per-node": 1,
  "cpus-per-task": 4
}
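For instance, assuming you saved the config above as slurm_configs/my_cluster.json (the filename is hypothetical), the job could be submitted the same way as before, just pointing --config at the new file:

slaunch --exp-name="hf-serve" --config="slurm_configs/my_cluster.json" \
    hf_fastapi/serve.py -m "mistral-7b-inst" -p 8000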
The file client/example.py contains a complete example of how to use the API:
from hf_client.client import HFClient

# Point the client at wherever the server is running
# (localhost:8000 is an assumption matching the local serve example above)
HOST, PORT = "localhost", 8000
client = HFClient(host=HOST, port=PORT)
# Health check
resp = client.get_heartbeat()
print("Is alive?", resp.is_alive)
# Generate API
prompt = "Question: What is the meaning of life, the universe, and everything? Answer:"
resp = client.generate(prompt=prompt, max_new_tokens=50)
print(f'Input: "{resp.input_text}"')
print("Model:", resp.model_name)
print(f'Output: "{resp.generated_text.strip()}"')
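With a server from the earlier steps running, the example can be tried directly (assuming the script takes no command-line arguments):

python client/example.py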