“Deploying something useless into production, as soon as you can, is the right way to start a new project. It pulls unknown risk forward, opens up parallel streams of work, and establishes good habits.”
This quote is from Pete Hodgson’s article ‘Hello, production’. In a nutshell, it explains the benefits of taking deployment pains early in a software development project, and then using the initial deployment skeleton as the basis for rapidly delivering useful functionality into production.
The idea of making an initial ‘Hello, production’ release has had a big influence on how we think about the development of machine learning systems. We’ve mapped ‘Hello, production’ into the machine learning space as follows:
Train the simplest model conceivable and deploy it into production, as soon as you can.
A reasonable ‘Hello, production’ model could be one that returns the most frequent class (for classification tasks), or the mean value (for regression tasks). Scikit-Learn provides models for precisely this situation, in the sklearn.dummy sub-module. If the end goal is to serve predictions via a web API, then the next step is to develop the server and deploy it into a production environment. Alternatively, if the model is going to be used as part of a batch job, then the next step is to develop the job and deploy that into production.
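As a minimal sketch of how these baseline models behave (the data here is made-up, purely for illustration):

import numpy as np
from sklearn.dummy import DummyClassifier, DummyRegressor

# made-up training data, purely for illustration
X = np.array([[1.0], [2.0], [3.0], [4.0]])
y_class = np.array([0, 1, 1, 1])
y_reg = np.array([1.0, 2.0, 3.0, 4.0])

# always predicts the most frequent class in the training labels
clf = DummyClassifier(strategy='most_frequent').fit(X, y_class)
print(clf.predict([[99.0]]))  # [1]

# always predicts the mean of the training labels
reg = DummyRegressor(strategy='mean').fit(X, y_reg)
print(reg.predict([[99.0]]))  # [2.5]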
The advantage of following this process is that it forces you to confront the following issues early on:
- Getting access to data.
- Getting access to (or creating) production environments.
- Defining the contract (or interface) with the consumers of the model’s output.
- Creating deployment pipelines (manual or automated), to deliver your application into production.
Each one of these issues is likely to involve input from people in other teams and is critical to overall success. Failure on any one of these can signal the end for a machine learning project, regardless of how well the models are performing. Success also demonstrates an ability to deliver functional software, which in our experience creates trust in a project, and often leads to more time being made available to experiment with training more complex model types.
Bodywork is laser-focused on making the deployment of machine learning projects to Kubernetes quick and easy. In what follows, we are going to show you how to use Bodywork to deploy a ‘Hello, production’ release for a hypothetical prediction service, using Scikit-Learn and FastAPI. We claim that it will take you under 15 minutes to work through the steps below, which includes setting up a local Kubernetes cluster for testing.
Deploying machine learning projects using Bodywork requires you to have a GitHub account, Python 3.9 installed on your local machine, and access to a Kubernetes cluster. If you already have access to Kubernetes, then skip to Step 1; otherwise, read on to set up a single-node Kubernetes cluster on your local machine, using Minikube.
If you don’t have access to a Kubernetes cluster, then an easy way to get started is with Minikube. If you are running on macOS with the Homebrew package manager available, then installing Minikube is as simple as running,
$ brew install minikube
If you’re running on Windows or Linux, then see the appropriate installation instructions.
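For example, at the time of writing, the Minikube docs describe the Linux installation as downloading and installing the binary directly,

$ curl -LO https://storage.googleapis.com/minikube/releases/latest/minikube-linux-amd64
$ sudo install minikube-linux-amd64 /usr/local/bin/minikube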
Once you have Minikube installed, start a cluster using the latest version of Kubernetes that Bodywork supports,
$ minikube start --kubernetes-version=v1.22.6 --addons=ingress --cpus=2 --memory=2g
When you’re done with this tutorial, the cluster can be powered down using,
$ minikube stop
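Or, if you want to remove the cluster entirely,

$ minikube delete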
Head over to GitHub and create a new public repository for this project - we called ours bodywork-scikit-fastapi-project. If you want to use Bodywork with private repos, you’ll have to configure Bodywork to authenticate with GitHub via SSH. The Bodywork User Guide contains details on how to do this, but we recommend that you come back to this at a later date and continue with a public repository for now.
Next, clone your new repository locally,
$ git clone https://github.com/bodywork-ml/bodywork-scikit-fastapi-project.git
Create a dedicated Python 3.9 virtual environment in the root directory, and then activate it,
$ cd bodywork-scikit-fastapi-project
$ python3.9 -m venv .venv
$ source .venv/bin/activate
Finally, install the packages required for this project, as shown below,
$ pip install \
bodywork==2.0.2 \
scikit-learn==0.24.1 \
numpy==1.20.2 \
joblib==1.0.1 \
fastapi==0.63.0 \
uvicorn==0.13.4
Then open up an IDE to continue developing the service.
We want to demonstrate a ‘Hello, production’ release, so we’ll train a Scikit-Learn DummyRegressor, configured to return the mean value of the labels in a training dataset, regardless of the feature data passed to it. This will still require you to acquire some data, one way or another.

For the purposes of this article, we have opted to create a synthetic one-dimensional regression dataset, where the only feature, X, has a 42% correlation with the labels, y, and both features and labels are normally distributed. We have added this step to our training script, train_model.py, reproduced below. When you run the training script, it will train a DummyRegressor and save it in the project’s root directory as dummy_model.joblib.

Beyond use in ‘Hello, production’ releases, models such as this represent the most basic benchmark that any more sophisticated model type must out-perform - which is why the script also persists the model metrics in dummy_model_metrics.txt, for comparison with future iterations.
import joblib
import numpy as np
from sklearn.dummy import DummyRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# create dummy regression data
n_observations = 1000
np.random.seed(42)
X = np.random.randn(n_observations)
y = 0.42 * X + np.sqrt(1 - 0.42 * 0.42) * np.random.randn(n_observations)

# train dummy model
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.1, random_state=42)
dummy_model = DummyRegressor(strategy='mean')
dummy_model.fit(X_train, y_train)

# compute dummy model metrics
mse = mean_squared_error(y_test, dummy_model.predict(X_test))

# persist dummy model and metrics
joblib.dump(dummy_model, 'dummy_model.joblib')
with open('dummy_model_metrics.txt', 'w') as f:
    f.write(f'mean_squared_error: {mse}\n')
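Run the script and inspect the metrics file. Since y is constructed to have unit variance and the dummy model simply predicts the mean of the training labels, the mean squared error should come out close to 1.0:

$ python train_model.py
$ cat dummy_model_metrics.txt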
The ultimate aim for our machine learning system is to serve predictions via a web API. Consequently, our initial ‘Hello, production’ release needs a skeleton web service that exposes the dummy model trained in Step 2. This is achieved in a Python module we’ve named serve_model.py, reproduced below, which you should also add to your project.

This module loads the trained model created in Step 2 and then configures FastAPI to start a server with an HTTP endpoint at /api/v1/. Instances of data, serialised as JSON, can be sent to this endpoint as HTTP POST requests. The schema for the JSON data payload is defined by the FeatureDataInstance class, which for our example expects only a single float field named X. For more information on defining JSON schemas using Pydantic and FastAPI, see the FastAPI docs.
import joblib
import uvicorn
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI(debug=False)


class FeatureDataInstance(BaseModel):
    """Define JSON data schema for prediction requests."""
    X: float


@app.post('/api/v1/', status_code=200)
def predict(data: FeatureDataInstance):
    """Generate predictions for data sent to the /api/v1/ route."""
    prediction = model.predict([data.X])
    return {'y_pred': prediction[0]}


if __name__ == '__main__':
    model = joblib.load('dummy_model.joblib')
    uvicorn.run(app, host='0.0.0.0', workers=1)
Test the service locally by running serve_model.py,
$ python serve_model.py
INFO: Started server process [51987]
INFO: Waiting for application startup.
INFO: Application startup complete.
INFO: Uvicorn running on http://0.0.0.0:8000 (Press CTRL+C to quit)
Then, in a new terminal, send the endpoint some data using curl,
$ curl http://localhost:8000/api/v1/ \
--request POST \
--header "Content-Type: application/json" \
--data '{"X": 42}'
{"y_pred":-0.0032494670211433195}
This confirms that the service is working as expected.
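If you’d rather test from Python than curl, the same request can be made with the requests package (not one of this project’s pinned dependencies, so pip-install it separately if you want to try this):

import requests

# send a single feature instance to the prediction endpoint
response = requests.post('http://localhost:8000/api/v1/', json={'X': 42})
print(response.json())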
All configuration for Bodywork deployments must be kept in a YAML file, named bodywork.yaml and stored in the project’s root directory. The bodywork.yaml required to deploy our ‘Hello, production’ release is reproduced below - add this file to your project.
version: "1.1"
pipeline:
  name: bodywork-scikit-fastapi-project
  docker_image: bodyworkml/bodywork-core:latest
  DAG: scoring-service
stages:
  scoring-service:
    executable_module_path: serve_model.py
    requirements:
      - fastapi==0.63.0
      - joblib==1.0.1
      - numpy==1.20.2
      - scikit-learn==0.24.1
      - uvicorn==0.13.4
    cpu_request: 0.5
    memory_request_mb: 250
    service:
      max_startup_time_seconds: 120
      replicas: 2
      port: 8000
      ingress: true
logging:
  log_level: INFO
Bodywork will interpret this file as follows:
- Start a Bodywork container on Kubernetes, to run a service stage called scoring-service.
- Install the Python packages required to run serve_model.py.
- Run serve_model.py.
- Monitor scoring-service and ensure that at least one service replica is available at all times - i.e. if it fails for any reason, then immediately start another one.
Refer to the Bodywork User Guide for a complete discussion of all the options available for deploying machine learning systems using Bodywork.
The project is now ready to deploy, so the files must be committed and pushed to the remote repository we created on GitHub.
$ git add -A
$ git commit -m "Initial commit."
$ git push origin main
When triggered, Bodywork will clone the remote repository directly from GitHub, analyse the configuration in bodywork.yaml, and then execute the deployment plan contained within it.
The easiest way to run your first deployment is to execute the bodywork create deployment command,
$ bodywork create deployment https://github.com/bodywork-ml/bodywork-scikit-fastapi-project.git
This will orchestrate deployment on your cluster and stream the logs to your terminal.
Once the deployment has completed, the prediction service will be ready for testing. Bodywork will create ingress routes to your endpoint using the following scheme:
/PIPELINE_NAME/STAGE_NAME/
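For this project, that resolves to /bodywork-scikit-fastapi-project/scoring-service/, onto which the API route defined in serve_model.py is appended - i.e. /bodywork-scikit-fastapi-project/scoring-service/api/v1/.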
To open an access route to the cluster for testing, start a new terminal and run,
$ minikube kubectl -- -n ingress-nginx port-forward service/ingress-nginx-controller 8080:80
We can then request a prediction using,
$ curl http://localhost:8080/bodywork-scikit-fastapi-project/scoring-service/api/v1/ \
--request POST \
--header "Content-Type: application/json" \
--data '{"X": 42}'
{"y_pred": 0.0781994319124968}
This returns the same value we got when testing the service locally. Congratulations, you have just deployed your ‘Hello, production’ release!
If you used Minikube to test Bodywork locally, then the next logical step would be to deploy to a remote Kubernetes cluster. There are many options for creating managed Kubernetes clusters in the cloud - see our recommendations.
If a web service isn’t a suitable ‘Hello, production’ release for your project, then check out the Deployment Templates for other project types that may be a better fit - e.g. batch jobs or Jupyter notebook pipelines.
When your ‘Hello, production’ release is operational and available within your organisation, it’s then time to start thinking about monitoring your service and collecting data to enable the training of the next iteration. Godspeed!
If you run into any trouble, then don't hesitate to ask a question on our discussion board and we'll step-in to help you out.