The following instructions will help you get started with MosaicML's inference service.
In this folder, we provide example model handler implementations for various models. They can be used out of the box with the provided YAMLs on the MosaicML inference service.
Before using the inference service, you must request access here.
Once you have access:
- Follow the instructions here to install `mcli`, our command-line interface that allows you to deploy models and run inference on them.
- Once you have `mcli` set up, the Inference Docs will give you a high-level overview of how you can deploy your model and interact with it (see the sketch after this list).
- Now you are ready to look at the README for each of the models in this repo to start running inference on them!
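As a quick illustration, here is a minimal sketch of interacting with a deployment through the `mcli` Python SDK. The payload is a placeholder; the exact input format depends on the model handler you deploy.

```python
# Illustrative only: the request payload depends on your model handler,
# and the deployment used below is whichever one is listed first.
from mcli import get_inference_deployments, predict

# List the inference deployments visible to your account.
deployments = get_inference_deployments()
for deployment in deployments:
    print(deployment.name)

# Send a request to the first deployment (placeholder payload).
response = predict(deployments[0], {"inputs": ["Hello, world!"]})
print(response)
```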
We have provided examples for three different model types in this repo. Each model directory contains model handlers and YAML files.
Model handlers define how your model is loaded and how it should be run when receiving a request. A model handler is expected to be a class that implements a `predict` function and, optionally, a `predict_stream` function if you'd like your deployment to support streaming outputs. For more details about the structure of model handlers, please refer to the mcli docs.
If a model that you'd like to deploy isn't supported by one of the existing model handlers, you have full flexibility to configure how your model behaves in the server by implementing your own model handler.
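As a rough illustration (not the exact interface, which is specified in the mcli docs), a custom handler might be shaped like this. `ExampleModelHandler` and its echo logic are placeholders, not a real model.

```python
# Placeholder handler: the class name, constructor arguments, and echo
# logic are illustrative. Check the mcli docs for the exact signatures
# the server expects.
from typing import Any, Dict, Iterator, List


class ExampleModelHandler:

    def __init__(self, model_name: str):
        # Load your model and tokenizer once, at server startup.
        self.model_name = model_name

    def predict(self, model_requests: List[Dict[str, Any]]) -> List[str]:
        # Run inference on a batch of requests and return one output
        # per request.
        return [f"{self.model_name} output for {r}" for r in model_requests]

    def predict_stream(self, model_requests: List[Dict[str, Any]]) -> Iterator[str]:
        # Optional: yield partial outputs so the deployment can stream
        # responses back to the client.
        for request in model_requests:
            yield f"{self.model_name} output for {request}"
```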
Deployment submissions to the MosaicML platform can be configured through a YAML file or through our Python API's `InferenceDeploymentConfig` class. The YAMLs provided in these examples contain information such as the deployment `name`, `image`, the download path for the model, and more. Please see this link to understand what these parameters mean.
Note: The `image` field in the YAML corresponds to the Docker image for the Docker container that is executing your deployment. It is set to our latest inference release.
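For example, a deployment could be submitted programmatically along these lines. This sketch assumes `InferenceDeploymentConfig` supports a `from_file` constructor analogous to the one on run configs, so treat it as an illustration rather than a verified recipe.

```python
# Sketch of a programmatic submission. `from_file` is assumed to mirror
# the run-config API; the YAML path is a placeholder for one of the
# YAMLs provided in this repo.
from mcli import InferenceDeploymentConfig, create_inference_deployment

config = InferenceDeploymentConfig.from_file("model_deployment.yaml")
deployment = create_inference_deployment(config)
print(f"Submitted deployment: {deployment.name}")
```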
- Check out our LLM Foundry, which contains code to train, finetune, evaluate, and deploy LLMs.
- Check out the Prompt Engineering Guide to better understand LLMs and how to use them.
- Check out the MosaicML Blog to learn more about large-scale AI.
- Follow us on Twitter and LinkedIn.
- Join our community on Slack.