How to reduce costs and improve performance of your Machine Learning (ML) workloads?

AWS Machine Learning Purpose-built Accelerators Tutorial

In this workshop you'll learn how to use AWS Trainium and AWS Inferentia with Amazon SageMaker and Hugging Face Optimum Neuron to optimize your ML workloads. You'll also learn a methodology to map, qualify, and implement end-to-end solutions for different business challenges: a top-down approach that starts by identifying and mapping the use case or business challenge and ends with a trained model deployed as an API that you can integrate into your application.

Suppose you have a business challenge that requires custom ML models. You need to prepare a dataset, train and deploy your models, and finally integrate them into your application (and eventually automate the whole process). In the end, you expect a cost-optimized solution that fits your budget.

The picture below shows the steps of the proposed methodology. Follow them to apply it successfully to your own business problem:

1) Use case identification:	The first step of the process is to identify your use case. We prepared a table with a list of common use cases, framed as questions. The idea is to find the Task we'll use to address the problem.
1.1) Task mapping:	After identifying the use case/business challenge, using the use cases table or your own judgment, it is time to prepare a model for the corresponding Task
2) Model selection:	A second table lists all the currently supported models and the Tasks they can implement. Use that table to select your model
3) Model building:	Now you can use the available notebooks to run: 1/ Data Preparation; 2/ Model fine-tuning; and 3/ Model deployment. If you already have a pre-trained model, you can skip steps 1 and 2
4) App integration:	In the previous step you deployed your model, and it is now exposed as an API. Just integrate your application with this API and start using your model

1) Use case mapping

The following table lists common use cases (framed as questions) and their associated tasks. Use it as a reference to identify which Task is the best option to address your problem. Frame your use case/business challenge as a question and find the most similar option in the table. Then take the task associated with the mapped use case, in the second column, and follow the next steps.

IMPORTANT: If you don't find a use case (question) that resonates with your own use case, try to identify which Task is more appropriate for your scenario (using the tasks table). Also, please cut a ticket with the description of your use case + a framed question so that we can improve this table.

Use case question	Task
How to create an auto-complete mechanism for my application?	CausalLM
How to create a chat-bot to answer questions from an FAQ to my customers?	QuestionAnswering
How can I summarize a long document into a few paragraphs?	CausalLM
How can I create a spam classifier for my emails?	SequenceClassification
How to check whether a given text is a positive or a negative comment?	SequenceClassification
How do I translate documents from multiple languages to Dutch?	CausalLM
How to complete a sentence, given only its initial words?	CausalLM
How to classify pictures of products into different classes?	ImageClassification
How to create an Alexa like mechanism which detects specific keywords?	AudioClassification
How to create subtitles for audiobooks?	Speech-To-Text
Given two sentences, how to make sure the second sentence is related to the first?	NextSentencePrediction
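
The "find the most similar option" step can be sketched in code. The snippet below is only an illustration: it copies a few rows of the table above into a dictionary and picks the entry whose wording overlaps most with your question (Jaccard similarity over words). The names `USE_CASE_TO_TASK` and `find_task` are hypothetical and not part of this repository, and word overlap is a crude stand-in for your own judgment.

```python
import re
from typing import Optional

# A few rows copied from the use case table (illustrative subset).
USE_CASE_TO_TASK = {
    "How to create an auto-complete mechanism for my application?": "CausalLM",
    "How to create a chat-bot to answer questions from an FAQ to my customers?": "QuestionAnswering",
    "How can I create a spam classifier for my emails?": "SequenceClassification",
    "How to classify pictures of products into different classes?": "ImageClassification",
}


def _words(text: str) -> set:
    """Lower-case the text and return its set of alphabetic words."""
    return set(re.findall(r"[a-z]+", text.lower()))


def find_task(question: str) -> Optional[str]:
    """Return the Task of the table entry most similar to `question`.

    Similarity is the Jaccard index of the two word sets.
    Returns None when no entry shares any word with the question.
    """
    query = _words(question)
    best_task, best_score = None, 0.0
    for known_question, task in USE_CASE_TO_TASK.items():
        known = _words(known_question)
        union = query | known
        score = len(query & known) / len(union) if union else 0.0
        if score > best_score:
            best_task, best_score = task, score
    return best_task
```

For example, `find_task("How can I build a spam classifier for my emails?")` selects the spam classifier row and returns its task, SequenceClassification.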

1.1) Available Tasks

Task	Description
SequenceClassification	Text classification - binary or multi class
MultipleChoice	Given a context and multiple options, the model predicts which one is correct
TokenClassification	Token classification assigns a label to individual tokens in a sentence. One of the most common token classification tasks is Named Entity Recognition (NER)
MaskedLM	When the input text has a mask that needs to be replaced by a generated term
QuestionAnswering	Answers questions based on a given context or on the knowledge acquired during training
CausalLM	Causal language modeling predicts the next token in a sequence of tokens, and the model can only attend to tokens on the left. This means the model cannot see future tokens
ConditionalGeneration	Generates an output sequence conditioned on an input sequence (sequence-to-sequence), for example summarization or translation
NextSentencePrediction	NSP consists of giving the model two sentences, sentence A and sentence B. The model then predicts whether sentence B comes after sentence A, answering either IsNextSentence or NotNextSentence
MaskedImageModeling	Predicts the masked (hidden) patches of a given picture
ImageClassification	Classifies (binary or multiclass) an image into different classes of objects
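
The task names in this table match the suffixes of the `AutoModelFor...` classes in the Hugging Face `transformers` library, so a task can be turned into a class name mechanically. The sketch below assumes that convention; `auto_class_name` and `load_auto_class` are hypothetical helpers, and mapping ConditionalGeneration to `AutoModelForSeq2SeqLM` is an assumption, since `transformers` has no Auto class with that exact suffix.

```python
# Tasks whose name does not directly match a transformers Auto class suffix.
# Assumption: sequence-to-sequence models are the closest fit here.
_SPECIAL_CASES = {"ConditionalGeneration": "AutoModelForSeq2SeqLM"}


def auto_class_name(task: str) -> str:
    """Return the name of the transformers Auto class for a task name."""
    return _SPECIAL_CASES.get(task, f"AutoModelFor{task}")


def load_auto_class(task: str):
    """Look up the Auto class itself (requires `transformers` to be installed)."""
    import transformers  # imported lazily so the name helper works without it

    return getattr(transformers, auto_class_name(task))
```

With `transformers` installed, `load_auto_class("SequenceClassification").from_pretrained("bert-base-uncased", num_labels=2)` would give a model with a two-class head, ready for fine-tuning.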

2) HF Optimum Neuron - Supported Models

Click here to see the currently supported models for training and inference in Hugging Face Optimum Neuron

3) Model Building

Here you can find notebooks you can run on Amazon SageMaker Studio to prepare a model that addresses a task associated with your own use case. They implement a solution for the following use case: How can I create a spam detection mechanism? The required task is SequenceClassification. In the end we'll have a binary text classification model that receives an email as input and returns 0=NOT SPAM or 1=SPAM.

The first notebook downloads a public dataset named Deysi/spam-detection-dataset. The samples in this dataset are already labeled as spam or not spam.
The second notebook is configured to train a bert-base-uncased model for SequenceClassification. You'll notice there are variables you can configure to define the model and the task; then you define some hyperparameters and kick off the training job using Amazon SageMaker.
The third notebook shows how to compile a pre-trained model for AWS Inferentia and deploy it to a SageMaker real-time endpoint, which exposes the model as a simple API (web service).
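
As a rough idea of what the data preparation step involves, the sketch below converts one labeled sample into the integer label the classifier is trained on (0=NOT SPAM, 1=SPAM). It is not the notebook's code: the column names `text` and `label` and the label strings `spam` / `not_spam` are assumptions about the dataset layout, and `encode_example` is a hypothetical helper.

```python
# Assumed label strings; check the dataset card before relying on them.
LABEL_TO_ID = {"not_spam": 0, "spam": 1}


def encode_example(example: dict, text_column: str = "text", label_column: str = "label") -> dict:
    """Map one sample to the fields a text classifier expects.

    Returns the raw text plus an integer `labels` field (0=NOT SPAM, 1=SPAM).
    Raises KeyError if the label string is not one of the assumed values.
    """
    label = str(example[label_column]).strip().lower()
    return {"text": example[text_column], "labels": LABEL_TO_ID[label]}


# Usage with the `datasets` library (needs network access), under the same assumptions:
#   from datasets import load_dataset
#   dataset = load_dataset("Deysi/spam-detection-dataset")
#   dataset = dataset.map(encode_example)
```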

ATTENTION: if you already have a trained model, compatible with the models listed in the table linked in section 2, then just use the third notebook (you don't need the first two in this case).

Notebook	Description
01 - Data Preparation	How to load and prepare a dataset for fine-tuning a model
02 - Model Fine-tuning	How to kick-off a fine-tuning job using the dataset prepared in the previous notebook
03 - Model Deployment	How to compile and deploy a pre-trained model to Inferentia
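
For orientation, compiling a model for Inferentia with Hugging Face Optimum Neuron can look roughly like the sketch below. It may differ from what the third notebook does. Neuron compilation fixes the input shapes ahead of time, which is why a batch size and a sequence length are required; the default values shown and the helper names `static_input_shapes` and `compile_for_inferentia` are assumptions for illustration.

```python
def static_input_shapes(batch_size: int = 1, sequence_length: int = 128) -> dict:
    """Build the static input shapes used at compilation time.

    The compiled model only accepts inputs of exactly these shapes,
    so shorter texts must be padded to `sequence_length`.
    """
    if batch_size < 1 or sequence_length < 1:
        raise ValueError("batch_size and sequence_length must be positive")
    return {"batch_size": batch_size, "sequence_length": sequence_length}


def compile_for_inferentia(model_id: str, output_dir: str, **shape_kwargs) -> str:
    """Compile a sequence classification model and save the compiled artifacts.

    Requires `optimum-neuron` and the Neuron SDK (for example on an inf2 instance).
    """
    # Imported lazily because it is only available in a Neuron environment.
    from optimum.neuron import NeuronModelForSequenceClassification

    model = NeuronModelForSequenceClassification.from_pretrained(
        model_id,
        export=True,  # trigger compilation of the checkpoint
        **static_input_shapes(**shape_kwargs),
    )
    model.save_pretrained(output_dir)
    return output_dir
```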

4) App Integration

If you followed the steps in the previous sections, you have a running SageMaker real-time endpoint with your model. You can now call it through the AWS SDK for the SageMaker runtime, which is available for the most common programming languages. If your application is Python-based, you can also use the Amazon SageMaker Inference API.
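
A minimal sketch of calling the endpoint from a Python application with `boto3` follows. The endpoint name is whatever you chose at deployment time, and the `{"inputs": ...}` request payload and the JSON response are assumptions that depend on the inference script deployed with your model; `classify_email` is a hypothetical helper.

```python
import json


def classify_email(text: str, endpoint_name: str, runtime_client) -> object:
    """Send one email text to a SageMaker real-time endpoint and return the parsed JSON reply.

    `runtime_client` is a boto3 "sagemaker-runtime" client; it is passed in
    so the function can be exercised with a stub in tests.
    """
    response = runtime_client.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType="application/json",
        Accept="application/json",
        Body=json.dumps({"inputs": text}),  # assumed payload format
    )
    # The response body is a stream; read it fully before decoding.
    return json.loads(response["Body"].read())


# Usage (requires AWS credentials and a deployed endpoint):
#   import boto3
#   client = boto3.client("sagemaker-runtime")
#   print(classify_email("Win a free prize now!", "my-spam-endpoint", client))
```

Passing the client as an argument keeps the function easy to test and lets the application reuse one client across calls.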

5) Contributing

If you have questions, comments, suggestions, etc. please feel free to cut tickets in this repo.

Also, please refer to the CONTRIBUTING document for further details on contributing to this repository.
