This project uses AWS machine learning and IoT tools to develop a deep learning defect classification model and use it for real-time defect detection on a device.
Copyright 2019 Amazon.com, Inc. or its affiliates. All Rights Reserved. SPDX-License-Identifier: MIT-0
Chip wafer maps show a visual representation of a chip wafer produced in a semiconductor foundry (fab). The maps are generated by microscopic cameras or electronic line scanners that probe for faults.
The fabs look for common defect patterns in this map as a quality control measure. They use either manual inspection or appliances that scan for defect patterns using pattern recognition software. These defect detection methods have several problems:
- Human inspection is not real-time. Shipping wafers with defects is costly.
- The appliances are expensive.
- The appliances do not have redundancy.
- Fabs cannot easily improve the accuracy of the defect detection or account for new defect patterns.
To address these problems, this project trains a deep learning defect classification model with AWS machine learning tools and uses AWS IoT tools to run it for real-time defect detection on a device.
The data set we use is:
[Qingyi](https://www.kaggle.com/qingyi). (February 2018). WM-811K wafer map, Version 1. Retrieved January 2018 from https://www.kaggle.com/qingyi/wm811k-wafer-map/downloads/wm811k-wafer-map.zip/1.
Each device (a Raspberry Pi) runs the GreenGrass Core software. Devices publish two kinds of messages:
- Raw images. These go to the topic `fabwafer/<fabid>/<cameraid>/img/<imgid>`.
- Classifications. These go to the topic `fabwafer/<fabid>/<cameraid>/prediction/<imgid>`.
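
As an illustration of the topic layout above, here is a minimal Python sketch that assembles a topic string. The function name `wafer_topic` and the validation rules are assumptions made for this example and are not part of the project code.

```python
def wafer_topic(fab_id: str, camera_id: str, kind: str, img_id: str) -> str:
    """Build a topic of the form fabwafer/<fabid>/<cameraid>/<kind>/<imgid>."""
    # The two message kinds described above: raw images and classifications.
    if kind not in ("img", "prediction"):
        raise ValueError(f"unknown message kind: {kind!r}")
    for part in (fab_id, camera_id, img_id):
        # A publish topic level must not be empty or contain '/', '+' or '#'.
        if not part or any(ch in part for ch in "/+#"):
            raise ValueError(f"invalid topic level: {part!r}")
    return f"fabwafer/{fab_id}/{camera_id}/{kind}/{img_id}"


print(wafer_topic("faba", "camera1", "prediction", "img1"))
# fabwafer/faba/camera1/prediction/img1
```
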
Here are some sample messages you can send to the topic `fabwafer/faba/camera1/prediction/img1` to test the notifications. The first two should not cause an alert, but the last should. All should write into the DynamoDB table.

    { "imgid": "img1", "timestamp": 1554134552944, "fab": "faba", "camera": "camera1", "prediction": "none", "probability": 0.9 }
    { "imgid": "img2", "timestamp": 1554134552945, "fab": "faba", "camera": "camera1", "prediction": "loc", "probability": 0.4 }
    { "imgid": "img3", "timestamp": 1554134552946, "fab": "faba", "camera": "camera1", "prediction": "loc", "probability": 0.9 }
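
The expected behavior of these three messages is consistent with a rule like "alert when a defect class is predicted with high enough probability". The sketch below encodes that reading. The threshold of 0.5 and the name `should_alert` are assumptions chosen so that the sample messages behave as described; the rule actually deployed by the stack may differ.

```python
import json

# Assumed threshold: anything between 0.4 and 0.9 reproduces the described behavior.
ALERT_THRESHOLD = 0.5


def should_alert(message: dict, threshold: float = ALERT_THRESHOLD) -> bool:
    """Return True when a defect (anything but 'none') is predicted confidently."""
    return message["prediction"] != "none" and message["probability"] >= threshold


samples = [
    '{"imgid": "img1", "timestamp": 1554134552944, "fab": "faba", "camera": "camera1", "prediction": "none", "probability": 0.9}',
    '{"imgid": "img2", "timestamp": 1554134552945, "fab": "faba", "camera": "camera1", "prediction": "loc", "probability": 0.4}',
    '{"imgid": "img3", "timestamp": 1554134552946, "fab": "faba", "camera": "camera1", "prediction": "loc", "probability": 0.9}',
]

for raw in samples:
    msg = json.loads(raw)
    print(msg["imgid"], should_alert(msg))
# img1 False
# img2 False
# img3 True
```
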
For the raw image topic, here’s a sample message. The `bytes` field is base64-encoded.

    { "imgid": "img3", "timestamp": 1554134552946, "fab": "faba", "camera": "camera1", "bytes": "" }
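
A raw image message with this shape could be built and decoded roughly as follows. This is a sketch using only the Python standard library; the helper names `build_image_message` and `read_image_bytes` are hypothetical and do not come from the project code.

```python
import base64
import json


def build_image_message(img_id: str, timestamp_ms: int, fab: str,
                        camera: str, image_bytes: bytes) -> str:
    """Serialize a raw image message, base64-encoding the image into 'bytes'."""
    payload = {
        "imgid": img_id,
        "timestamp": timestamp_ms,
        "fab": fab,
        "camera": camera,
        # JSON cannot carry binary data, so the image is sent as base64 text.
        "bytes": base64.b64encode(image_bytes).decode("ascii"),
    }
    return json.dumps(payload)


def read_image_bytes(message_json: str) -> bytes:
    """Recover the original image bytes from a raw image message."""
    return base64.b64decode(json.loads(message_json)["bytes"])


message = build_image_message("img3", 1554134552946, "faba", "camera1", b"\x89PNG...")
assert read_image_bytes(message) == b"\x89PNG..."
```
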
The source data is the Kaggle data set cited above. Place this data into an S3 bucket organized into `train` and `valid` subdirectories. The notebook `DataPrep.ipynb` documents the data preparation steps.
First, create an S3 bucket to hold the CloudFormation templates.
aws s3 mb s3://<template bucket>
Now create the stack:
./scripts/create.sh <template bucket> <template prefix> <stack name> <region>
Note the CodeCommit repo output from the stack and check in the code from the `pytorch_code`, `test_code`, `deploy_code`, and `trainer_code` directories.

    cd ..
    git clone <clone URL>
    cd ChipWaferMLRepo
    cp -r ../ChipWaferAnalysis/pytorch_code/ .
    git add .
    git commit -m "First commit. Trying out the build process."
    git push -u origin master
    cd ..
    git clone <training repo clone URL>
    cd ChipWaferTrainRepo
    cp -r ../ChipWaferAnalysis/trainer_code/ .
    git add .
    git commit -m "First commit. Trying out the build process."
    git push -u origin master
    cd ..
    git clone <test repo clone URL>
    cd ChipWaferTestRepo
    cp -r ../ChipWaferAnalysis/test_code/ .
    git add .
    git commit -m "First commit. Trying out the build process."
    git push -u origin master
    cd ..
    git clone <deploy repo clone URL>
    cd ChipWaferDeployRepo
    cp -r ../ChipWaferAnalysis/deploy_code/ .
    git add .
    git commit -m "First commit. Trying out the build process."
    git push -u origin master

Now go into the GreenGrass console and deploy the group. You’ll need to redeploy the group whenever the Lambda function changes.
Next, go to the API Gateway console, select the proper API, go to the Resources section, and select Deploy API from the Actions menu. Set the Deployment stage to `test`.
Next, create a Cognito user for the review portal.
./scripts/set-user-password.sh <user email> <password> <user pool id> <client id> <group name>
You can obtain the user pool ID, client ID, and group name from the CloudFormation stack outputs. The other parameters are at your discretion.
Finally, build and load the React app. Adjust any necessary values in `frontend/src/config.js`.

    cd frontend
    npm install # only needed once
    npm run build
    aws s3 sync build/ s3://<app bucket>

You can update the stack by passing the `--update` flag.
If you update the GreenGrass elements, reset the deployment on the group. Then update the stack and redeploy the group. If you update the Lambda function you must also update the subscription definition.
./scripts/create.sh <template bucket> <template prefix> <stack name> <region> --update
The automated demo right now runs a GreenGrass core device on an EC2 instance. It calls the SageMaker inference endpoint.
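
For orientation, a call to a SageMaker inference endpoint from Python might look like the sketch below. It uses `invoke_endpoint` from the boto3 `sagemaker-runtime` client. The function name, the content type, and the assumption that the endpoint returns JSON are illustrative guesses, not details taken from this project's Lambda function.

```python
import json


def classify_with_endpoint(runtime_client, endpoint_name: str, image_bytes: bytes):
    """Send image bytes to a SageMaker endpoint and parse a JSON response.

    runtime_client is expected to behave like boto3.client("sagemaker-runtime").
    """
    response = runtime_client.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType="application/x-image",  # assumed; depends on the model container
        Body=image_bytes,
    )
    # The response body is a stream; the JSON format is an assumption.
    return json.loads(response["Body"].read())


# Usage (requires AWS credentials and a deployed endpoint):
#   import boto3
#   client = boto3.client("sagemaker-runtime")
#   result = classify_with_endpoint(client, "<endpoint name>", open("wafer.png", "rb").read())
```

Passing the client in as a parameter keeps the function easy to exercise with a stand-in object when no endpoint is available.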
If you’d rather do inference on a real device, you can configure a Raspberry Pi.
- Build an MxNet model. (Eventually we can compile the PyTorch model using SageMaker Neo, but Neo does not yet support PyTorch 1.0.)
    - Run the notebook `notebooks/Classify-MxNet-121.ipynb`. This notebook builds a model using MxNet 1.2.1 and saves the artifacts. Grab the exported artifacts, zip them up, and save them in S3.
    - Alternatively, run the notebook `notebooks/Classify-MxNet-SM.ipynb`. This notebook trains the model in SageMaker, and the model artifact is automatically saved in S3.
- Follow the basic Raspberry Pi setup tutorial (parts 1 and 2).
- Follow the tutorial on deploying inference on the device using MxNet.
- Copy the `test` image folder onto the device in the path `/opt/images/test`.
- Starting with the Lambda zip package you got from the tutorial, replace `greengrassObjectClassification.py` with the version in the folder `lambda-rpi-inference` and rebuild the zip file.
- When you deploy the Lambda function, set environment variables for the fab, camera, and inference interval.
- Add a file system resource that maps `/opt/images` to `/volumes/images`. Don’t bother with the camera resources.
- Use the model artifact created from the MxNet notebook. The local path should be `/greengrass-machine-learning/mxnet/wafers`.

Also note that the Pi needs a 2.5 A power source when you run inference. If you use a lesser power source, it’ll boot and seem to work, but it’ll crash when you invoke any neural network for inference.

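
The Lambda environment variables mentioned in the steps above (fab, camera, and inference interval) could be read along these lines. The variable names `FAB_ID`, `CAMERA_ID`, and `INFERENCE_INTERVAL_SECONDS`, and the default interval, are hypothetical; check the function in `lambda-rpi-inference` for the names it actually expects.

```python
import os


def load_inference_config(env=None) -> dict:
    """Read fab, camera, and inference interval from environment variables.

    All variable names here are placeholders chosen for this example.
    """
    env = os.environ if env is None else env
    # Assumed default of 10 seconds when no interval is configured.
    interval = float(env.get("INFERENCE_INTERVAL_SECONDS", "10"))
    if interval <= 0:
        raise ValueError("inference interval must be positive")
    return {
        "fab": env["FAB_ID"],
        "camera": env["CAMERA_ID"],
        "interval_seconds": interval,
    }


# Example with an explicit mapping instead of the real process environment.
config = load_inference_config(
    {"FAB_ID": "faba", "CAMERA_ID": "camera1", "INFERENCE_INTERVAL_SECONDS": "5"}
)
print(config)
```
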
- Run ML training jobs on multiple instances.
- Use the native PyTorch container rather than a custom version (standardizes on fastai 1.0.39).
- Improve the accuracy of the MxNet model. It should probably not use `CenterCrop` as the cropping strategy; we also need to identify other differences from the PyTorch model.
- Use incremental training rather than full training every time, pulling in manually reviewed data.
- Work on the class imbalance problem. Consider oversampling, a different loss function, or an imbalanced sampler. The imbalanced sampler seems to work well, but it’s very slow right now.
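
To make the oversampling idea in the last item concrete, here is a small sketch of inverse-frequency sample weights in plain Python. It is not the sampler this project currently uses. In PyTorch, per-sample weights like these could be handed to `torch.utils.data.WeightedRandomSampler`; the example below uses `random.choices` instead so that it runs without extra dependencies. The label values are made up.

```python
import random
from collections import Counter


def inverse_frequency_weights(labels):
    """Give each sample a weight of 1 / (number of samples in its class)."""
    counts = Counter(labels)
    return [1.0 / counts[label] for label in labels]


# Made-up, imbalanced label list: 8 'none' samples and 2 'loc' samples.
labels = ["none"] * 8 + ["loc"] * 2
weights = inverse_frequency_weights(labels)

# With these weights every class carries the same total weight,
# so a weighted draw picks each class about equally often.
rng = random.Random(0)
resampled = rng.choices(labels, weights=weights, k=1000)
print(Counter(resampled))
```
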