In this project, an image classification model is built with Amazon SageMaker to distinguish bicycles from motorcycles. AWS Lambda functions provide the supporting services, and AWS Step Functions orchestrates the model and services into an event-driven application. The result is a scalable, machine-learning-enabled AWS application.
The project approaches image classification from a logistics point of view: the model automatically detects which kind of vehicle a delivery driver has, so that the driver can be routed to the correct loading bay and orders. Delivery professionals with a bicycle can then be assigned nearby orders, and motorcyclists can be given orders that are farther away.
The goal is to ship a scalable and safe model: it must scale to meet demand, and safeguards must be in place to monitor and control for drift or degraded performance.
- Step 1: Data staging
- Step 2: Model training and deployment
- Step 3: Lambdas and step function workflow
- Step 4: Testing and evaluation
Performing a complete ETL (extract, transform, load) on the CIFAR-100 dataset, training an image classifier using sagemaker.estimator.Estimator, and deploying it to a dedicated endpoint used for predictions.
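The README does not list the training configuration. Assuming the image behind sagemaker.estimator.Estimator is SageMaker's built-in image-classification algorithm, two details matter: the labels should be contiguous ids starting at 0, and the algorithm needs the image shape, class count, and number of training samples as hyperparameters. The sketch below covers both; `remap_labels` and `image_classification_hyperparameters` are hypothetical helper names, and the epoch and batch-size defaults are placeholders, not the project's values.

```python
from typing import Dict, List, Sequence


def remap_labels(labels: Sequence[int]) -> List[int]:
    """Map the kept CIFAR-100 fine labels (e.g. 8 and 48) onto contiguous ids 0..k-1."""
    mapping = {original: new for new, original in enumerate(sorted(set(labels)))}
    return [mapping[label] for label in labels]


def image_classification_hyperparameters(
    train_labels: Sequence[int], epochs: int = 30, mini_batch_size: int = 32
) -> Dict[str, object]:
    """Hypothetical helper: hyperparameters for the built-in image-classification algorithm."""
    return {
        "image_shape": "3,32,32",                   # channels,height,width of CIFAR images
        "num_classes": len(set(train_labels)),      # 2 for bicycle vs. motorcycle
        "num_training_samples": len(train_labels),  # required by the algorithm
        "epochs": epochs,                           # placeholder value
        "mini_batch_size": mini_batch_size,         # placeholder value
        "use_pretrained_model": 1,                  # fine-tune pretrained weights (assumption)
    }
```

With the CIFAR-100 training split, keeping two classes leaves 1,000 training images (500 per class). The resulting dict would then be passed to `estimator.set_hyperparameters(**hp)` before calling `fit`.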
Three Lambda functions were created to automate image predictions and to filter out low-confidence predictions (see section 4.2 of the project files for more details). These Lambda functions were combined into a single workflow using AWS Step Functions.
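The pure-Python core of the three functions can be sketched as follows. The event keys (`image_data`, `s3_bucket`, `s3_key`, `inferences`) and helper names are hypothetical, not taken from the project files. The S3 download (boto3) and the endpoint call (sagemaker-runtime `invoke_endpoint`) are deliberately left out so the sketch runs without AWS credentials; the 0.93 threshold comes from the description of lambda 3 in this README.

```python
import base64
import json

THRESHOLD = 0.93  # confidence threshold stated for lambda 3 in this README


def serialize_image(image_bytes: bytes, bucket: str, key: str) -> dict:
    """Lambda 1 core: base64-encode the image so it can travel in a JSON state payload."""
    return {
        "image_data": base64.b64encode(image_bytes).decode("ascii"),
        "s3_bucket": bucket,
        "s3_key": key,
        "inferences": [],
    }


def decode_image(event: dict) -> bytes:
    """Lambda 2 core: recover the raw image bytes before sending them to the endpoint."""
    return base64.b64decode(event["image_data"])


def filter_low_confidence(event: dict, context=None) -> dict:
    """Lambda 3 sketch: pass the event through if any class score meets THRESHOLD, else fail."""
    inferences = event["inferences"]
    if isinstance(inferences, (str, bytes)):
        inferences = json.loads(inferences)  # endpoint output may arrive as a JSON string
    # A score exactly equal to THRESHOLD is treated as meeting it.
    if not any(score >= THRESHOLD for score in inferences):
        raise ValueError("THRESHOLD_CONFIDENCE_NOT_MET")
    return event
```

Raising an exception, rather than returning a flag, makes the Lambda Task state fail, so a low-confidence prediction stops the Step Functions execution where it is visible in the console.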
Screenshot of the working Step Functions workflow that chains the Lambda functions.
Monitoring data was extracted from S3, and visualizations were created to check performance, capture errors, and confirm that everything worked as expected.
One of the visualizations created to check the model's performance.
The project is explained in detail, step by step, in the main.ipynb Jupyter notebook and within each Lambda function. The following is only an overview.
The CIFAR-100 dataset was used for this project, of which only the bicycle and motorcycle classes are needed. It consists of 60,000 32x32 color images in 100 classes, with 600 images per class (500 for training and 100 for testing). The CIFAR dataset is open source and generously hosted by the University of Toronto at: https://www.cs.toronto.edu/~kriz/cifar-100-python.tar.gz
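A minimal sketch of the extract and transform steps, using only the standard library. It assumes the archive from the URL above has been downloaded locally and that bicycle and motorcycle are fine labels 8 and 48 (worth confirming against `fine_label_names` in the archive's `meta` file). The helper names are hypothetical; the notebook's own ETL code may differ. Each row of `data` holds 3,072 bytes: the 1,024 red values, then the green, then the blue, each plane stored row by row.

```python
import pickle
import tarfile
from typing import List, Sequence, Tuple

# Assumed CIFAR-100 fine-label indices; verify against fine_label_names in "meta".
BICYCLE = 8
MOTORCYCLE = 48


def load_split(tar_path: str, split: str) -> dict:
    """Unpickle the 'train' or 'test' split directly from cifar-100-python.tar.gz."""
    with tarfile.open(tar_path, "r:gz") as archive:
        member = archive.extractfile(f"cifar-100-python/{split}")
        # encoding="bytes" is needed because the files were pickled under Python 2.
        return pickle.load(member, encoding="bytes")


def filter_classes(
    split: dict, wanted: Sequence[int] = (BICYCLE, MOTORCYCLE)
) -> List[Tuple[int, bytes]]:
    """Keep (fine_label, flat_row) pairs whose label is one of `wanted`."""
    return [
        (label, bytes(row))
        for label, row in zip(split[b"fine_labels"], split[b"data"])
        if label in wanted
    ]


def row_to_pixels(row: bytes, size: int = 32) -> List[List[Tuple[int, int, int]]]:
    """Turn a channel-major row (all red, then green, then blue) into rows of RGB tuples."""
    plane = size * size
    return [
        [
            (row[r * size + c], row[plane + r * size + c], row[2 * plane + r * size + c])
            for c in range(size)
        ]
        for r in range(size)
    ]
```

The pixel grid returned by `row_to_pixels` is in height x width x channel order, which is what image libraries typically expect when writing the images out as PNG files for upload to S3.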
1. main.ipynb: Jupyter notebook with detailed steps and explanations for implementing the image classification workflow. This includes the necessary preprocessing of the CIFAR-100 dataset, model training, deployment, and monitoring using Amazon SageMaker and associated AWS services.
2. main.html: Web page displaying the main.ipynb Jupyter notebook.
3. lambda 1 - Image Serialization.py: A Lambda function (serializeImageData) that serializes target data from an S3 bucket, converting image data to base64 format for subsequent processing within an AWS Step Functions workflow.
4. lambda 2 - Image Classification.py: A Lambda function that uses SageMaker for image classification: it decodes base64 image data, makes predictions with the deployed model, and returns the results to the Step Functions workflow.
5. lambda 3 - Filtering low confidence.py: A Lambda function (Filtering_low_confidence) that checks whether any inference value in the given event exceeds a specified threshold (0.93) and raises an error if the threshold confidence is not met.
6. Working Step Functions Graph and example.png: Screen capture of the working Step Functions workflow.
7. MyStateMachine-hs53ltoxf.asl.json: The Step Functions state machine definition, exported as JSON (Amazon States Language).
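The exported definition is not reproduced in this README. As a sketch of how a three-step chain can be expressed in Amazon States Language, the Python below builds such a definition and checks that every transition points at an existing state. The names serializeImageData and Filtering_low_confidence come from the file list above; `classifyImage` is a hypothetical placeholder for the second function. Using the `lambda:invoke` integration with `OutputPath` set to `$.Payload` assumes each function returns the event that the next state needs.

```python
import json
from typing import Dict, List

# serializeImageData and Filtering_low_confidence are named in this README;
# "classifyImage" is a hypothetical placeholder for the second function.
FUNCTION_NAMES = ["serializeImageData", "classifyImage", "Filtering_low_confidence"]


def build_state_machine(function_names: List[str]) -> Dict:
    """Chain one Lambda Task state per function, in order, ending on the last one."""
    states = {}
    for i, name in enumerate(function_names):
        state = {
            "Type": "Task",
            "Resource": "arn:aws:states:::lambda:invoke",
            "Parameters": {"FunctionName": name, "Payload.$": "$"},  # forward the whole input
            "OutputPath": "$.Payload",  # pass only the function's return value onward
        }
        if i + 1 < len(function_names):
            state["Next"] = function_names[i + 1]
        else:
            state["End"] = True
        states[name] = state
    return {
        "Comment": "Serialize, classify, and filter an image (sketch)",
        "StartAt": function_names[0],
        "States": states,
    }


def check_transitions(definition: Dict) -> None:
    """Raise ValueError if StartAt or a Next names a missing state, or a state never ends."""
    states = definition["States"]
    if definition["StartAt"] not in states:
        raise ValueError(f"StartAt {definition['StartAt']!r} is not a state")
    for name, state in states.items():
        if "Next" in state:
            if state["Next"] not in states:
                raise ValueError(f"{name} points at missing state {state['Next']!r}")
        elif state.get("End") is not True:
            raise ValueError(f"{name} has neither Next nor End")


if __name__ == "__main__":
    definition = build_state_machine(FUNCTION_NAMES)
    check_transitions(definition)
    print(json.dumps(definition, indent=2))
```

The printed JSON could be pasted into the Step Functions console editor or passed to `aws stepfunctions create-state-machine` through its `--definition` option, after replacing the function names with the deployed ones.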
The project was built using Amazon SageMaker with the following environment:
- Python 3 (Data Science) kernel, v3.7.10
- ml.t3.medium instance
- Python 3.8 runtime for the AWS Lambda functions
This was the fourth project of the "Udacity Machine Learning Fundamentals Nanodegree" offered by AWS as part of the "AWS AI & ML scholarship".
Confirmation link: link