In this project we focused on endpoints and the SDK in Azure Machine Learning. We used Azure Machine Learning Studio to run an AutoML experiment on the Bank Marketing dataset already used in Project 1. We then made the best model ready for production and deployed it using ACI (Azure Container Instances). The deployed model is consumed via its REST API (HTTP requests). Finally, a pipeline was built, published, and consumed.
- Downloaded the CSV file of the Bank Marketing data (you can find the file in this project folder).
- Uploaded this file to Azure Machine Learning as a registered dataset.
- In the AutoML functionality, we defined the problem as a classification task and set the variable y (which refers to whether a customer is eligible or not) as our target variable (a binary variable).
- Then a compute cluster ("Standard_DS12_v2") was created.
- Using the AutoML functionality, we found the best model and deployed it with ACI (the "VotingEnsemble" algorithm was the best in this run).
- Logging and Application Insights were enabled to provide information about the requests and the performance of the deployed model.
- The REST endpoint was tested for connectivity (via Swagger).
- As a final step, we used the Python SDK to create a pipeline and publish it.
The download path was communicated at the beginning of the project. The file was downloaded from there and imported into a registered dataset. The steps are identical to those carried out in Project 1.
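For reference, registering the dataset can also be done via the Python SDK, roughly as sketched below (the URL and the dataset name are placeholders for the values used in the project):

```python
from azureml.core import Workspace, Dataset

ws = Workspace.from_config()  # uses the workspace's config.json

# Placeholder URL: use the download path communicated at the start of the project
data_url = "https://<storage-account>/<path>/bankmarketing_train.csv"

# Create a tabular dataset from the CSV file and register it in the workspace
dataset = Dataset.Tabular.from_delimited_files(path=data_url)
dataset = dataset.register(workspace=ws,
                           name="bankmarketing",  # placeholder name
                           description="Bank Marketing dataset from Project 1")
```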
Here you can see the result of loading the file into the dataset,
and here you can see the confirmation from Azure.
In the next steps we define the compute cluster and start the experiment.
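In SDK terms, the compute cluster and the AutoML experiment correspond roughly to the following sketch; the cluster name, experiment name, timeout, and cross-validation settings are assumptions, the actual configuration was done in the Studio as shown in the screenshots:

```python
from azureml.core import Experiment, Workspace
from azureml.core.compute import AmlCompute, ComputeTarget
from azureml.train.automl import AutoMLConfig

ws = Workspace.from_config()

# Provision the compute cluster (Standard_DS12_v2, as used in this project);
# the cluster name is a placeholder
compute_config = AmlCompute.provisioning_configuration(vm_size="Standard_DS12_v2",
                                                       min_nodes=1,
                                                       max_nodes=4)
compute_target = ComputeTarget.create(ws, "automl-cluster", compute_config)
compute_target.wait_for_completion(show_output=True)

# Classification task on the registered Bank Marketing dataset,
# with the binary column "y" as the target variable
automl_config = AutoMLConfig(task="classification",
                             primary_metric="accuracy",
                             training_data=dataset,          # registered dataset from the sketch above
                             label_column_name="y",
                             compute_target=compute_target,
                             experiment_timeout_minutes=30,  # assumption
                             n_cross_validations=5)          # assumption

experiment = Experiment(ws, "bankmarketing-automl")          # placeholder name
run = experiment.submit(automl_config, show_output=True)
```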
Starting and configuring the experiment:
The AutoML experiment is now completed:
The best model is VotingEnsemble, with an accuracy of 0.92018 in this experiment.
Here you can see the calibration curve of this model. The calibration curve compares the model's predicted confidence with the actual proportion of positive samples at each confidence level. You can find more information about this here.
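To illustrate what the calibration curve plots, here is a minimal sketch with scikit-learn; the labels and predicted probabilities below are placeholder values, just to show the calculation:

```python
from sklearn.calibration import calibration_curve

# y_true: actual binary labels, y_prob: predicted probability of the positive class
y_true = [0, 0, 1, 1, 1, 0, 1, 0, 1, 1]
y_prob = [0.1, 0.3, 0.7, 0.8, 0.9, 0.2, 0.6, 0.4, 0.95, 0.85]

# Fraction of positives vs. mean predicted probability per confidence bin
frac_positives, mean_predicted = calibration_curve(y_true, y_prob, n_bins=5)
print(frac_positives)
print(mean_predicted)
```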
Below is also an overview of the results of the other algorithms.
Here you can see that the best model from AutoML has been selected and that authentication has been activated. The compute type was also set to ACI, as required.
Here is the status of the deployed model:
As seen above, I chose the best model (VotingEnsemble) for deployment, enabled "Authentication", and selected Azure Container Instance (ACI) as the compute type. The code executed in logs.py enables "Application Insights"; "Application Insights enabled" was disabled before logs.py was run.
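The core of logs.py is roughly the following sketch (the service name is a placeholder for the name of the deployed endpoint):

```python
from azureml.core import Workspace
from azureml.core.webservice import Webservice

ws = Workspace.from_config()

# Name of the deployed ACI endpoint (placeholder)
service = Webservice(name="bankmarketing-deploy", workspace=ws)

# Enable Application Insights for the running service and print its logs
service.update(enable_app_insights=True)
print(service.get_logs())
```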
Here is the result in the Python shell,
and in Azure.
We also tested the API with Swagger using sample data. Swagger is a very practical tool for easily testing REST APIs. Azure provides a swagger.json file for the deployed model, which makes it easy to test the model's API and to call the trained model programmatically.
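The swagger.json can be retrieved directly from the deployed service, for example as in the sketch below; the Swagger URI is a placeholder for the one shown on the endpoint's details page:

```python
import requests

# Placeholder: Swagger URI from the endpoint's details page in the Studio
swagger_uri = "http://<deployment>.<region>.azurecontainer.io/swagger.json"

spec = requests.get(swagger_uri).json()
print(spec["info"]["title"])        # name of the deployed model service
print(list(spec["paths"].keys()))   # available routes of the scoring service
```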
This is the Swagger web interface, connected to the model's swagger.json file,
and here is an example of communication with the model over JSON.
We also tested the endpoint by running the endpoint.py Python file.
The result of the test can be seen in the output of the Python script below:
The response corresponds to what was specified as the target parameters. The deployment has thus been successfully tested and verified.
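For reference, a test like the one in endpoint.py roughly looks as follows; the scoring URI, the key, and the feature values are placeholders and have to be replaced with the values of the deployed endpoint:

```python
import json
import requests

# Placeholders: take these from the endpoint's "Consume" tab in the Studio
scoring_uri = "http://<deployment>.<region>.azurecontainer.io/score"
key = "<primary-key>"

# Two sample records with Bank Marketing features (placeholder values);
# only a subset of the columns is shown, the full feature set of the dataset is required
data = {
    "data": [
        {"age": 35, "job": "technician", "marital": "married",
         "education": "university.degree", "duration": 200, "campaign": 1},
        {"age": 52, "job": "retired", "marital": "single",
         "education": "high.school", "duration": 35, "campaign": 3},
    ]
}

headers = {
    "Content-Type": "application/json",
    "Authorization": f"Bearer {key}",
}

response = requests.post(scoring_uri, data=json.dumps(data), headers=headers)
print(response.json())   # predictions for the two records
```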
Pipelines - general view
Pipeline - Endpoints
After uploading the Jupyter notebook to Azure, you can find it in this file here:
Running the experiment via the SDK in a Jupyter notebook.
Message that the experiment run via the SDK has finished:
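Building and publishing the pipeline in the notebook corresponds roughly to this sketch; the step, pipeline, and experiment names are placeholders, and automl_config is the AutoML configuration from the sketch further above:

```python
from azureml.core import Experiment, Workspace
from azureml.pipeline.core import Pipeline
from azureml.pipeline.steps import AutoMLStep

ws = Workspace.from_config()

# Wrap the AutoML configuration in a pipeline step
automl_step = AutoMLStep(name="automl_module",          # placeholder name
                         automl_config=automl_config,
                         allow_reuse=True)

pipeline = Pipeline(workspace=ws, steps=[automl_step])

experiment = Experiment(ws, "bankmarketing-pipeline")   # placeholder name
pipeline_run = experiment.submit(pipeline)
pipeline_run.wait_for_completion()

# Publishing gives the pipeline its own REST endpoint
published_pipeline = pipeline_run.publish_pipeline(
    name="Bankmarketing Train",                          # placeholder name
    description="AutoML training pipeline",
    version="1.0")
print(published_pipeline.endpoint)
```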
Access to the REST endpoint from Jupyter:
You can also see the URL of the REST API here, together with the published pipeline:
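Triggering the published pipeline over its REST endpoint can be sketched as follows; the endpoint comes from the published pipeline above, and the experiment name is a placeholder:

```python
import requests
from azureml.core.authentication import InteractiveLoginAuthentication

# Authentication header for the REST call
auth = InteractiveLoginAuthentication()
auth_header = auth.get_authentication_header()

# REST endpoint of the published pipeline (see published_pipeline.endpoint above)
rest_endpoint = published_pipeline.endpoint

response = requests.post(rest_endpoint,
                         headers=auth_header,
                         json={"ExperimentName": "pipeline-rest-endpoint"})  # placeholder name
response.raise_for_status()
print("Submitted run id:", response.json().get("Id"))
```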
And a final test via Python,
with the best algorithm.
- I would repeat this with a longer computation period/time frame to achieve higher accuracy and give the AutoML algorithms more time to fine-tune.
- I would also enable the Deep Learning functionality to try NN-based algorithms (this requires GPU-capable compute resources). This could yield better results, provided that the amount of data can be increased, as described in the next point.
- I would try to get a larger dataset, possibly from other regions/countries, since data from only one specific region could bias the algorithm if it were used elsewhere.