In this project we focused on endpoints and the SDK in Azure Machine Learning. We used Azure Machine Learning Studio to run an AutoML experiment on the Bank Marketing dataset already used in Project 1. We then made the best model ready for production and deployed it using ACI (Azure Container Instances). The deployed model is consumed via its REST API (HTTP requests). Finally, a pipeline was built, published, and consumed.
- Downloaded the CSV file of the Bank Marketing data (you can find the file in this project folder).
- Uploaded this file to Azure Machine Learning as a registered dataset.
- In the AutoML functionality, we defined the problem as a classification task and set the variable y (which refers to whether a customer is eligible or not) as our target variable (a binary variable).
- Then a compute cluster ("Standard_DS12_v2") was created.
- Using the AutoML functionality, we found the best model and deployed it with ACI (the "VotingEnsemble" algorithm was the best in this run).
- Logging and Application Insights were enabled to provide information about the requests and the performance of the deployed model.
- The REST endpoint was tested for connectivity (via Swagger).
- As a final step, we used the Python SDK to create a pipeline and publish it.
The download path was communicated at the beginning of the project. The file was downloaded from there and imported into a registered dataset. The steps are identical to those carried out in Project 1.
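For reference, registering the dataset can also be done via the Python SDK, roughly as sketched below (the URL and the dataset name are placeholders for the values used in the project):

```python
from azureml.core import Workspace, Dataset

ws = Workspace.from_config()  # uses the workspace's config.json

# Placeholder URL: use the download path communicated at the start of the project
data_url = "https://<storage-account>/<path>/bankmarketing_train.csv"

# Create a tabular dataset from the CSV file and register it in the workspace
dataset = Dataset.Tabular.from_delimited_files(path=data_url)
dataset = dataset.register(workspace=ws,
                           name="bankmarketing",  # placeholder name
                           description="Bank Marketing dataset from Project 1")
```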
Here you can see the result of loading the file into the dataset,
and here you can see the confirmation from Azure.
In the next steps we define the compute cluster and start the experiment.
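In SDK terms, the compute cluster and the AutoML experiment correspond roughly to the following sketch; the cluster name, experiment name, timeout, and cross-validation settings are assumptions, the actual configuration was done in the Studio as shown in the screenshots:

```python
from azureml.core import Experiment, Workspace
from azureml.core.compute import AmlCompute, ComputeTarget
from azureml.train.automl import AutoMLConfig

ws = Workspace.from_config()

# Provision the compute cluster (Standard_DS12_v2, as used in this project);
# the cluster name is a placeholder
compute_config = AmlCompute.provisioning_configuration(vm_size="Standard_DS12_v2",
                                                       min_nodes=1,
                                                       max_nodes=4)
compute_target = ComputeTarget.create(ws, "automl-cluster", compute_config)
compute_target.wait_for_completion(show_output=True)

# Classification task on the registered Bank Marketing dataset,
# with the binary column "y" as the target variable
automl_config = AutoMLConfig(task="classification",
                             primary_metric="accuracy",
                             training_data=dataset,          # registered dataset from the sketch above
                             label_column_name="y",
                             compute_target=compute_target,
                             experiment_timeout_minutes=30,  # assumption
                             n_cross_validations=5)          # assumption

experiment = Experiment(ws, "bankmarketing-automl")          # placeholder name
run = experiment.submit(automl_config, show_output=True)
```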
Starting and configuring the experiment:
The AutoML experiment is now completed:
The best model is VotingEnsemble, with an accuracy of 0.92018 in this experiment.
Here you can see the calibration curve of this model. The calibration curve compares the model's predicted confidence with the actual proportion of positive samples at each confidence level. You can find more information about this here.
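To illustrate what the calibration curve plots, here is a minimal sketch with scikit-learn; the labels and predicted probabilities below are placeholder values, just to show the calculation:

```python
from sklearn.calibration import calibration_curve

# y_true: actual binary labels, y_prob: predicted probability of the positive class
y_true = [0, 0, 1, 1, 1, 0, 1, 0, 1, 1]
y_prob = [0.1, 0.3, 0.7, 0.8, 0.9, 0.2, 0.6, 0.4, 0.95, 0.85]

# Fraction of positives vs. mean predicted probability per confidence bin
frac_positives, mean_predicted = calibration_curve(y_true, y_prob, n_bins=5)
print(frac_positives)
print(mean_predicted)
```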
Below is also an overview of the results of the other algorithms.
Here you can see that the best model from AutoML has been selected and that authentication has been activated. The compute type was also set to ACI, as required.
Here is the status of the deployed model:
As seen above, I chose the best model (VotingEnsemble) for deployment, enabled "Authentication", and selected Azure Container Instance (ACI) as the compute type. The code executed in logs.py enables "Application Insights"; "Application Insights enabled" was disabled before logs.py was run.
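The core of logs.py is roughly the following sketch (the service name is a placeholder for the name of the deployed endpoint):

```python
from azureml.core import Workspace
from azureml.core.webservice import Webservice

ws = Workspace.from_config()

# Name of the deployed ACI endpoint (placeholder)
service = Webservice(name="bankmarketing-deploy", workspace=ws)

# Enable Application Insights for the running service and print its logs
service.update(enable_app_insights=True)
print(service.get_logs())
```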
Here is the result in the Python shell,
and in Azure.
We also tested the API with Swagger using sample data. Swagger is a very practical tool for easily testing REST APIs. Azure provides a swagger.json file for the deployed model, which makes it easy to test the model's API and to call the trained model programmatically.
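The swagger.json can be retrieved directly from the deployed service, for example as in the sketch below; the Swagger URI is a placeholder for the one shown on the endpoint's details page:

```python
import requests

# Placeholder: Swagger URI from the endpoint's details page in the Studio
swagger_uri = "http://<deployment>.<region>.azurecontainer.io/swagger.json"

spec = requests.get(swagger_uri).json()
print(spec["info"]["title"])        # name of the deployed model service
print(list(spec["paths"].keys()))   # available routes of the scoring service
```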
This is the Swagger web interface, connected to the model's swagger.json file,
and here is an example of communication with the model over JSON.
We also tested the endpoint by running the endpoint.py Python file.
The result of the test can be seen in the output of the Python script below:
The response corresponds to what was specified as the target parameters. The deployment has thus been successfully tested and verified.
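For reference, a test like the one in endpoint.py roughly looks as follows; the scoring URI, the key, and the feature values are placeholders and have to be replaced with the values of the deployed endpoint:

```python
import json
import requests

# Placeholders: take these from the endpoint's "Consume" tab in the Studio
scoring_uri = "http://<deployment>.<region>.azurecontainer.io/score"
key = "<primary-key>"

# Two sample records with Bank Marketing features (placeholder values);
# only a subset of the columns is shown, the full feature set of the dataset is required
data = {
    "data": [
        {"age": 35, "job": "technician", "marital": "married",
         "education": "university.degree", "duration": 200, "campaign": 1},
        {"age": 52, "job": "retired", "marital": "single",
         "education": "high.school", "duration": 35, "campaign": 3},
    ]
}

headers = {
    "Content-Type": "application/json",
    "Authorization": f"Bearer {key}",
}

response = requests.post(scoring_uri, data=json.dumps(data), headers=headers)
print(response.json())   # predictions for the two records
```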
Pipelines - general view
Pipeline - Endpoints
After uploading the Jupyter notebook to Azure, you can find it in this file here:
Running the experiment via the SDK in a Jupyter notebook.
Message that the experiment run via the SDK has finished:
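Building and publishing the pipeline in the notebook corresponds roughly to this sketch; the step, pipeline, and experiment names are placeholders, and automl_config is the AutoML configuration from the sketch further above:

```python
from azureml.core import Experiment, Workspace
from azureml.pipeline.core import Pipeline
from azureml.pipeline.steps import AutoMLStep

ws = Workspace.from_config()

# Wrap the AutoML configuration in a pipeline step
automl_step = AutoMLStep(name="automl_module",          # placeholder name
                         automl_config=automl_config,
                         allow_reuse=True)

pipeline = Pipeline(workspace=ws, steps=[automl_step])

experiment = Experiment(ws, "bankmarketing-pipeline")   # placeholder name
pipeline_run = experiment.submit(pipeline)
pipeline_run.wait_for_completion()

# Publishing gives the pipeline its own REST endpoint
published_pipeline = pipeline_run.publish_pipeline(
    name="Bankmarketing Train",                          # placeholder name
    description="AutoML training pipeline",
    version="1.0")
print(published_pipeline.endpoint)
```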
Access to the REST endpoint from Jupyter:
You can also see the URL of the REST API here, together with the published pipeline:
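Triggering the published pipeline over its REST endpoint can be sketched as follows; the endpoint comes from the published pipeline above, and the experiment name is a placeholder:

```python
import requests
from azureml.core.authentication import InteractiveLoginAuthentication

# Authentication header for the REST call
auth = InteractiveLoginAuthentication()
auth_header = auth.get_authentication_header()

# REST endpoint of the published pipeline (see published_pipeline.endpoint above)
rest_endpoint = published_pipeline.endpoint

response = requests.post(rest_endpoint,
                         headers=auth_header,
                         json={"ExperimentName": "pipeline-rest-endpoint"})  # placeholder name
response.raise_for_status()
print("Submitted run id:", response.json().get("Id"))
```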
And a final test via Python,
with the best algorithm.
- I would repeat this with a longer computation period/time frame to achieve higher accuracy and give the AutoML algorithms more time to fine-tune.
- I would also enable the Deep Learning functionality to try NN-based algorithms (this requires GPU-capable compute resources). This could yield better results, provided that the amount of data can be increased, as described in the next point.
- I would try to get a larger dataset, possibly from other regions/countries, since data from only one specific region could bias the algorithm if it were used elsewhere.