Udacity Data Scientist Nanodegree Project 3
The code requires only the libraries included in the Anaconda distribution of Python. It should run without issues on Python 3.*.
This is the third project of the Udacity Data Scientist Nanodegree. In this project, I applied unsupervised learning techniques to identify the segments of the population that form the core customer base of a mail-order sales company in Germany. Marketing campaigns can then be targeted at these segments to achieve the highest expected rate of return.
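The segmentation step can be sketched as follows. This is a minimal sketch, not the notebook's code: it assumes k-means as the clustering method (the text above only says "unsupervised learning techniques") and uses tiny made-up two-feature points in place of the real AZ Direct data, which cannot be shared. The function names `kmeans` and `nearest` are hypothetical. Clusters are fit on the general population only, and each customer is then assigned to the nearest population centroid, so both groups share the same cluster labels.

```python
import math
import random


def nearest(point, centroids):
    """Index of the centroid closest to `point` (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda i: math.dist(point, centroids[i]))


def kmeans(points, k, n_iter=100, seed=0):
    """Plain Lloyd's k-means on a list of tuples; returns the k centroids."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initialise with k distinct data points
    for _ in range(n_iter):
        # Assignment step: group each point with its nearest centroid.
        groups = [[] for _ in range(k)]
        for p in points:
            groups[nearest(p, centroids)].append(p)
        # Update step: move each centroid to the mean of its group
        # (keep the old centroid if a group ends up empty).
        new_centroids = [
            tuple(sum(coord) / len(g) for coord in zip(*g)) if g else centroids[i]
            for i, g in enumerate(groups)
        ]
        if new_centroids == centroids:
            break  # assignments no longer change: converged
        centroids = new_centroids
    return centroids


# Hypothetical toy data: two well-separated groups in a 2-feature space.
population = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
customers = [(0.5, 0.5), (10.5, 10.5), (9.5, 10)]

# Fit on the general population, then label customers with the same clusters.
centroids = kmeans(population, k=2)
customer_labels = [nearest(c, centroids) for c in customers]
```

Which index each cluster receives depends on the random initialisation, so results should be compared by cluster membership rather than by a fixed label number.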
This repository contains an IPython (Jupyter) notebook and an HTML file that showcase the work on this project; the HTML file was generated from the notebook. The actual data is not included because the terms and conditions of AZ Direct GmbH prohibit using it in any context other than this Udacity course. The main findings are:
- A good relationship exists between the general population data and the customer data.
- The customer clusters are not universal; that is, customers are not distributed across the population clusters in the same proportions as the general population, but are concentrated in certain clusters.
- The mail-order company should target clusters 2 and 16, where customers are over-represented, and neglect clusters 1, 4 and 10, where customers are under-represented.
- The over-represented customer clusters consist mainly of older (45-60), average-income males.
- In contrast, the under-represented customer clusters consist mainly of younger, average-income females. For further details, see the Jupyter notebook.
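The over- and under-representation in the findings above can be expressed as the ratio of a cluster's share of customers to its share of the general population: a ratio above 1 means customers are over-represented in that cluster, and below 1 means under-represented. This is a minimal sketch with hypothetical label lists, not the project's actual cluster assignments or counts; the function name `representation_ratios` is made up for illustration.

```python
from collections import Counter


def representation_ratios(population_labels, customer_labels):
    """Ratio of customer share to general-population share for each cluster.

    Only clusters present in the population are reported, since customers
    are assumed to be labelled with clusters fitted on the population.
    """
    pop_counts = Counter(population_labels)
    cust_counts = Counter(customer_labels)
    n_pop = len(population_labels)
    n_cust = len(customer_labels)
    return {
        cluster: (cust_counts[cluster] / n_cust) / (pop_counts[cluster] / n_pop)
        for cluster in sorted(pop_counts)
    }


# Hypothetical labels: 10 people in the general population, 5 customers.
population_labels = [0, 0, 0, 0, 1, 1, 1, 2, 2, 2]
customer_labels = [1, 1, 1, 2, 0]

ratios = representation_ratios(population_labels, customer_labels)
over = [c for c, r in ratios.items() if r > 1]   # clusters worth targeting
under = [c for c, r in ratios.items() if r < 1]  # clusters to deprioritise
# over == [1]; under == [0, 2]
```

A ratio is used rather than a raw difference in shares so that small clusters with a strong customer concentration stand out as clearly as large ones.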
License: credit must be given to Udacity for providing the starter code for this project. The data was provided by Udacity's partners at Bertelsmann Arvato Analytics.