k-MeansClustering(Analysis) #178

Closed
7,051 changes: 7,051 additions & 0 deletions machineLearning/k-MeansClustering/Dataset/Live.csv

Large diffs are not rendered by default.

2,208 changes: 2,208 additions & 0 deletions machineLearning/k-MeansClustering/Model/K_Means_Clustering.ipynb

Large diffs are not rendered by default.

125 changes: 125 additions & 0 deletions machineLearning/k-MeansClustering/README.md
@@ -0,0 +1,125 @@
## **PROJECT TITLE**
K-Means Clustering

## **INTRODUCTION**
In this tutorial, we use Google Colab to learn how K-means clustering works and how to apply it.

## **PURPOSE**
Our aim is to cluster the data with K-means. The algorithm starts with an initial set of randomly selected centroids, one per cluster, and then iteratively recalculates the centroid positions to optimize them.

## **BRIEF EXPLANATION**
K-means is an iterative algorithm that divides a set of n data points into k subgroups (clusters) based on similarity, measured by each point's distance from the centroid (mean) of its cluster.
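
The iteration described above can be sketched in plain Python. This is a minimal illustration, not the code from the notebook: the function names, the toy `points`, and the `seed` and `max_iter` parameters are all assumptions made for the example.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def k_means(points, k, max_iter=100, seed=0):
    """Cluster `points` into k groups; returns (centroids, assignments)."""
    rng = random.Random(seed)
    # Start from k randomly selected data points as the initial centroids.
    centroids = rng.sample(points, k)
    assignments = None
    for _ in range(max_iter):
        # Assignment step: attach each point to its nearest centroid.
        new_assignments = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Stop once no point changes cluster.
        if new_assignments == assignments:
            break
        assignments = new_assignments
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = tuple(
                    sum(dim) / len(members) for dim in zip(*members)
                )
    return centroids, assignments


# Hypothetical toy data: two well-separated groups of three points each.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]
centroids, assignments = k_means(points, k=2)
print(centroids)
print(assignments)
```

The cluster numbers in `assignments` are arbitrary; only the grouping of points is meaningful.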

## **WORKING CONDITIONS**
<ol>
<li><strong> Mounting the drive </strong></li> </br>
Mount the drive and enter the path to the folder where your dataset is stored, so you can access it. </br>
<img src="https://github.com/DevIncept-Contribution-Program-21/DS-ScriptsNook/blob/main/Machine%20Learning/Algorithms/K-Means%20Clustering/Images/mounting.jpeg" width=700>
</br>
</br>
</div>
<li><strong> Importing required Packages</strong></li> </br>
Import all the packages needed for the rest of the workflow. </br>
<img src="https://github.com/DevIncept-Contribution-Program-21/DS-ScriptsNook/blob/main/Machine%20Learning/Algorithms/K-Means%20Clustering/Images/packages.jpeg" width=700>
</div> </br>
These are all the packages the model requires.
</br>
</br>
</div>
<li><strong> Data Visualization </strong></li> </br>
We then perform exploratory data analysis and visualize the data. </br>
<img src="https://github.com/DevIncept-Contribution-Program-21/DS-ScriptsNook/blob/main/Machine%20Learning/Algorithms/K-Means%20Clustering/Images/dataviz1.jpeg" width=700>
</br>
<img src="https://github.com/DevIncept-Contribution-Program-21/DS-ScriptsNook/blob/main/Machine%20Learning/Algorithms/K-Means%20Clustering/Images/dataviz2.jpeg" width=700>
</br>
</div>
<li><strong>Preparation of the dataset (Arrangement and Cleaning)</strong></li> </br>
Next, we clean and arrange the dataset. The model is later fitted with <code>kmeans.fit()</code>, and its centroids are read from <code>kmeans.cluster_centers_</code>. </br>
<img src="https://github.com/DevIncept-Contribution-Program-21/DS-ScriptsNook/blob/main/Machine%20Learning/Algorithms/K-Means%20Clustering/Images/dataprep.jpeg" width=700> </br>
</div>
</br>

</div>
<li><strong>Setting up the variables</strong></li> </br>
Once the data visualization is done, we convert the dataset into a format suitable for the model. </br>
<img src="https://github.com/DevIncept-Contribution-Program-21/DS-ScriptsNook/blob/main/Machine%20Learning/Algorithms/K-Means%20Clustering/Images/setvar.jpeg" width=700>
</br>
</br>
</div>
<li><strong>Testing</strong></li> </br>
</br>
With k=2, our unsupervised model achieved a weak classification accuracy of 1%. Changing k to 4 gave a relatively higher classification accuracy of 88%. Hence, we conclude that k=4 is the optimal number of clusters. </br>
<img src="https://github.com/DevIncept-Contribution-Program-21/DS-ScriptsNook/blob/main/Machine%20Learning/Algorithms/K-Means%20Clustering/Images/testing.jpeg" width=700>
</div>
</ol>
</br>
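
When cluster assignments are scored against known labels, as in the testing step, it helps to remember that cluster numbers are arbitrary: a perfect grouping can still score 0% if cluster 0 happens to correspond to label 1. The sketch below contrasts a direct comparison with a majority-vote mapping. The README does not show how the notebook computes accuracy, so both function names and the toy labels are assumptions made for illustration.

```python
from collections import Counter


def direct_accuracy(true_labels, cluster_ids):
    """Fraction of points whose cluster number equals the label number."""
    matches = sum(t == c for t, c in zip(true_labels, cluster_ids))
    return matches / len(true_labels)


def majority_vote_accuracy(true_labels, cluster_ids):
    """Map each cluster to its most common true label, then score."""
    votes = {}
    for t, c in zip(true_labels, cluster_ids):
        votes.setdefault(c, Counter())[t] += 1
    # Each cluster is relabelled with the label that occurs most often in it.
    mapping = {c: counts.most_common(1)[0][0] for c, counts in votes.items()}
    matches = sum(mapping[c] == t for t, c in zip(true_labels, cluster_ids))
    return matches / len(true_labels)


# Hypothetical example: the grouping is perfect, but the numbering is swapped.
true_labels = [0, 0, 0, 1, 1, 1]
cluster_ids = [1, 1, 1, 0, 0, 0]

print(direct_accuracy(true_labels, cluster_ids))         # 0.0
print(majority_vote_accuracy(true_labels, cluster_ids))  # 1.0
```

Majority-vote accuracy tends to rise as k grows, because smaller clusters are purer, so it is best read alongside other measures rather than used alone to pick k.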

## **USAGE**
- The K-means clustering algorithm is used to find groups which have not been explicitly labeled in the data. This can be used to confirm business assumptions about what types of groups exist or to identify unknown groups in complex data sets.

## **USE CASES**
- The K-means algorithm is widely used in applications such as market segmentation, document clustering, image segmentation, and image compression.

## **LIBRARIES USED**
- pandas
- NumPy
- matplotlib.pyplot
- seaborn
- scikit-learn (KMeans)

## **ADVANTAGES**

- Relatively simple to implement.
- Scales to large data sets.
- Guarantees convergence (to a local optimum, not necessarily the global one).
- Can warm-start the positions of centroids.
- Easily adapts to new examples.

## **DISADVANTAGES**
- Results depend on the initial centroid values.
- k must be chosen manually.
- Struggles to cluster data of varying sizes and density.
- Scales poorly with the number of dimensions.


## **APPLICATIONS**
K-Means clustering is used in a variety of real-life and business cases, such as:

- Academic performance
- Diagnostic systems
- Search engines
- Wireless sensor networks

## **CONCLUSION**

1. In this tutorial, I implemented K-Means clustering, the most popular unsupervised clustering technique.

2. I applied the elbow method and found that k=2 (k is the number of clusters) can be considered a good number of clusters for this data.

3. I found that the model has a very high inertia of 237.7572, so it is not a good fit to the data.

4. With k=2, the unsupervised model achieved a weak classification accuracy of 1%.

5. I then changed the value of k and found a relatively higher classification accuracy of 88% with k=4.

6. Hence, we can conclude that k=4 is the optimal number of clusters.
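
Inertia, the quantity the elbow method tracks, is the sum of squared distances from each point to its nearest centroid. The sketch below computes it by hand for a toy dataset. The data and centroid positions are hypothetical, chosen so the arithmetic is easy to check, and are not taken from the notebook.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def inertia(points, centroids):
    """Sum of squared distances from each point to its nearest centroid."""
    return sum(min(squared_distance(p, c) for c in centroids) for p in points)


# Hypothetical data: two pairs of points, far apart along the x-axis.
points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]

one_centroid = [(5.0, 1.0)]                # k=1: the overall mean
two_centroids = [(0.0, 1.0), (10.0, 1.0)]  # k=2: one mean per pair

print(inertia(points, one_centroid))   # 104.0
print(inertia(points, two_centroids))  # 4.0
```

Inertia generally decreases as k increases, which is why the elbow method looks for the k after which the decrease flattens rather than for the smallest value. Its magnitude also depends on the scale of the features, so a raw number is easier to interpret relative to other values of k on the same data.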


## **REFERENCES**
- [Introduction to K-means Clustering](https://www.simplilearn.com/tutorials/machine-learning-tutorial/k-means-clustering-algorithm)
- [Pros and Cons of K-means](https://developers.google.com/machine-learning/clustering/algorithm/advantages-disadvantages)
- [Dataset Used](https://archive.ics.uci.edu/ml/datasets/Facebook+Live+Sellers+in+Thailand)

## **AUTHOR**

Hey, this is Hrugved Kolhe.

<a href="https://github.com/hrugved06"><img src="https://avatars.githubusercontent.com/u/59966943?s=400&u=445f4a7598547c0ecdeb22a265dd1a3dad9e297d&v=4" width="100px;" alt=""/><br /><sub><b> Hrugved Kolhe</b></sub></a>
</br>

[![GitHub followers](https://img.shields.io/github/followers/hrugved06.svg?label=Follow%20@hrugved06&style=social)](https://github.com/hrugved06) [![Twitter Follow](https://img.shields.io/twitter/follow/HrugVed_?style=social)](https://twitter.com/HrugVed_)

</br>
<hr style="height:2px;border-width:0;border-radius: 5px;color:gray;background-color:#8080ff">
</br>