ZeusCodes-Official · hrugved06 · Aug 9, 2021
diff --git a/machineLearning/k-MeansClustering/Dataset/Live.csv b/machineLearning/k-MeansClustering/Dataset/Live.csv
diff --git a/machineLearning/k-MeansClustering/Images/dataprep.jpeg b/machineLearning/k-MeansClustering/Images/dataprep.jpeg
diff --git a/machineLearning/k-MeansClustering/Images/dataviz1.jpeg b/machineLearning/k-MeansClustering/Images/dataviz1.jpeg
diff --git a/machineLearning/k-MeansClustering/Images/dataviz2.jpeg b/machineLearning/k-MeansClustering/Images/dataviz2.jpeg
diff --git a/machineLearning/k-MeansClustering/Images/mounting.jpeg b/machineLearning/k-MeansClustering/Images/mounting.jpeg
diff --git a/machineLearning/k-MeansClustering/Images/packages.jpeg b/machineLearning/k-MeansClustering/Images/packages.jpeg
diff --git a/machineLearning/k-MeansClustering/Images/setvar.jpeg b/machineLearning/k-MeansClustering/Images/setvar.jpeg
diff --git a/machineLearning/k-MeansClustering/Images/testing.jpeg b/machineLearning/k-MeansClustering/Images/testing.jpeg
diff --git a/machineLearning/k-MeansClustering/Model/K_Means_Clustering.ipynb b/machineLearning/k-MeansClustering/Model/K_Means_Clustering.ipynb
diff --git a/machineLearning/k-MeansClustering/README.md b/machineLearning/k-MeansClustering/README.md
@@ -0,0 +1,125 @@
+## **PROJECT TITLE**
+K-Means Clustering
+
+## **INTRODUCTION**
+In this tutorial, we use Google Colab to learn how the K-means clustering algorithm works and how to apply it to a dataset.
+
+## **PURPOSE**
+Our aim here is to cluster the data. The K-means algorithm starts with an initial set of randomly selected centroids, which serve as the starting points for each cluster, and then performs iterative (repeated) calculations to optimize the positions of the centroids.
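
The iterative procedure described above (random initial centroids, then repeated assignment and update steps) is usually called Lloyd's algorithm. Below is a minimal NumPy sketch of it, for illustration only: the function `kmeans` and its parameters are hypothetical names of our own, while the tutorial's notebook works with a library model (`kmeans.fit()`, `kmeans.cluster_centers_`).

```python
import numpy as np


def kmeans(X, k, n_iter=100, seed=0):
    """Hypothetical minimal Lloyd's algorithm, for illustration only."""
    rng = np.random.default_rng(seed)

    def assign(centroids):
        # Distance from every point to every centroid -> index of the nearest one.
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        return dists.argmin(axis=1)

    # Initialization: k distinct data points chosen at random become the centroids.
    centroids = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(n_iter):
        labels = assign(centroids)
        # Update step: each centroid moves to the mean of its assigned points
        # (an empty cluster keeps its previous centroid).
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        converged = np.allclose(new_centroids, centroids)
        centroids = new_centroids
        if converged:
            break
    # Final assignment so the returned labels match the returned centroids.
    return assign(centroids), centroids


# Two well-separated groups of points (toy data).
X = np.array([[1.0, 1.0], [1.2, 0.8], [0.9, 1.1],
              [8.0, 8.0], [8.2, 7.9], [7.8, 8.1]])
labels, centroids = kmeans(X, k=2)
print(labels)
print(centroids)
```

On this toy data the first three points end up in one cluster and the last three in the other; which cluster gets ID 0 depends on the random initialization.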
+
+## **BRIEF EXPLANATION**
+K-means is an iterative algorithm that divides a dataset of n data points into k subgroups (clusters), assigning each point to the cluster whose centroid, the mean of that cluster's points, is nearest to it.
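
In symbols, K-means searches for clusters $C_1, \dots, C_k$ with centroids $\mu_1, \dots, \mu_k$ that minimize the within-cluster sum of squared distances, the quantity scikit-learn reports as `inertia_`:

$$
J = \sum_{j=1}^{k} \sum_{x_i \in C_j} \lVert x_i - \mu_j \rVert^2,
\qquad
\mu_j = \frac{1}{|C_j|} \sum_{x_i \in C_j} x_i
$$

Neither the assignment step (moving each point to its nearest centroid) nor the update step (recomputing each $\mu_j$ as a mean) can increase $J$, which is why the iterations converge, although only to a local minimum that depends on the starting centroids.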
+
+## **WORKING CONDITIONS**
+<ol>
+    <li><strong> Mounting the drive </strong></li> </br>
+    Mount Google Drive and enter the path to the folder where your dataset is stored, so that the notebook can access it. </br>
+  <img src="https://github.com/DevIncept-Contribution-Program-21/DS-ScriptsNook/blob/main/Machine%20Learning/Algorithms/K-Means%20Clustering/Images/mounting.jpeg" width=700> 
+</br>
+</br>
+    <li><strong> Importing the required packages</strong></li> </br>
+    Import all the packages needed for the later steps of building the model. </br>
+    <img src="https://github.com/DevIncept-Contribution-Program-21/DS-ScriptsNook/blob/main/Machine%20Learning/Algorithms/K-Means%20Clustering/Images/packages.jpeg" width=700>
+</br>
+Here we imported all the packages required for the model.
+</br>
+</br>
+    <li><strong> Data Visualization </strong></li> </br>
+    We then perform exploratory data analysis and visualize the data. </br>
+  <img src="https://github.com/DevIncept-Contribution-Program-21/DS-ScriptsNook/blob/main/Machine%20Learning/Algorithms/K-Means%20Clustering/Images/dataviz1.jpeg" width=700>
+  </br>
+  <img src="https://github.com/DevIncept-Contribution-Program-21/DS-ScriptsNook/blob/main/Machine%20Learning/Algorithms/K-Means%20Clustering/Images/dataviz2.jpeg" width=700>
+  </br>
+    <li><strong>Preparation of the dataset (Arrangement and Cleaning)</strong></li> </br>
+    Then we clean and arrange the dataset, using methods and attributes of the K-means model such as <code>kmeans.fit()</code> and <code>kmeans.cluster_centers_</code>.  </br>
+  <img src="https://github.com/DevIncept-Contribution-Program-21/DS-ScriptsNook/blob/main/Machine%20Learning/Algorithms/K-Means%20Clustering/Images/dataprep.jpeg" width=700> </br>
+  </br>
+    <li><strong>Setting up the variables</strong></li> </br>
+    Once the data visualization is done, we convert the dataset into a format suitable for our model.  </br>
+  <img src="https://github.com/DevIncept-Contribution-Program-21/DS-ScriptsNook/blob/main/Machine%20Learning/Algorithms/K-Means%20Clustering/Images/setvar.jpeg" width=700>
+  </br>
+  </br>
+    <li><strong>Testing</strong></li> </br>
+    </br>
+    Here, our unsupervised model achieved a weak classification accuracy of 1% with k=2. After changing the value of k, I found a relatively higher classification accuracy of 88% with k=4. Hence, we can conclude that k=4 is the optimal number of clusters. </br>
+  <img src="https://github.com/DevIncept-Contribution-Program-21/DS-ScriptsNook/blob/main/Machine%20Learning/Algorithms/K-Means%20Clustering/Images/testing.jpeg" width=700>
+</ol>
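
The preparation and fitting steps above can be sketched with pandas and scikit-learn as follows. This is a hedged illustration, not the notebook shown in the screenshots: the helper `prepare_features`, the use of `LabelEncoder` and `MinMaxScaler`, and the column names (`status_type`, `num_reactions`, `num_comments`, `num_shares`, modelled on the Facebook Live Sellers dataset) are assumptions, and random numbers stand in for `Live.csv`.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import LabelEncoder, MinMaxScaler


def prepare_features(df, label_col, drop_cols=()):
    """Hypothetical helper: drop unused columns, label-encode the class column,
    and scale the remaining numeric features to the [0, 1] range."""
    df = df.drop(columns=list(drop_cols))
    y = LabelEncoder().fit_transform(df[label_col])
    X = MinMaxScaler().fit_transform(df.drop(columns=[label_col]))
    return X, y


# Random stand-in for Live.csv; the column names are assumptions.
rng = np.random.default_rng(0)
n = 50
df = pd.DataFrame({
    "status_type": rng.choice(["video", "photo", "status"], size=n),
    "num_reactions": rng.integers(0, 500, size=n),
    "num_comments": rng.integers(0, 300, size=n),
    "num_shares": rng.integers(0, 100, size=n),
})

X, y = prepare_features(df, label_col="status_type")

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
kmeans.fit(X)
print(kmeans.cluster_centers_)  # one row per cluster, one column per feature
print(kmeans.labels_[:10])      # cluster IDs assigned to the first ten rows
print(kmeans.inertia_)          # within-cluster sum of squared distances
```

With the real data, reading `Live.csv` from the mounted drive with `pd.read_csv` would replace the random DataFrame.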
+</br>
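
The accuracy figures in the testing step compare cluster IDs with known class labels. Because K-means numbers its clusters arbitrarily, a direct element-wise comparison depends on how the IDs happen to line up with the labels. As an alternative (not the tutorial's own evaluation), the sketch below maps each cluster to its majority label before computing accuracy and also reports scikit-learn's `adjusted_rand_score`, which ignores how clusters are numbered. The helper names are hypothetical.

```python
import numpy as np
from sklearn.metrics import adjusted_rand_score


def raw_accuracy(y_true, clusters):
    """Fraction of samples whose cluster ID happens to equal the class label."""
    return float(np.mean(np.asarray(y_true) == np.asarray(clusters)))


def majority_mapped_accuracy(y_true, clusters):
    """Relabel each cluster with the most common true label inside it,
    then compute ordinary accuracy."""
    y_true = np.asarray(y_true)
    clusters = np.asarray(clusters)
    mapped = np.empty_like(y_true)
    for c in np.unique(clusters):
        members = clusters == c
        values, counts = np.unique(y_true[members], return_counts=True)
        mapped[members] = values[counts.argmax()]
    return float(np.mean(mapped == y_true))


y_true = [0, 0, 1, 1, 2, 2]
clusters = [2, 2, 0, 0, 1, 1]  # same grouping, different cluster IDs
print(raw_accuracy(y_true, clusters))              # 0.0
print(majority_mapped_accuracy(y_true, clusters))  # 1.0
print(adjusted_rand_score(y_true, clusters))       # 1.0
```

Majority mapping tends to reward larger k: with more clusters than classes, each small cluster can be matched to its most common label, so mapped accuracies for different values of k are not directly comparable.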
+
+## **USAGE**
+- The K-means clustering algorithm is used to find groups which have not been explicitly labeled in the data. This can be used to confirm business assumptions about what types of groups exist or to identify unknown groups in complex data sets.
+
+## **USE CASES**
+- The K-means algorithm is popular and is used in a variety of applications such as market segmentation, document clustering, image segmentation, and image compression.
+
+## **LIBRARIES USED**
+- Pandas
+- NumPy
+- Matplotlib (`matplotlib.pyplot`)
+- Seaborn
+- scikit-learn (`sklearn.cluster.KMeans`)
+
+## **ADVANTAGES**
+
+- Relatively simple to implement.
+- Scales to large data sets.
+- Guarantees convergence (to a local, not necessarily global, optimum).
+- Can warm-start the positions of centroids.
+- Easily adapts to new examples.
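
The last two advantages can be illustrated with scikit-learn: passing previously found centroids as the `init` array (with `n_init=1`) warm-starts a new fit, for instance after new examples arrive. This is a sketch on random data; `X_old` and `X_new` are hypothetical. For streaming data, `MiniBatchKMeans.partial_fit` is another option.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
X_old = rng.normal(size=(200, 2))
# A few new examples arrive later (hypothetical data).
X_new = np.vstack([X_old, rng.normal(loc=0.5, size=(20, 2))])

first = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X_old)

# Warm start: the previous centroids are the initial positions for the refit,
# so a single initialization (n_init=1) is enough.
updated = KMeans(n_clusters=3, init=first.cluster_centers_, n_init=1).fit(X_new)

print(first.cluster_centers_)
print(updated.cluster_centers_)
print(updated.n_iter_)  # iterations needed starting from the old centroids
```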
+
+## **DISADVANTAGES**
+- Dependence on the initial centroid values.
+- Having to choose k manually.
+- Difficulty clustering data of varying sizes and density.
+- Poor scaling with the number of dimensions.
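
The dependence on initial values can be observed directly. This hedged sketch uses synthetic `make_blobs` data (an assumption, not the tutorial's dataset) and compares single runs started from random centroids (`init="random"`, `n_init=1`) with k-means++ seeding plus 10 restarts; the exact numbers depend on the data and the library version.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=500, centers=6, cluster_std=1.0, random_state=0)

# Single runs from purely random initial centroids: inertia can differ per seed.
random_inertias = [
    KMeans(n_clusters=6, init="random", n_init=1, random_state=seed).fit(X).inertia_
    for seed in range(5)
]
# k-means++ seeding with 10 restarts keeps the best (lowest-inertia) run.
best_inertia = KMeans(n_clusters=6, init="k-means++", n_init=10, random_state=0).fit(X).inertia_

print([round(v, 1) for v in random_inertias])
print(round(best_inertia, 1))
```

Lower inertia is better, so any spread among the random runs shows how much the starting centroids matter.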
+
+
+## **APPLICATIONS**
+K-means clustering is used in a variety of real-world business cases, such as:
+
+- Academic performance 
+- Diagnostic systems 
+- Search engines 
+- Wireless sensor networks
+
+## **CONCLUSION**
+
+1. In this tutorial, I implemented K-means clustering, one of the most popular unsupervised clustering techniques.
+
+2. I applied the elbow method and found that k=2 (k is the number of clusters) can be considered a good number of clusters for this data.
+
+3. I found that the model has a very high inertia of 237.7572, so it is not a good fit to the data.
+
+4. I achieved a weak classification accuracy of 1% with k=2 using our unsupervised model.
+
+5. So, I changed the value of k and found a relatively higher classification accuracy of 88% with k=4.
+
+6. Hence, we can conclude that k=4 is the optimal number of clusters.</br>
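
The elbow method from point 2 fits K-means for a range of k and looks for the point where the inertia stops dropping sharply. A minimal scikit-learn sketch, using synthetic `make_blobs` data as an assumed stand-in for the scaled Live.csv features:

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic stand-in for the scaled Live.csv features (assumption).
X, _ = make_blobs(n_samples=300, centers=4, random_state=42)

ks = list(range(1, 11))
inertias = [
    KMeans(n_clusters=k, n_init=10, random_state=0).fit(X).inertia_
    for k in ks
]

for k, inertia in zip(ks, inertias):
    print(f"k={k:2d}  inertia={inertia:10.2f}")
```

Plotting `inertias` against `ks` with `matplotlib.pyplot.plot` gives the usual elbow curve.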
+
+
+## **REFERENCES**
+- [Introduction to K-means Clustering](https://www.simplilearn.com/tutorials/machine-learning-tutorial/k-means-clustering-algorithm)
+- [Pros and Cons of K-means](https://developers.google.com/machine-learning/clustering/algorithm/advantages-disadvantages)
+- [Dataset Used](https://archive.ics.uci.edu/ml/datasets/Facebook+Live+Sellers+in+Thailand)
+
+## **Author :**
+
+Hey, this is Hrugved Kolhe.
+
+<a href="https://github.com/hrugved06"><img src="https://avatars.githubusercontent.com/u/59966943?s=400&u=445f4a7598547c0ecdeb22a265dd1a3dad9e297d&v=4" width="100px;" alt=""/><br /><sub><b> Hrugved Kolhe</b></sub></a>
+</br>
+
+[![GitHub followers](https://img.shields.io/github/followers/hrugved06.svg?label=Follow%20@hrugved06&style=social)](https://github.com/hrugved06)  [![Twitter Follow](https://img.shields.io/twitter/follow/HrugVed_?style=social)](https://twitter.com/HrugVed_)
+
+</br>
+<hr style="height:2px;border-width:0;border-radius: 5px;color:gray;background-color:#8080ff">
+</br>