Bisecting K Means

Description:

The script reads a dataset from a text file and performs bisecting K-Means clustering. It reports the clustering error for each number of clusters (K) from 2 to 10. The error is defined as the sum of the Euclidean distances between data points and their respective cluster centroids.
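As a minimal sketch of the error definition above (not the code from kmeans.py), the sum can be computed as follows. The function name `clustering_error` and the toy data are hypothetical and only serve the illustration.

```python
import math

def clustering_error(points, labels, centroids):
    """Sum of Euclidean distances from each point to its assigned centroid."""
    return sum(math.dist(p, centroids[c]) for p, c in zip(points, labels))

# Toy data: two tight groups, each point exactly 1.0 away from its centroid.
points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
labels = [0, 0, 1, 1]
centroids = [(0.0, 1.0), (10.0, 1.0)]

error = clustering_error(points, labels, centroids)
print(error)  # 4.0
```

Note that this is a sum of plain distances, not squared distances, which matches the definition given here.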

Two random data points are chosen as initial cluster centroids. Data points are then assigned to the nearest centroid and the centroids are updated, and these two steps repeat iteratively. The whole process is run with 20 different random initializations, and the clustering with the minimum error is selected for each value of K.
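The split step described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the script's actual code: the names `two_means` and `best_split`, the `max_iter` cap, the stopping rule (assignments no longer change), and the fixed seed are all hypothetical choices.

```python
import math
import random

def mean(points):
    """Coordinate-wise mean of a non-empty list of points."""
    return tuple(sum(col) / len(points) for col in zip(*points))

def two_means(points, rng, max_iter=100):
    """One 2-means run seeded with two randomly chosen data points."""
    centroids = rng.sample(points, 2)
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [
            0 if math.dist(p, centroids[0]) <= math.dist(p, centroids[1]) else 1
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in (0, 1):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:
                centroids[c] = mean(members)
    error = sum(math.dist(p, centroids[l]) for p, l in zip(points, labels))
    return labels, centroids, error

def best_split(points, trials=20, seed=0):
    """Run several random initializations and keep the lowest-error split."""
    rng = random.Random(seed)
    runs = [two_means(points, rng) for _ in range(trials)]
    return min(runs, key=lambda run: run[2])

points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
labels, centroids, error = best_split(points)
print(labels, error)
```

On this toy data, a run seeded with the two left-hand points settles on a top/bottom split with error 20, while a left/right split has error 4. Keeping the best of several runs is what guards against the worse outcome.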

Features:

Dynamic K Selection: The script explores a range of cluster numbers (K) from 2 to 10, calculating the error for each K.
Random Initialization: 20 sets of initial centroids are chosen to improve the chances of finding a global minimum of the error.
Iterative Refinement: Clusters are refined by iteratively assigning data points to the nearest centroid and then updating the centroids.
Error Visualization: The errors for each K are plotted to visualize the trade-off between the number of clusters and the resulting error.
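To show how these features fit together, here is a rough sketch of an outer loop that records the error for K = 2 to 10. It makes several assumptions that the description does not confirm: the cluster chosen for splitting is the one with the largest error, data points are distinct, and all function names are hypothetical. Plotting is left out. The returned dictionary could be passed to any plotting library.

```python
import math
import random

def centroid(points):
    """Coordinate-wise mean of a non-empty list of points."""
    return tuple(sum(col) / len(points) for col in zip(*points))

def cluster_error(points):
    """Sum of Euclidean distances from the points to their own centroid."""
    c = centroid(points)
    return sum(math.dist(p, c) for p in points)

def split_once(points, rng, max_iter=100):
    """One 2-means run seeded with two random data points (assumed distinct)."""
    centers = rng.sample(points, 2)
    halves = None
    for _ in range(max_iter):
        new_halves = ([], [])
        for p in points:
            nearer_first = math.dist(p, centers[0]) <= math.dist(p, centers[1])
            new_halves[0 if nearer_first else 1].append(p)
        if new_halves == halves:
            break  # assignments stopped changing
        halves = new_halves
        centers = [centroid(h) for h in halves]
    return list(halves)

def bisect(points, rng, trials=20):
    """Keep the lowest-error split out of several random initializations."""
    candidates = [split_once(points, rng) for _ in range(trials)]
    return min(candidates, key=lambda halves: sum(cluster_error(h) for h in halves))

def bisecting_errors(points, max_k=10, trials=20, seed=0):
    """Return {K: total error} for K = 2..max_k."""
    rng = random.Random(seed)
    clusters = [list(points)]
    errors = {}
    for k in range(2, max_k + 1):
        # Assumption: split the cluster that currently contributes the most error.
        worst = max(range(len(clusters)), key=lambda i: cluster_error(clusters[i]))
        clusters[worst:worst + 1] = bisect(clusters[worst], rng, trials)
        errors[k] = sum(cluster_error(c) for c in clusters)
    return errors

# Synthetic demo data, not one of the repository's example datasets.
data_rng = random.Random(1)
points = [(data_rng.uniform(0, 10), data_rng.uniform(0, 10)) for _ in range(40)]
errors = bisecting_errors(points)
for k, err in errors.items():
    print(k, round(err, 2))
```

Splitting the largest-error cluster is only one common choice. Splitting the cluster with the most points is another, and the script may use either.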

Usage:

Clone this repo locally
Install and update relevant libraries
Save your dataset in a text file with space-separated numeric values, with the last column as the class label. Example data is provided.
Run the script from the command line, providing the dataset file as an argument: python3 kmeans.py dataset.txt
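For reference, the expected file layout can be read with a few lines of Python. This is a sketch rather than the script's own loader: `parse_dataset` is a hypothetical name, the sample values are made up, and it assumes the class label is set aside rather than used as a clustering feature.

```python
import io

def parse_dataset(lines):
    """Split whitespace-separated rows into numeric features and a class label."""
    features, labels = [], []
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        features.append(tuple(float(v) for v in fields[:-1]))
        labels.append(fields[-1])  # last column is the class label
    return features, labels

# An in-memory stand-in for a dataset file; open("dataset.txt") works the same way.
sample = io.StringIO("5.1 3.5 1.4 0.2 1\n4.9 3.0 1.4 0.2 1\n\n6.3 3.3 6.0 2.5 3\n")
features, labels = parse_dataset(sample)
print(features[0], labels)
```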

alessandra-rodriguez/bisecting-kmeans
