Clustering

K-Means

K-Means Algorithm

1. Place K centroids in the data space at random positions.

2. Associate each data point with its closest centroid (the association step).

3. Move each centroid to the mean of the points associated with it.

4. Repeat steps 2 and 3 n times, or until some other stopping condition is met (for example, the centroids stop moving).
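The steps above can be sketched in plain Python as follows. This is a minimal illustration, not a production implementation. It assumes Euclidean distance and points given as tuples of numbers. As one concrete reading of "random positions", it starts the centroids at K distinct data points chosen at random. The helper names `kmeans`, `closest` and `mean` are hypothetical.

```python
import math
import random


def closest(point, centroids):
    """Index of the centroid nearest to `point` (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda j: math.dist(point, centroids[j]))


def mean(points):
    """Component-wise mean of a non-empty list of points."""
    return tuple(sum(coords) / len(points) for coords in zip(*points))


def kmeans(data, k, n_iter=100, seed=None):
    """Minimal K-Means sketch: returns (centroids, labels)."""
    rng = random.Random(seed)
    # Step 1 (assumption): K distinct data points, picked at random, are the initial centroids.
    centroids = rng.sample(data, k)
    for _ in range(n_iter):
        # Step 2, association: index of the closest centroid for every point.
        labels = [closest(p, centroids) for p in data]
        # Step 3: move each centroid to the mean of its associated points
        # (a centroid with no associated points stays where it is).
        new_centroids = [
            mean([p for p, lab in zip(data, labels) if lab == j]) if j in labels else centroids[j]
            for j in range(k)
        ]
        # Step 4: stop early once the centroids no longer move.
        if new_centroids == centroids:
            break
        centroids = new_centroids
    # Final association against the final centroids.
    return centroids, [closest(p, centroids) for p in data]


data = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
centroids, labels = kmeans(data, k=2, seed=0)
print(labels)
```

With two well-separated groups like these, the first three points and the last three points end up with different labels. Which group gets label 0 depends on the random start.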

K-Means is not deterministic

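The initial positions of the centroids influence the final outcome of the algorithm: different starting positions can converge to different clusterings (local optima). See the example below:

To mitigate this, we run the algorithm multiple times from different random initializations and keep the run that ends with the lowest inertia (the sum of squared distances from each point to its closest centroid), rather than relying on a single run.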
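To see the effect of initialization, the sketch below runs K-Means with K = 2 on four points at the corners of a 4-by-1 rectangle. It uses hypothetical helpers `lloyd` and `kmeans_best_of` and assumes Euclidean distance. Starting from the two left-hand points gets stuck in a top/bottom split. Starting from one left and one right point finds the much better left/right split. Restarts are combined by keeping the single run with the lowest inertia, because centroids from different runs cannot be meaningfully averaged when cluster labels may be permuted.

```python
import random


def sqdist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def lloyd(data, centroids, max_iter=100):
    """Run K-Means from the given initial centroids; return (centroids, inertia)."""
    for _ in range(max_iter):
        # Association step: group the points by their closest centroid.
        clusters = [[] for _ in centroids]
        for p in data:
            j = min(range(len(centroids)), key=lambda i: sqdist(p, centroids[i]))
            clusters[j].append(p)
        # Update step: move each centroid to the mean of its points (empty clusters stay put).
        new = [tuple(sum(c) / len(pts) for c in zip(*pts)) if pts else old
               for pts, old in zip(clusters, centroids)]
        if new == centroids:
            break
        centroids = new
    # Inertia: sum of squared distances from each point to its closest centroid.
    inertia = sum(min(sqdist(p, c) for c in centroids) for p in data)
    return centroids, inertia


def kmeans_best_of(data, k, n_init=10, seed=None):
    """Run K-Means n_init times from random starts and keep the lowest-inertia run."""
    rng = random.Random(seed)
    runs = [lloyd(data, rng.sample(data, k)) for _ in range(n_init)]
    return min(runs, key=lambda run: run[1])


# Four points at the corners of a 4-by-1 rectangle.
data = [(0.0, 0.0), (0.0, 1.0), (4.0, 0.0), (4.0, 1.0)]
_, bad = lloyd(data, [data[0], data[1]])   # both starts on the left: top/bottom split
_, good = lloyd(data, [data[0], data[2]])  # one start on each side: left/right split
print(bad, good)  # 16.0 1.0

best_centroids, best_inertia = kmeans_best_of(data, k=2, n_init=20, seed=0)
print(best_inertia)
```

Both runs converge and stop moving, but only one of them finds the better clustering. Restarting from several random seeds makes it very likely that at least one run lands in the good split.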

K-Means and sklearn

class sklearn.cluster.KMeans(n_clusters=8, init='k-means++', n_init=10, max_iter=300, 
                             tol=0.0001, precompute_distances='auto', verbose=0, 
                             random_state=None, copy_x=True, n_jobs=1, algorithm='auto')

n_clusters: number of centroids to initialize, and therefore the number of clusters to be found. This should be set using domain knowledge of the problem.
max_iter: maximum number of iterations (associate points, move centroids, repeat) in a single run. A run stops earlier once the centroids have converged (controlled by tol).
n_init: number of times the algorithm is run with different initial centroids. The output is the best of these runs in terms of inertia.
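A short usage sketch follows, on made-up toy data. The signature above comes from an older scikit-learn release. Some of its parameters, such as precompute_distances and n_jobs, have since been removed, so this sketch passes only n_clusters, init, n_init, max_iter and random_state. The fitted attributes `labels_`, `cluster_centers_` and `inertia_` are part of the `KMeans` estimator.

```python
import numpy as np
from sklearn.cluster import KMeans

# Toy data: two well-separated groups of three points each.
X = np.array([[1.0, 1.0], [1.2, 0.8], [0.9, 1.1],
              [8.0, 8.0], [8.2, 7.9], [7.8, 8.1]])

km = KMeans(
    n_clusters=2,      # number of centroids / clusters to find
    init="k-means++",  # spread-out initial centroids instead of uniform random ones
    n_init=10,         # independent runs; the lowest-inertia run is kept
    max_iter=300,      # cap on association/update iterations per run
    random_state=0,    # fix the seed so results are reproducible
).fit(X)

print(km.labels_)           # cluster index of each point
print(km.cluster_centers_)  # final centroid coordinates
print(km.inertia_)          # sum of squared distances to the closest centroid
```

Fixing random_state makes the non-determinism discussed above reproducible. Without it, repeated fits may return the same clusters under different label numbers.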

K-means references

scikit-learn documentation
Visualizing K-means

Single Linkage Clustering

Single Linkage Clustering Algorithm
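Single linkage clustering is a bottom-up (agglomerative) method. Every point starts as its own cluster. The two clusters whose closest members are nearest to each other are then merged, and this repeats until K clusters remain. The sketch below uses a hypothetical `single_linkage` helper and assumes Euclidean distance and a fixed target K. It is a naive version that recomputes all pairwise distances every round.

```python
import math
from itertools import combinations


def single_linkage(data, k):
    """Agglomerative clustering with single linkage, stopping at k clusters."""
    # Start with every point in its own cluster (clusters hold point indices).
    clusters = [[i] for i in range(len(data))]

    def linkage(a, b):
        # Single linkage: distance between the two closest members of a and b.
        return min(math.dist(data[i], data[j]) for i in a for j in b)

    while len(clusters) > k:
        # Find the pair of clusters with the smallest single-linkage distance.
        a, b = min(combinations(range(len(clusters)), 2),
                   key=lambda pair: linkage(clusters[pair[0]], clusters[pair[1]]))
        # Merge cluster b into cluster a; b > a, so deleting b leaves index a valid.
        clusters[a].extend(clusters[b])
        del clusters[b]
    return clusters


points = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (10.0, 0.0), (11.0, 0.0)]
print(single_linkage(points, k=2))  # [[0, 1, 2], [3, 4]]
```

Because clusters are joined through their nearest members, single linkage can follow chains of nearby points. This sketch is only meant for tiny inputs.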

Soft Clustering

Can assign the same point to multiple clusters, with a probability of membership for each cluster
Probabilistic approach
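A minimal sketch of a probabilistic (soft) assignment, under illustrative assumptions. Each cluster is modeled as a 1-D Gaussian with the same standard deviation, and all clusters have equal prior probability. The point's membership in each cluster is then given by Bayes' rule. The helpers `gaussian_pdf` and `soft_assign` and the cluster means are hypothetical.

```python
import math


def gaussian_pdf(x, mean, std):
    """Density of a 1-D normal distribution at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2 * math.pi))


def soft_assign(x, means, std=1.0, priors=None):
    """Probability that x belongs to each cluster (Bayes' rule over Gaussian clusters)."""
    k = len(means)
    priors = priors or [1.0 / k] * k  # assumption: equal priors unless given
    weighted = [p * gaussian_pdf(x, m, std) for p, m in zip(priors, means)]
    total = sum(weighted)
    return [w / total for w in weighted]


means = [0.0, 4.0]  # hypothetical cluster centres
for x in (0.0, 2.0, 3.0):
    print(x, [round(p, 3) for p in soft_assign(x, means)])
```

A point halfway between the two means belongs equally to both clusters. A point near one mean belongs almost entirely to that cluster. A hard clustering such as K-Means would force a single choice in both cases.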

