Tried Kmeans algo, got poor performance on obvious clusters (screenshot attached) #983

Open
Nomia opened this issue Jul 27, 2021 · 1 comment

Nomia commented Jul 27, 2021

Brief Intro

I generated three clusters of 2D points with Python and NumPy:
[image: screenshot of the generated clusters]

Then I ran the training in my Xcode program (code attached in the More Details section).
vectors are the points I generated with NumPy for the 3 clusters.
labels are [0, 1, 2].

Finally, I got this result:

60 total vectors; each line below is the output of print(label, index):
1 0
1 1
1 2
1 3
1 4
1 5
1 6
1 7
1 8
1 9
1 10
1 11
1 12
1 13
1 14
1 15
1 16
1 17
1 18
1 19
1 20
1 21
1 22
1 23
1 24
1 25
1 26
1 27
1 28
1 29
1 30
1 31
1 32
1 33
1 34
1 35
1 36
1 37
1 38
1 39
2 40
0 41
2 42
2 43
0 44
2 45
2 46
2 47
2 48
2 49
2 50
2 51
2 52
0 53
2 54
2 55
2 56
2 57
0 58
2 59

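As the result shows, label 1 holds 40 of the 60 vectors (indices 0 to 39, which are all the points of the first two clusters), while the third cluster (indices 40 to 59) is split between labels 0 and 2 in what looks like random guesses.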
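To make the pattern in the output easier to read, the labels can be tallied per input cluster. The sketch below hardcodes the 60 labels printed above and assumes, as the test code suggests, that each input cluster contributed 20 consecutive vectors; `counts` and `clusterSize` are names introduced only for this illustration.

```swift
// Labels printed in the issue, in index order 0...59.
let labels: [Int] =
    Array(repeating: 1, count: 40) +
    [2, 0, 2, 2, 0, 2, 2, 2, 2, 2, 2, 2, 2, 0, 2, 2, 2, 2, 0, 2]

// Assumption: each input cluster contributed 20 consecutive vectors.
let clusterSize = 20

// counts[c][l] = number of vectors from input cluster c that received label l.
var counts = Array(repeating: Array(repeating: 0, count: 3), count: 3)
for (index, label) in labels.enumerated() {
    counts[index / clusterSize][label] += 1
}

for (cluster, row) in counts.enumerated() {
    print("input cluster \(cluster): label counts \(row)")
}
```

Read this way, input clusters 0 and 1 both land entirely on label 1, and input cluster 2 is divided between labels 0 and 2 (4 and 16 vectors).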

More Details

func testKmeans() {
        let clusterOne = [[18.188526006613863, 11.248261784580054], [10.977922184631586, 15.109012046596924], [5.452898113107155, 3.5321009807982096], [10.18536635911578, 13.862703624731203], [10.328957328435466, 12.537782900374987], [1.4189763353100684, 5.266359205174575], [9.067851641179294, 10.920729828409662], [10.368836624859934, 15.665231690024799], [2.145960771070766, 4.401557392504384], [2.7620911225735183, 14.920065055992431], [14.12720041718686, 3.468361932877836], [4.167420371109743, 15.894531551132244], [16.41574049710068, 17.549326964691048], [19.01838762281819, 6.618632167330248], [6.850688295269066, 0.8848921926920426], [10.391360018193515, 7.0647893204675], [5.564145939535507, 17.082249462545413], [18.697486709978435, 10.845389268062606], [9.944644191259359, 7.930633818473652], [13.554062994802381, 11.393168588934731]]
        let clusterTwo = [[98.22717216195596, 94.43311365626302], [95.96801168452608, 96.39703533410699], [107.25001265191436, 94.51954931160518], [96.71197635114106, 99.36012790381027], [97.05760533627782, 100.43590091710398], [107.29405539101856, 99.48442540590167], [100.24229465041242, 100.69277829864974], [104.02306613277689, 96.79355788788355], [102.07514033663578, 94.1786915163715], [104.16619176003175, 104.84321793930332], [107.95395690451934, 96.70324724184555], [106.07036600893042, 99.49082144608062], [93.45428443493724, 97.14765864686596], [103.84075072097382, 99.77036826997173], [103.80084391099508, 98.40957369095679], [93.79214518785558, 100.64095494475106], [98.04543573640187, 103.14245232979145], [101.40503319569623, 101.54303891277588], [100.97940805244447, 101.53228869326816], [91.46287923292982, 98.79682339657157]]
        let clusterThree = [[206.66786538454946, 199.1017021618947], [199.38694598772693, 196.8381957876811], [208.5302089809453, 202.86351250650603], [204.10039196509916, 206.8368777115382], [205.43443870343214, 196.6941598041279], [208.12689482387472, 203.11836105818477], [203.23593716528936, 199.21465204846663], [204.7865753112437, 203.7801225895648], [195.30354179620295, 207.66199316618227], [199.73939127272905, 209.14920751840256], [206.7571092925273, 198.82212296945562], [200.80520574403877, 203.20624053902793], [192.02336967359818, 200.33378494221515], [201.19365787431974, 191.9066861191232], [196.2502592069524, 208.9488333465134], [208.89698463042888, 200.69718685831506], [202.270617434823, 204.9654317320587], [195.50674902955151, 208.22877709245074], [197.95303741057813, 191.43455765780755], [202.00407100481, 204.1023751597576]]
        
        // Collect all 60 points into one array, in cluster order
        // (indices 0...19, 20...39 and 40...59).
        let vectors = (clusterOne + clusterTwo + clusterThree).map { Vector($0) }

        // Cluster the generated points into 3 groups labeled 0, 1 and 2.
        let labels = Array(0...2)
        let kmm = KMeans<Int>(labels: labels)
        let result = kmm.trainCenters(vectors, convergeDistance: 0.0001)

        // Print the total count, then the assigned label and index of each point.
        print(vectors.count)
        for (i, label) in kmm.fit(vectors).enumerated() {
            print(label, i)
        }
}
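One possible explanation, which nothing in this thread confirms: if the starting centers are chosen at random (an assumption about the library, not verified here), two of them can fall inside the same cluster, and plain k-means then converges to a split like the one reported. The sketch below is not the library's KMeans; `lloyd`, `nearestIndex` and `squaredDistance` are hypothetical helpers, and the ten points are values from the arrays above rounded to one decimal.

```swift
typealias Point = [Double]

func squaredDistance(_ a: Point, _ b: Point) -> Double {
    var sum = 0.0
    for (x, y) in zip(a, b) { sum += (x - y) * (x - y) }
    return sum
}

// Index of the center closest to p (the first one wins on ties).
func nearestIndex(_ p: Point, _ centers: [Point]) -> Int {
    var best = 0
    for i in centers.indices
    where squaredDistance(p, centers[i]) < squaredDistance(p, centers[best]) {
        best = i
    }
    return best
}

// Plain Lloyd iterations from caller-supplied starting centers.
func lloyd(_ points: [Point], start: [Point], maxIterations: Int = 100) -> [Int] {
    var centers = start
    var assignment = points.map { nearestIndex($0, centers) }
    for _ in 0..<maxIterations {
        // Move each center to the mean of its points (unchanged if it has none).
        for c in centers.indices {
            let members = zip(points, assignment).filter { $0.1 == c }.map { $0.0 }
            guard !members.isEmpty else { continue }
            centers[c] = (0..<members[0].count).map { d in
                members.reduce(0.0) { $0 + $1[d] } / Double(members.count)
            }
        }
        let next = points.map { nearestIndex($0, centers) }
        if next == assignment { break }  // No point changed center: converged.
        assignment = next
    }
    return assignment
}

// Three points from clusterOne, three from clusterTwo, four from clusterThree.
let points: [Point] = [
    [18.2, 11.2], [11.0, 15.1], [5.5, 3.5],
    [98.2, 94.4], [96.0, 96.4], [107.3, 94.5],
    [206.7, 199.1], [199.4, 196.8], [208.5, 202.9], [204.1, 206.8],
]

// One starting center per cluster.
let spreadStart = lloyd(points, start: [points[0], points[3], points[6]])
// Two starting centers inside clusterThree, one inside clusterOne.
let crowdedStart = lloyd(points, start: [points[6], points[0], points[7]])

print(spreadStart)
print(crowdedStart)
```

With one starting center per cluster the three groups separate cleanly. With two starting centers inside the third cluster, the first two clusters share a single label and the third is split, which matches the shape of the output above. Under that assumption, common remedies are to run training several times and keep the best run, or to spread the initial centers apart (for example with k-means++ style seeding).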
Nomia changed the title from "Tried Kmeans algo, get poor performance on obvious clusters (screenshot attached)" to "Tried Kmeans algo, got poor performance on obvious clusters (screenshot attached)" on Jul 27, 2021