-
Notifications
You must be signed in to change notification settings - Fork 184
New issue
Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? # to your account
Minimum samples to cluster & cluster count linked #99
Comments
This is a bit confusing. If you have any ideas for how to fix it, a PR would be very welcome. I think this started as a necessary condition when using a clustering method that requires Do you want to expose the option to set |
Dear all, I ran into the same odd interaction and decided to make explicit the number of cells. I also had some odd interaction with agglomerative clustering and I think that it would be better to expose the option Kind regards, |
Cool, we can do that. Originally this was added, because some cluster algorithms in Scikit-learn exit with an error if you try clustering data that has a size less than |
Hi @rintrah, Pull requests are very welcome if you have already made some of these changes! Thank you! |
Hello, Sauln, I am sorry for my late response, but the past days have been hectic. I will write a clean version and do a pull request. I haven't done pull request before, but I think it won't be so difficult. Best wishes, |
No worries! We look forward to the PR and will try to help in whatever way possible to make the experience easy. Please let me know if you have any questions. |
There's a slightly odd interaction with minimum cluster size and cells with few entries. In
kmapper.py:372
, cells are only checked for clustering if there are>= min_cluster_samples
samples within them. Butmin_cluster_samples
is set ton_clusters
.So if you set
n_clusters
to 3, then any cell with 3 samples in will produce 3 separate 1-sample clusters in the output. Any cell with 2 samples will produce 0 clusters (and thus likely a different unique sample count in the output). This probably has little to no impact on the graph and is unlikely to show up except in small trial datasets, but it is a bit confusing that the parameter is reused for a different (if related) purpose.The text was updated successfully, but these errors were encountered: