
Implement hierarchical clustering #11


Open
Mec-iS opened this issue Oct 13, 2020 · 6 comments
Labels
enhancement New feature or request help wanted Extra attention is needed

Comments

@Mec-iS
Collaborator

Mec-iS commented Oct 13, 2020

Motivation: why do we need hierarchical clustering when we already have k-means?

Vocabulary:

  • divisive clustering: ...
  • agglomerative clustering, with linkage methods: average, weighted, median, centroid, Ward
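
For reference, the agglomerative linkages listed above can all be expressed with the Lance-Williams recurrence, which gives the distance from any other cluster k to the cluster formed by merging i and j (n_i, n_j, n_k are cluster sizes). The coefficients below are the standard textbook values; rows marked * assume d is the squared Euclidean distance.

```latex
d(k, i \cup j) = \alpha_i\, d(k,i) + \alpha_j\, d(k,j) + \beta\, d(i,j) + \gamma\, \lvert d(k,i) - d(k,j) \rvert

\begin{array}{lcccc}
\text{linkage} & \alpha_i & \alpha_j & \beta & \gamma \\ \hline
\text{average}   & \frac{n_i}{n_i+n_j} & \frac{n_j}{n_i+n_j} & 0 & 0 \\
\text{weighted}  & \tfrac12 & \tfrac12 & 0 & 0 \\
\text{median}^{*}   & \tfrac12 & \tfrac12 & -\tfrac14 & 0 \\
\text{centroid}^{*} & \frac{n_i}{n_i+n_j} & \frac{n_j}{n_i+n_j} & -\frac{n_i n_j}{(n_i+n_j)^2} & 0 \\
\text{Ward}^{*}     & \frac{n_i+n_k}{n_i+n_j+n_k} & \frac{n_j+n_k}{n_i+n_j+n_k} & -\frac{n_k}{n_i+n_j+n_k} & 0
\end{array}
```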

Sub-tasks:

  • pick one or a minimal set of distance metrics
  • pick one or a minimal set of linkage strategies
  • pick one or more algorithms (SLINK for single-linkage and CLINK for complete-linkage clustering)
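
To make the sub-tasks concrete, here is a minimal Rust sketch of the naive O(n^3) agglomerative algorithm with single and complete linkage over Euclidean distance. It is not SLINK or CLINK (the faster algorithms named above); `Linkage` and `naive_agglomerative` are hypothetical names for illustration, not existing SmartCore API.

```rust
/// Linkage strategies covered by this sketch (hypothetical enum).
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum Linkage {
    Single,   // minimum pairwise distance (the linkage SLINK targets)
    Complete, // maximum pairwise distance (the linkage CLINK targets)
}

/// Euclidean distance between two points of the same dimension.
fn euclidean(a: &[f64], b: &[f64]) -> f64 {
    a.iter().zip(b).map(|(x, y)| (x - y) * (x - y)).sum::<f64>().sqrt()
}

/// Linkage distance between two clusters given as lists of point indices.
fn cluster_distance(points: &[Vec<f64>], a: &[usize], b: &[usize], linkage: Linkage) -> f64 {
    let mut result = match linkage {
        Linkage::Single => f64::INFINITY,
        Linkage::Complete => f64::NEG_INFINITY,
    };
    for &i in a {
        for &j in b {
            let d = euclidean(&points[i], &points[j]);
            result = match linkage {
                Linkage::Single => result.min(d),
                Linkage::Complete => result.max(d),
            };
        }
    }
    result
}

/// Naive O(n^3) agglomerative clustering. Returns the merge history as
/// (kept_slot, absorbed_slot, linkage_distance); slots start as point indices.
pub fn naive_agglomerative(points: &[Vec<f64>], linkage: Linkage) -> Vec<(usize, usize, f64)> {
    let n = points.len();
    // Each slot starts as a singleton cluster; absorbed slots become None.
    let mut clusters: Vec<Option<Vec<usize>>> = (0..n).map(|i| Some(vec![i])).collect();
    let mut merges = Vec::new();
    for _ in 1..n {
        // Scan every pair of active clusters for the smallest linkage distance.
        let mut best: Option<(usize, usize, f64)> = None;
        for i in 0..n {
            let ci = match &clusters[i] { Some(c) => c, None => continue };
            for j in (i + 1)..n {
                let cj = match &clusters[j] { Some(c) => c, None => continue };
                let d = cluster_distance(points, ci, cj, linkage);
                if best.map_or(true, |(_, _, bd)| d < bd) {
                    best = Some((i, j, d));
                }
            }
        }
        // Move the members of slot j into slot i and record the merge.
        let (i, j, d) = best.expect("at least two clusters remain");
        let absorbed = clusters[j].take().expect("slot j is active");
        clusters[i].as_mut().expect("slot i is active").extend(absorbed);
        merges.push((i, j, d));
    }
    merges
}
```

Each step rescans all pairs and all member points, which is exactly the cost the faster algorithms (and the FastPair idea discussed below) try to avoid.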

Visualisations: (?)

Other implementations:

Mec-iS self-assigned this on Oct 13, 2020
@VolodymyrOrlov
Collaborator

VolodymyrOrlov commented Oct 13, 2020

@Mec-iS Take a look at this paper, which describes the FastPair algorithm. This algorithm helps speed up the cluster-merge operation.
I also suggest taking a look at the fastcluster implementation of HC described in this paper. The figures at the bottom of this page clearly show the difference between fastcluster and other implementations. Unfortunately, it is written in C++.
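
The core idea behind FastPair-style merging is to avoid rescanning all O(n^2) pairs at every step. Below is a simplified neighbor-pointer sketch in that spirit, not Eppstein's exact FastPair data structure: every active cluster keeps a pointer to its nearest active neighbor, the closest pair is found by scanning only those n pointers, and after a merge only clusters whose pointer referred to a merged cluster are fully rescanned. `NeighborPairs` is a hypothetical type, and single linkage (minimum distance) is used for the update only to keep the example short.

```rust
/// Simplified nearest-neighbor-pointer structure, loosely in the spirit of
/// FastPair (hypothetical type, not SmartCore API). Uses a full symmetric
/// distance matrix and the single-linkage (minimum) merge rule.
pub struct NeighborPairs {
    dist: Vec<Vec<f64>>,    // dist[a][b]: distance between clusters a and b
    active: Vec<bool>,      // false once a cluster has been merged away
    nn: Vec<Option<usize>>, // nearest active neighbor of each active cluster
}

impl NeighborPairs {
    pub fn new(dist: Vec<Vec<f64>>) -> Self {
        let n = dist.len();
        let mut s = NeighborPairs { dist, active: vec![true; n], nn: vec![None; n] };
        for a in 0..n {
            s.rescan(a);
        }
        s
    }

    /// O(n) rescan of one cluster's nearest active neighbor.
    fn rescan(&mut self, a: usize) {
        let mut best: Option<usize> = None;
        for b in 0..self.dist.len() {
            if b != a && self.active[b] && best.map_or(true, |c| self.dist[a][b] < self.dist[a][c]) {
                best = Some(b);
            }
        }
        self.nn[a] = best;
    }

    /// Closest active pair in O(n): only the stored pointers are inspected.
    pub fn closest_pair(&self) -> Option<(usize, usize, f64)> {
        let mut best: Option<(usize, usize, f64)> = None;
        for a in 0..self.dist.len() {
            if !self.active[a] {
                continue;
            }
            if let Some(b) = self.nn[a] {
                let d = self.dist[a][b];
                if best.map_or(true, |(_, _, bd)| d < bd) {
                    best = Some((a.min(b), a.max(b), d));
                }
            }
        }
        best
    }

    /// Merge cluster `j` into cluster `i` (single linkage: keep the minimum).
    pub fn merge(&mut self, i: usize, j: usize) {
        let n = self.dist.len();
        self.active[j] = false;
        self.nn[j] = None;
        for k in 0..n {
            if k != i && self.active[k] {
                let d = self.dist[i][k].min(self.dist[j][k]);
                self.dist[i][k] = d;
                self.dist[k][i] = d;
            }
        }
        for k in 0..n {
            if k == i || !self.active[k] {
                continue;
            }
            let ptr = self.nn[k];
            match ptr {
                // Pointer referred to a merged cluster: it may be stale, so rescan.
                Some(p) if p == i || p == j => self.rescan(k),
                // Otherwise only the distance to `i` changed, so just compare it.
                Some(p) if self.dist[k][i] < self.dist[k][p] => self.nn[k] = Some(i),
                _ => {}
            }
        }
        self.rescan(i);
    }
}
```

A clustering loop would repeatedly call `closest_pair()` and then `merge(i, j)` until it returns `None`; the full rescan is only paid for clusters whose neighbor disappeared.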

@Mec-iS
Collaborator Author

Mec-iS commented Oct 14, 2020

Notes

FastPair

fastcluster:

Alternatives:

  1. translate the Python interface into Rust, then look for changes/improvements in the C++ version
  2. use FFI to call C++ from Rust, in particular via rustcxx

Background questions:

  • are we going for a 100% native Rust implementation?
  • are we allowed to use unsafe blocks?

@VolodymyrOrlov
Collaborator

> Background questions:
>
>   • are we going for a 100% native Rust implementation?
>   • are we allowed to use unsafe blocks?

Yes. Calling a C++ library from SmartCore is not an option, for multiple reasons: we would have to ship fastcluster with SmartCore somehow, and that diminishes the usefulness of our library.

Do you know C++ by any chance? 😄 If not, feel free to go with any implementation, even if it is not the fastest out there. Another option would be to implement the algorithm yourself (it is described here). It would be super awesome if you could implement fastcluster in Rust, because then we would be the only Rust library that has it.

@Mec-iS
Collaborator Author

Mec-iS commented Oct 15, 2020

What role did you have in mind for FastPair?

@VolodymyrOrlov
Collaborator

> What role did you have in mind for FastPair?

As an alternative to fastcluster.

VolodymyrOrlov added the enhancement label on Jan 5, 2021
@Mec-iS
Collaborator Author

Mec-iS commented Aug 23, 2022

FastPair is implemented in #142.

We can now move on to implementing clustering, starting with AgglomerativeClustering.

Tasks:

  1. parse parameters as in _fit() (we need only some of its parameters)
  2. implement ward_tree
  3. return _labels
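
A minimal Rust sketch of tasks 2 and 3, assuming a naive O(n^3) closest-pair search and the Lance-Williams update for Ward on squared Euclidean distances. `ward_tree` here only borrows the name of the scikit-learn function: its signature, the `Merge` struct (which numbers the cluster created at step s as n + s, like SciPy's linkage matrix) and `labels_from_merges` are hypothetical, not SmartCore or scikit-learn API. Flat labels are obtained by replaying the first n - n_clusters merges with a union-find.

```rust
use std::collections::HashMap;

/// One merge step. Leaves are 0..n; the cluster created at step s has id n + s.
#[derive(Debug, Clone, PartialEq)]
pub struct Merge {
    pub left: usize,
    pub right: usize,
    pub distance: f64,
}

/// Naive Ward clustering (hypothetical stand-in for `ward_tree`).
pub fn ward_tree(x: &[Vec<f64>]) -> Vec<Merge> {
    let n = x.len();
    // d[a][b]: squared Euclidean (Ward) distance between active clusters a and b.
    let mut d: Vec<Vec<f64>> = x
        .iter()
        .map(|p| x.iter().map(|q| p.iter().zip(q).map(|(a, b)| (a - b) * (a - b)).sum::<f64>()).collect())
        .collect();
    let mut size = vec![1usize; n];
    let mut id: Vec<usize> = (0..n).collect(); // current tree id of each slot
    let mut active = vec![true; n];
    let mut merges = Vec::new();
    for step in 0..n.saturating_sub(1) {
        // Find the closest active pair (i < j).
        let mut best = (0, 0, f64::INFINITY);
        for i in 0..n {
            for j in (i + 1)..n {
                if active[i] && active[j] && d[i][j] < best.2 {
                    best = (i, j, d[i][j]);
                }
            }
        }
        let (i, j, dij) = best;
        // Lance-Williams update for Ward: distance from every k to i U j.
        for k in 0..n {
            if !active[k] || k == i || k == j {
                continue;
            }
            let (ni, nj, nk) = (size[i] as f64, size[j] as f64, size[k] as f64);
            let nd = ((ni + nk) * d[i][k] + (nj + nk) * d[j][k] - nk * dij) / (ni + nj + nk);
            d[i][k] = nd;
            d[k][i] = nd;
        }
        // Report the non-squared distance; slot i now holds the merged cluster.
        merges.push(Merge { left: id[i], right: id[j], distance: dij.sqrt() });
        size[i] += size[j];
        id[i] = n + step;
        active[j] = false;
    }
    merges
}

/// Cut the tree into `n_clusters` flat clusters; labels follow first appearance.
pub fn labels_from_merges(n: usize, merges: &[Merge], n_clusters: usize) -> Vec<usize> {
    // Union-find over tree ids 0..2n with path halving.
    fn find(parent: &mut [usize], mut a: usize) -> usize {
        while parent[a] != a {
            parent[a] = parent[parent[a]];
            a = parent[a];
        }
        a
    }
    let mut parent: Vec<usize> = (0..2 * n).collect();
    let n_merges = n.saturating_sub(n_clusters.max(1));
    for (step, m) in merges.iter().take(n_merges).enumerate() {
        let (l, r) = (find(&mut parent, m.left), find(&mut parent, m.right));
        parent[l] = n + step;
        parent[r] = n + step;
    }
    let mut root_label: HashMap<usize, usize> = HashMap::new();
    let mut labels = vec![0; n];
    for p in 0..n {
        let root = find(&mut parent, p);
        let next = root_label.len();
        labels[p] = *root_label.entry(root).or_insert(next);
    }
    labels
}
```

For example, on the 1-D points 0, 1, 10, 11 the sketch first merges {0, 1}, then {10, 11}, and cutting at two clusters gives labels [0, 0, 1, 1].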

The basic linkage is Ward, which requires Euclidean distance.
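
For reference, this is why Ward needs Euclidean distance: the cost of merging clusters A and B is the increase in the within-cluster sum of squared Euclidean distances to the centroid μ, which has the closed form below. The identity relies on centroids and squared Euclidean norms, so other metrics do not generally satisfy it.

```latex
\mathrm{ESS}(C) = \sum_{x \in C} \lVert x - \mu_C \rVert_2^2, \qquad
\Delta(A, B) = \mathrm{ESS}(A \cup B) - \mathrm{ESS}(A) - \mathrm{ESS}(B)
             = \frac{|A|\,|B|}{|A| + |B|}\, \lVert \mu_A - \mu_B \rVert_2^2
```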

morenol added the help wanted label on Sep 27, 2022