Suggestion: Use subsampling and early stopping to speed up medoid calculation. #75

Open
bendavidsteel opened this issue Jan 24, 2025 · 1 comment


@bendavidsteel
Contributor

Hi! Love the library, great work!

Using medoids for large tasks takes ages: I left a run going overnight and it didn't finish. To speed it up, I added subsampling and early stopping to the medoid function. Anecdotally, this massively sped up the calculation and produced results that look good.

I haven't done any more thorough testing, which is why I haven't submitted a PR, but does this sound like something that would be merged into the project?
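For illustration, here is a minimal sketch of what subsampling plus early stopping can look like for a medoid search. It is not the contributor's patch or the library's implementation: it assumes NumPy input, Euclidean distance, and an early-stopping rule of "stop after `patience` candidates in a row fail to improve the best total distance". The names `approximate_medoid`, `max_points`, and `patience` are hypothetical.

```python
import numpy as np


def approximate_medoid(points, max_points=1000, patience=50, seed=None):
    """Return the index (into `points`) of an approximate medoid.

    Hypothetical helper: assumes `points` has shape (n, d) and uses
    Euclidean distance. `max_points` caps the subsample size and
    `patience` is how many non-improving candidates are tolerated
    before the search stops early.
    """
    points = np.asarray(points, dtype=float)
    n = points.shape[0]
    if n == 0:
        raise ValueError("need at least one point")
    rng = np.random.default_rng(seed)

    # Subsampling: restrict both the candidates and the points used to
    # score them to a random subset, so cost is O(m^2) with m <= max_points.
    if n > max_points:
        subset = rng.choice(n, size=max_points, replace=False)
    else:
        subset = np.arange(n)
    sample = points[subset]

    best_idx, best_cost = int(subset[0]), np.inf
    since_improvement = 0
    # Visit candidates in random order so early stopping is not biased
    # by the order of the input data.
    for i in rng.permutation(len(subset)):
        # Sum of distances from candidate i to every point in the sample.
        cost = np.linalg.norm(sample - sample[i], axis=1).sum()
        if cost < best_cost:
            best_idx, best_cost = int(subset[i]), cost
            since_improvement = 0
        else:
            since_improvement += 1
            # Early stopping: give up after `patience` candidates in a row
            # fail to beat the current best.
            if since_improvement >= patience:
                break
    return best_idx


points = np.array([[0.0, 0.0], [1.0, 0.0], [2.0, 0.0]])
print(approximate_medoid(points, patience=10, seed=0))  # middle point, index 1
```

When `max_points` is at least the number of points and `patience` is at least the number of candidates, this reduces to the exact brute-force medoid. Lowering either knob trades accuracy for speed, which matches the trade-off described above.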

@lmcinnes
Contributor

That sounds like a great idea. I hadn't really stress tested it for really large workloads, so it is definitely possible it doesn't scale as well as it should. What you are proposing seems both simple and effective, which sounds great to me. A PR would be most welcome!

lmcinnes added a commit that referenced this issue Jan 24, 2025
Adding medoid limits for reasonable execution time, for #75