
[FEA] Changing COO Index_Type in UMAP to prevent overflow when running with large datasets #6010

Open
jinsolp opened this issue Aug 6, 2024 · 0 comments
Labels
? - Needs Triage (Need team to review and classify), feature request (New feature or request)


Description

UMAP currently cannot run on large datasets because of an integer overflow.
raft::sparse::COO defaults to int for its Index_Type, which is too narrow once the number of nonzeros (nnz) grows large enough.

Once this is solved, we need to update UMAPAlgo::FuzzySimplSet::ML::run() to accept a COO with an Index_Type other than int.

Details

Specifically, coo_symmetrize (a raft function called from UMAPAlgo::FuzzySimplSet::ML::run()) allocates space for nnz * 2 entries on device. For a large dataset (e.g. 88M samples with a kNN graph degree of 16), this value exceeds INT_MAX: 88M * 16 * 2 = 2,816,000,000 > 2,147,483,647.
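
The arithmetic above can be checked with a small standalone sketch. This is not raft or cuML code: `symmetrized_capacity` and `fits_in` are hypothetical helpers introduced only for illustration, and the sketch assumes int is 32 bits wide (represented here by std::int32_t).

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <type_traits>

// Hypothetical helper (not a raft API): number of entries needed when
// nnz * 2 space is reserved for a kNN graph with n_samples * n_neighbors
// nonzeros. Computed in 64-bit so the product itself cannot overflow here.
constexpr std::int64_t symmetrized_capacity(std::int64_t n_samples, std::int64_t n_neighbors)
{
  return n_samples * n_neighbors * 2;
}

// Hypothetical helper: true when `count` is representable by a signed
// Index_Type of at most 64 bits.
template <typename Index_Type>
constexpr bool fits_in(std::int64_t count)
{
  static_assert(std::is_signed<Index_Type>::value, "sketch only handles signed index types");
  return count >= 0 &&
         count <= static_cast<std::int64_t>(std::numeric_limits<Index_Type>::max());
}

// The example from this issue: 88M samples, kNN graph degree 16.
static_assert(symmetrized_capacity(88000000, 16) == 2816000000LL,
              "88M * 16 * 2 entries");
static_assert(!fits_in<std::int32_t>(symmetrized_capacity(88000000, 16)),
              "does not fit in a 32-bit index");
static_assert(fits_in<std::int64_t>(symmetrized_capacity(88000000, 16)),
              "fits in a 64-bit index");
```

Under the same assumptions, a degree-16 graph stops fitting in a 32-bit index just above 67M samples, since 67,108,864 * 16 * 2 = 2^31.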
