Skip to content
New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

Separate HappyTreeFunctions for internal and leaf nodes #864

Merged
merged 5 commits into from
May 18, 2023

Conversation

aprokop
Copy link
Contributor

@aprokop aprokop commented May 9, 2023

The goal of this PR is to separate the functions for leaf and internal nodes based on tags.

@aprokop aprokop added performance Something is slower than it should be refactoring Code reorganization labels May 9, 2023
@aprokop
Copy link
Contributor Author

aprokop commented May 9, 2023

GitLab results are pretty good, with minimal slowdowns. I'll double check on AMD CPU and MI250X, and will check DBSCAN and MST results on both V100 and MI250X.

Ran DBSCAN on small and large HACC problems on MI250X, no issues. However, there's a bit of slowdown for MST (~5% total, ~8% boruvka).

@aprokop aprokop requested a review from dalg24 May 9, 2023 00:37
@aprokop aprokop marked this pull request as draft May 9, 2023 03:02
@aprokop
Copy link
Contributor Author

aprokop commented May 9, 2023

HIP benchmark results are unsatisfactory, and are very different from CUDA.

@aprokop aprokop marked this pull request as ready for review May 9, 2023 21:24
@dalg24
Copy link
Contributor

dalg24 commented May 9, 2023

Comparison against current master or pre-merge of #860 ?

@aprokop
Copy link
Contributor Author

aprokop commented May 10, 2023

Comparison against current master or pre-merge of #860 ?

For benchmarks:

Against the current master: GitLab CI shows 17-21% slowdown in Serial knn for Power9 with everything else parity. CUDA no changes, Serial on AMD CPU no changes, HIP no changes. I don't understand why Power9 is slowed down, and am willing to accept it.

Against the pre-#860 master, about 5-6% construction/5-8% radius/0% knn in Serial for AMD CPU, 6-14% construction/0% radius/0% knn on V100, 5-6% construction/7-8% radius/-5-10% knn on MI250X.

@masterleinad masterleinad self-requested a review May 10, 2023 21:05
Copy link
Contributor

@dalg24 dalg24 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we might be using the wrong encoding for the left child and rope now that we are storing them in two independent arrays.

@aprokop
Copy link
Contributor Author

aprokop commented May 11, 2023

I think we might be using the wrong encoding for the left child and rope now that we are storing them in two independent arrays.

I tried a different scheme where internal nodes are always negative (offset by -1), leaf nodes are nonnegative. The code is indeed simpler, however, there's no performance difference compared to the current master (see gitlab).

Moreover, and the real stopping block, is that I am afraid to touch minimum spanning tree. It is a complete mess, where the current scheme is hardcoded so tightly in multiple arrays, keeping track of everything through permuted indices, and the whole thing is so fragile, that I believe it would take days to make any progress on it. I really don't want to spend time doing that for little benefit.

If we do want to tackle it, I think we first need to try to switch the order of internal and leaf nodes in the current hierarchy, so that leaf nodes are $[0, n)$ and internal nodes are $[n, 2n-1)$. I think that would cut down on the number of index transformations in MST, getting rid of a ton of component - n + 1 kind of things.

@aprokop aprokop marked this pull request as draft May 11, 2023 18:42
@aprokop
Copy link
Contributor Author

aprokop commented May 11, 2023

Converted to draft to figure out bvh index situation.

@aprokop aprokop marked this pull request as ready for review May 13, 2023 04:10
Copy link
Collaborator

@masterleinad masterleinad left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Most of the places using a ternary won't work once getBoundingVolume can return different types. It might be worth it to fix this here so the code doesn't need to be touched again.

Copy link
Contributor

@dalg24 dalg24 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the next step after this PR? Is it immediately enabling different types for the node "bounding volumes"?
I am asking because it looks like half of the code changed here would need to be changed again rightaway.

@aprokop
Copy link
Contributor Author

aprokop commented May 16, 2023

What is the next step after this PR? Is it immediately enabling different types for the node "bounding volumes"?

No, not immediately. I see at least few steps before that:

  1. Converting leaf node to store value, which would be a pair index + bounding volume, and changing getLeafPermutationIndex to getValue
  2. Introducing IndexableGetter internally which would use the value pair
  3. Introducing Range
  4. Changing the interface to allow user-defined indexable getter

The different bounding volume type for leaf and internal volumes can be part of 4, or even later.

Copy link
Collaborator

@masterleinad masterleinad left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks OK to me.

@aprokop aprokop merged commit bf0fb97 into arborx:master May 18, 2023
@aprokop aprokop deleted the bv_functions branch May 18, 2023 00:21
@aprokop aprokop mentioned this pull request May 18, 2023
# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
performance Something is slower than it should be refactoring Code reorganization
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants