-
Notifications
You must be signed in to change notification settings - Fork 41
New issue
Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? # to your account
Separate HappyTreeFunctions for internal and leaf nodes #864
Conversation
GitLab results are pretty good, with minimal slowdowns. I'll double check on AMD CPU and MI250X, and will check DBSCAN and MST results on both V100 and MI250X. Ran DBSCAN on small and large HACC problems on MI250X, no issues. However, there's a bit of slowdown for MST (~5% total, ~8% boruvka). |
HIP benchmark results are unsatisfactory, and are very different from CUDA. |
Comparison against current master or pre-merge of #860 ? |
For benchmarks: Against the current master: GitLab CI shows 17-21% slowdown in Serial knn for Power9 with everything else parity. CUDA no changes, Serial on AMD CPU no changes, HIP no changes. I don't understand why Power9 is slowed down, and am willing to accept it. Against the pre-#860 master, about 5-6% construction/5-8% radius/0% knn in Serial for AMD CPU, 6-14% construction/0% radius/0% knn on V100, 5-6% construction/7-8% radius/-5-10% knn on MI250X. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we might be using the wrong encoding for the left child and rope now that we are storing them in two independent arrays.
I tried a different scheme where internal nodes are always negative (offset by -1), leaf nodes are nonnegative. The code is indeed simpler, however, there's no performance difference compared to the current master (see gitlab). Moreover, and the real stopping block, is that I am afraid to touch minimum spanning tree. It is a complete mess, where the current scheme is hardcoded so tightly in multiple arrays, keeping track of everything through permuted indices, and the whole thing is so fragile, that I believe it would take days to make any progress on it. I really don't want to spend time doing that for little benefit. If we do want to tackle it, I think we first need to try to switch the order of internal and leaf nodes in the current hierarchy, so that leaf nodes are |
Converted to draft to figure out bvh index situation. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Most of the places using a ternary won't work once getBoundingVolume
can return different types. It might be worth it to fix this here so the code doesn't need to be touched again.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the next step after this PR? Is it immediately enabling different types for the node "bounding volumes"?
I am asking because it looks like half of the code changed here would need to be changed again rightaway.
No, not immediately. I see at least few steps before that:
The different bounding volume type for leaf and internal volumes can be part of 4, or even later. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks OK to me.
The goal of this PR is to separate the functions for leaf and internal nodes based on tags.