Skip to content
New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

Data-race in CUDA HA4 Implementation #41

Open
N7Alpha opened this issue Feb 1, 2025 · 2 comments
Open

Data-race in CUDA HA4 Implementation #41

N7Alpha opened this issue Feb 1, 2025 · 2 comments

Comments

@N7Alpha
Copy link

N7Alpha commented Feb 1, 2025

Background

When processing large batches tall images (10000-100000 rows, 64 columns) it exposed a data-race which causes out-of-bounds reads in the labeling_hennequin_2018_HA4.cu algorithm implementation. This only occurs in every ~10-100 runs of the kernel. This implicit sync optimization is the cause. To fix it all you have to do is remove the #ifdef check, that is unconditionally call __syncthreads(). I used compute-sanitizer --racecheck to confirm the existence of this bug. In the data I was working with adding the synchronization lead to a 5% runtime performance regression to the StripLabeling() kernel. My machine was RHEL with an Ampere GPU running with CUDA 11.4. I can't give you example code to reproduce the issue since I'm not allowed to share the code.

Solution

I think the check is supposed to be __CUDA_ARCH__ >= 700 instead of __CUDA_ARCH__ < 700? From what I remember reading Nvidia has been removing implicit synchronization over time and not adding it. I could not name the exact feature the #ifdef block is trying to get at. Personally I would just comment out the #ifdef entirely and leave a comment explaining why. That's what I did in my own code. The specific optimization is mentioned in the original HA4 paper (I think) so adding an explainer at that point should dissuade the user from making the same mistake.

@prittt
Copy link
Owner

prittt commented Feb 13, 2025

@N7Alpha thank you for spotting this issue! Would you like to create a pull request to get appropriate credits for the fix?

@N7Alpha
Copy link
Author

N7Alpha commented Feb 13, 2025

Sure I can do that. Do you happen to know if it’s just inverting the comparison operator? If not I’ll check go through the original paper and figure out what it should be 👍

N7Alpha added a commit to N7Alpha/YACCLAB that referenced this issue Feb 13, 2025
N7Alpha added a commit to N7Alpha/YACCLAB that referenced this issue Feb 13, 2025
# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants