Background
When processing large batches of tall images (10,000-100,000 rows, 64 columns), I hit a data race that causes out-of-bounds reads in the labeling_hennequin_2018_HA4.cu algorithm implementation. It only manifests in roughly every 10-100 runs of the kernel, and the implicit-sync optimization is the cause. The fix is simply to remove the #ifdef check, i.e. to call __syncthreads() unconditionally. I used compute-sanitizer --tool racecheck to confirm the race. On my data, adding the synchronization led to a ~5% runtime regression in the StripLabeling() kernel. My machine runs RHEL with an Ampere GPU and CUDA 11.4. I can't provide reproducer code since I'm not allowed to share it.
Solution
I think the check is supposed to be __CUDA_ARCH__ >= 700 instead of __CUDA_ARCH__ < 700? From what I remember, NVIDIA has been removing implicit synchronization over time, not adding it, though I couldn't name the exact feature the #ifdef block is trying to exploit. Personally, I would just comment out the #ifdef entirely and leave a comment explaining why; that's what I did in my own code. The specific optimization is mentioned in the original HA4 paper (I think), so adding an explanation at that point should dissuade the next person from making the same mistake.
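To illustrate for other readers, here is a minimal, self-contained sketch of the pattern being discussed. This is hypothetical code, not the actual YACCLAB kernel; the kernel name, the array shapes, and the exact form of the guard are all assumptions made for the example.

```cuda
#include <cstdio>

// Hypothetical sketch of the pattern under discussion; the real
// labeling_hennequin_2018_HA4.cu kernel differs in detail.
__global__ void strip_labeling_sketch(const int* in, int* out, int n)
{
    __shared__ int tile[64];
    int t = threadIdx.x;

    if (t < n) tile[t] = in[t];  // phase 1: each thread writes shared memory

    // The problematic version guarded this barrier with something like:
    //   #if __CUDA_ARCH__ < 700
    //       __syncthreads();
    //   #endif
    // relying on implicit warp-lockstep execution otherwise. Volta (sm_70)
    // introduced independent thread scheduling, so that implicit ordering is
    // no longer guaranteed and the barrier must run unconditionally.
    __syncthreads();

    if (t < n) out[t] = tile[n - 1 - t];  // phase 2: read another thread's write
}

int main()
{
    // Tiny launch just to exercise the kernel (requires a CUDA-capable GPU).
    const int n = 64;
    int h_in[n], h_out[n];
    for (int i = 0; i < n; ++i) h_in[i] = i;

    int *d_in, *d_out;
    cudaMalloc(&d_in, n * sizeof(int));
    cudaMalloc(&d_out, n * sizeof(int));
    cudaMemcpy(d_in, h_in, n * sizeof(int), cudaMemcpyHostToDevice);

    strip_labeling_sketch<<<1, n>>>(d_in, d_out, n);
    cudaMemcpy(h_out, d_out, n * sizeof(int), cudaMemcpyDeviceToHost);

    printf("out[0] = %d\n", h_out[0]);  // 63 with the barrier in place
    cudaFree(d_in);
    cudaFree(d_out);
    return 0;
}
```

Without the barrier, phase 2 can read tile[] slots that another thread has not yet written, which is exactly the kind of race compute-sanitizer's racecheck tool reports.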
Sure, I can do that. Do you happen to know if it's just inverting the comparison operator? If not, I'll go through the original paper and figure out what it should be 👍
N7Alpha added a commit to N7Alpha/YACCLAB that referenced this issue on Feb 13, 2025