GpuShuffleCoalesceIterator acquire semaphore after host concat #4396

Merged 3 commits into NVIDIA:branch-22.02 on Dec 21, 2021


abellina
Collaborator

Closes #4395

This is a small optimization that was spotted while looking into NDS Q64 traces. With the change, Q64 runs up to 3 seconds faster, though the saving varies quite a bit from run to run. Across all of NDS, it saved ~1 minute for the whole run, shaving anywhere from a few hundred ms off individual queries up to 5 seconds for q94.

I saw some queries get slower, the worst case being q42, which was 2x slower in 1 sample out of 10. I have not been able to reproduce this case; all subsequent runs were at 1x or above. q42 took 3.8 seconds in the last weekly run, and with the patch it hovers between 3.6 and 4.5 seconds.
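
For readers unfamiliar with the change, here is a toy model of the reordering named in the title. It is a sketch in Python rather than the plugin's Scala, and every name in it (`ToyGpuSemaphore`, `acquire_if_necessary`, `concat_on_host`, `coalesce_before`, `coalesce_after`) is hypothetical, not the spark-rapids API. The zero-column branch reflects an assumed reading of the "batches without columns" commit: there is nothing to concatenate, but the task still acquires before handing the batch downstream.

```python
import threading


class ToyGpuSemaphore:
    """Hypothetical stand-in for a GPU semaphore a task acquires at most once.

    Kept deliberately simple: the holder set is not guarded by a lock.
    """

    def __init__(self, permits):
        self._sem = threading.Semaphore(permits)
        self._holders = set()

    def acquire_if_necessary(self, task_id, log):
        # Idempotent per task: only the first call takes a permit.
        if task_id not in self._holders:
            self._sem.acquire()
            self._holders.add(task_id)
            log.append("acquire")

    def release(self, task_id):
        if task_id in self._holders:
            self._holders.discard(task_id)
            self._sem.release()


def concat_on_host(buffers, log):
    """CPU-only step: join serialized host buffers into one."""
    log.append("host_concat")
    return b"".join(buffers)


def coalesce_before(sem, task_id, buffers, log):
    """Original ordering: the permit is held while the CPU concatenates."""
    sem.acquire_if_necessary(task_id, log)
    merged = concat_on_host(buffers, log)
    log.append("copy_to_gpu")  # placeholder for the host-to-device copy
    return len(merged)


def coalesce_after(sem, task_id, buffers, num_columns, log):
    """Reordered: concatenate on the host first, then acquire."""
    if num_columns == 0:
        # Assumption: a rows-only batch skips the concat but still acquires.
        sem.acquire_if_necessary(task_id, log)
        log.append("rows_only_batch")
        return 0
    merged = concat_on_host(buffers, log)
    sem.acquire_if_necessary(task_id, log)
    log.append("copy_to_gpu")  # placeholder for the host-to-device copy
    return len(merged)
```

In this model the only difference between the two paths is where `"acquire"` lands in the log relative to `"host_concat"`. With the reordered version, other tasks can hold the permit and use the GPU while this task is still doing host-only work.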

Signed-off-by: Alessandro Bellina <abellina@nvidia.com>
@abellina added the performance label (a performance related task/issue) on Dec 20, 2021
@jlowe
Contributor

jlowe commented Dec 20, 2021

build

@abellina
Collaborator Author

build

@abellina abellina merged commit 27cc725 into NVIDIA:branch-22.02 Dec 21, 2021
@abellina abellina deleted the perf/join_sem_tweaks branch December 21, 2021 18:48
abellina added a commit to abellina/spark-rapids that referenced this pull request Dec 30, 2021
…A#4396)

* GpuShuffleCoalesceIterator acquire semaphore after host concat

Signed-off-by: Alessandro Bellina <abellina@nvidia.com>

* Add semaphore acquire for batches without columns
Development

Successfully merging this pull request may close these issues.

[FEA] acquire the semaphore after concatToHost in GpuShuffleCoalesceIterator