bugfix: fix cu118 cub usage #410

Merged
merged 1 commit into from
Jul 30, 2024

yzh119 (Collaborator) commented Jul 30, 2024

Related issue: sgl-project/sglang#771

This PR fixes the usage of the cub FlagHeads API in the sampling kernels.
As documented, the default FlagHeads overload always flags the first element, which is not the expected result when the first element is not true.

For thread0, item input[0] is always flagged.

This PR passes the tile_predecessor_item argument (set to 0), so input[0] is compared against that predecessor instead of being flagged unconditionally.

CUDA 12+ does not have this issue because there we use the newer SubtractLeft API instead of FlagHeads.
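The difference between the two behaviours can be illustrated with a small host-side model. This is only a sketch, not the cub implementation or the kernel code from this PR: the function names `flag_heads_default`, `flag_heads_with_predecessor`, and `first_flagged` are hypothetical, the input is assumed to be an array of boolean indicators, and the flag operator is assumed to be plain inequality between neighbours.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical sequential model of head flagging (NOT the cub code).
// An item is flagged when it differs from its left neighbour.

// Models the default behaviour: input[0] has no predecessor,
// so it is always flagged.
std::vector<bool> flag_heads_default(const std::vector<bool>& input) {
  std::vector<bool> flags(input.size(), false);
  for (std::size_t i = 0; i < input.size(); ++i) {
    flags[i] = (i == 0) ? true : (input[i] != input[i - 1]);
  }
  return flags;
}

// Models the behaviour with an explicit predecessor: input[0] is
// compared against tile_predecessor_item like any other item.
std::vector<bool> flag_heads_with_predecessor(const std::vector<bool>& input,
                                              bool tile_predecessor_item) {
  std::vector<bool> flags(input.size(), false);
  for (std::size_t i = 0; i < input.size(); ++i) {
    const bool prev = (i == 0) ? tile_predecessor_item : input[i - 1];
    flags[i] = (input[i] != prev);
  }
  return flags;
}

// Returns the index of the first flagged item, or -1 if none is flagged.
int first_flagged(const std::vector<bool>& flags) {
  for (std::size_t i = 0; i < flags.size(); ++i) {
    if (flags[i]) return static_cast<int>(i);
  }
  return -1;
}
```

In this model, for the input `{false, false, true, true}` the default variant reports index 0 as the first flagged item even though `input[0]` is false, while the variant with a predecessor of `false` (0) reports index 2, the first true element.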

@yzh119 yzh119 merged commit 58d3593 into main Jul 30, 2024
yzh119 added a commit that referenced this pull request Jul 31, 2024
## [0.1.3](v0.1.2...v0.1.3) (2024-07-31)

### Bugfix

* bugfix: Fix cudagraph mode of BatchPrefillWithRaggedKVCacheWrapper ([#412](#412)) ([9907bc](9907bc1))
* fix cu118 cub usage for sampling kernels ([#410](#410)) ([58d359](58d3593))

### Misc

* enhance allocator error info and add shape check for prefill begin forward functions ([#413](#413)) ([5e36c5](5e36c52))
@yzh119 yzh119 deleted the cu118-another-cub-fix branch August 3, 2024 00:20