Commit
Related issue: sgl-project/sglang#771

This PR fixes the usage of the cub `FlagHeads` API in the sampling kernels. As [documented](https://nvidia.github.io/cccl/cub/api/classcub_1_1BlockDiscontinuity.html), the default `FlagHeads` overload always flags the first element, which is not the expected result when the first element is not `true`:

> For thread0, item input[0] is always flagged.

This PR sets the `tile_predecessor_item` argument (to 0), which is then compared against `input[0]`, so that `input[0]` is no longer flagged unconditionally. CUDA 12+ is not affected because there we use the newer `SubtractLeft` API instead of `FlagHeads`.
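To make the difference concrete, here is a minimal host-side sketch of the two behaviors described above. It is a sequential model written for illustration, not the CUB implementation and not the code from this PR: `flag_heads_model` and `first_flagged` are hypothetical names, the flag operator is assumed to be plain inequality, and the whole tile is modeled as a single array rather than per-thread items.

```cpp
#include <cstddef>
#include <optional>
#include <vector>

// Sequential model of head flagging (assumption: flag op is inequality).
// Without a predecessor, element 0 is always flagged, mirroring the
// documented default. With a predecessor, element 0 is compared against it.
std::vector<int> flag_heads_model(
    const std::vector<int>& input,
    std::optional<int> tile_predecessor_item = std::nullopt) {
    std::vector<int> head_flags(input.size(), 0);
    for (std::size_t i = 0; i < input.size(); ++i) {
        if (i > 0) {
            // Interior elements: flagged when they differ from their left neighbor.
            head_flags[i] = (input[i] != input[i - 1]) ? 1 : 0;
        } else if (tile_predecessor_item.has_value()) {
            // First element with an explicit predecessor to compare against.
            head_flags[i] = (input[i] != *tile_predecessor_item) ? 1 : 0;
        } else {
            // First element with no predecessor: unconditionally flagged.
            head_flags[i] = 1;
        }
    }
    return head_flags;
}

// Hypothetical helper: index of the first flagged element, or -1 if none.
int first_flagged(const std::vector<int>& head_flags) {
    for (std::size_t i = 0; i < head_flags.size(); ++i) {
        if (head_flags[i] != 0) {
            return static_cast<int>(i);
        }
    }
    return -1;
}
```

In this model, for a predicate array `{0, 0, 1, 1}` the default variant reports index 0 as the first flagged element, while passing a predecessor of 0 reports index 2, the position where the predicate first becomes true.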