top_k_top_p_sampling_from_probs occasionally outputs invalid token_ids #384

Closed
hnyls2002 opened this issue Jul 20, 2024 · 15 comments

@hnyls2002
Contributor

With some dummy input_ids, I occasionally get unexpected sampled results:

tensor([     29919,        903,      30140, 1061800448,        903]
@hnyls2002
Contributor Author

Even in the dummy input case, the error is hard to reproduce...

@yzh119
Collaborator

yzh119 commented Jul 20, 2024

what's the output of success?

@ispobock
Contributor

ispobock commented Jul 20, 2024

@yzh119 @hnyls2002 I got a reproducible example:

import torch
from flashinfer.sampling import top_k_top_p_sampling_from_probs

probs = torch.zeros([1, 32000]).cuda()
probs[0][1] = 1
max_top_k_round, batch_size = 32, probs.shape[0]
top_ks = torch.Tensor([1073741824]).to(torch.int32).cuda()
top_ps = torch.Tensor([1.]).cuda()
for i in range(10):
    uniform_samples = torch.rand((max_top_k_round, batch_size), device=probs.device)
    batch_next_token_ids, _ = top_k_top_p_sampling_from_probs(probs, uniform_samples, top_ks, top_ps)
    print(batch_next_token_ids)

output:

tensor([0], device='cuda:0', dtype=torch.int32)
tensor([1062357376], device='cuda:0', dtype=torch.int32)
tensor([0], device='cuda:0', dtype=torch.int32)
tensor([1056581825], device='cuda:0', dtype=torch.int32)
tensor([1058078609], device='cuda:0', dtype=torch.int32)
tensor([1052083673], device='cuda:0', dtype=torch.int32)
tensor([1065198239], device='cuda:0', dtype=torch.int32)
tensor([1032416633], device='cuda:0', dtype=torch.int32)
tensor([1062065655], device='cuda:0', dtype=torch.int32)
tensor([1058903188], device='cuda:0', dtype=torch.int32)
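
Since probs puts all mass on token id 1 and top_p is 1.0, every draw should return id 1; the large values above fall far outside the 32000-token vocabulary. A minimal sanity-check sketch, reusing probs and batch_next_token_ids from the loop above:

# All probability mass is on token id 1, so every sample should be exactly 1,
# and anything outside [0, vocab_size) is invalid by construction.
vocab_size = probs.shape[1]  # 32000
assert (batch_next_token_ids >= 0).all() and (batch_next_token_ids < vocab_size).all()
assert (batch_next_token_ids == 1).all()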

@yzh119
Collaborator

yzh119 commented Jul 20, 2024

Thank you, I'll fix it soon.

@yzh119
Collaborator

yzh119 commented Jul 20, 2024

@ispobock Unfortunately I cannot reproduce your error locally; I always got 1 as the sampling output.

Could you find a fixed random seed (I tried a lot of them) that can reproduce this error?

@yzh119
Collaborator

yzh119 commented Jul 20, 2024

By the way, your example is a bit odd: we expect top_ks and top_ps to have shape (batch_size,), but in your example they have shape (1,), which mismatches batch_size (32).
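
To illustrate that convention, a minimal sketch of a shape-consistent batched call, using the same 0.1.0-era call signature as the repro above; the batch size, vocabulary size, and threshold values here are made up for illustration:

import torch
from flashinfer.sampling import top_k_top_p_sampling_from_probs

# Illustrative values only: one top_k / top_p entry per request in the batch.
batch_size, vocab_size, max_top_k_round = 4, 32000, 32
probs = torch.softmax(torch.randn(batch_size, vocab_size, device="cuda"), dim=-1)
uniform_samples = torch.rand((max_top_k_round, batch_size), device="cuda")
top_ks = torch.full((batch_size,), 40, dtype=torch.int32, device="cuda")  # shape (batch_size,)
top_ps = torch.full((batch_size,), 0.95, device="cuda")                   # shape (batch_size,)
next_token_ids, success = top_k_top_p_sampling_from_probs(probs, uniform_samples, top_ks, top_ps)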

@ispobock
Contributor

ispobock commented Jul 20, 2024

@yzh119 The batch_size is assigned by probs.shape[0], and probs.shape[0]=1 in this example.
It may be environment dependent; I can reproduce it with torch==2.3.0+cu118 and flashinfer==0.1.0+cu118torch2.3 on CUDA 11.8.

@ispobock
Contributor

The issue cannot be reproduced when I try it in a CUDA 12.2 environment with torch==2.3.0+cu121 and flashinfer==0.1.0+cu121torch2.3.

@yzh119
Collaborator

yzh119 commented Jul 20, 2024

Okay, that might be the reason: for cu118 we use FlagHeads rather than SubtractLeft because of the cub version (see #265).

Their semantics might be different; let me double check.

@yzh119
Collaborator

yzh119 commented Jul 21, 2024

I think I have figured out the reason: unlike SubtractLeft, the input and output arguments of FlagHeads must be different variables. Passing the same variable for both, as in

    BlockAdjacentDifference<bool, BLOCK_THREADS>(temp_storage->block_prim.adj_diff)
        .FlagHeads<VEC_SIZE>(greater_than_u, greater_than_u, BoolDiffOp());

results in undefined behavior; we should use distinct variables instead:

    BlockAdjacentDifference<bool, BLOCK_THREADS>(temp_storage->block_prim.adj_diff)
        .FlagHeads<VEC_SIZE>(greater_than_u_out, greater_than_u, BoolDiffOp());

After this change, I don't observe weird output anymore.

@hnyls2002
Contributor Author

I got this error on cuda 12.4

yzh119 added a commit that referenced this issue Jul 21, 2024
As observed in #384, we should use different variables for the input and output of the `FlagHeads` API in cu118.
@yzh119
Collaborator

yzh119 commented Jul 21, 2024

@ispobock The cu118 issue should have been fixed in the main branch.

@hnyls2002 can you find a reproducible script?

@ispobock
Contributor

> @ispobock The cu118 issue should have been fixed in the main branch.

@yzh119 It works for me now! Thanks!

@yzh119
Collaborator

yzh119 commented Aug 2, 2024

> I got this error on cuda 12.4

@hnyls2002 We switched to a deterministic implementation (#417); maybe it can address the issue.

@yzh119
Collaborator

yzh119 commented Aug 11, 2024

Lots of potential bugfix PRs have been merged recently. I'll close this for now; feel free to reopen it if you still observe such issues.

yzh119 closed this as completed Aug 11, 2024