
bugfix: check gpu id in PyTorch APIs and use input tensor's gpu default stream #361

Merged 4 commits into main on Jul 6, 2024

Conversation

@yzh119 (Collaborator) commented on Jul 6, 2024

This PR fixes #349 by using the default stream of the input tensors' device instead of the default stream of the default device (which might differ from the input tensors' device). This PR also adds a sanity check on input tensor device IDs: all input tensors must reside on the same GPU.

@yzh119 merged commit 1b84fab into main on Jul 6, 2024
yzh119 added a commit that referenced this pull request Jul 12, 2024
🤖 I have created a release *beep* *boop*
---


## [0.0.9](v0.0.8...v0.0.9) (2024-07-12)

### Bugfix

* fix the decode kernel segfault in cudagraph mode
([#368](https://github.com/flashinfer-ai/flashinfer/pull/368))
([c69cfa](https://github.com/flashinfer-ai/flashinfer/commit/c69cfabc540e4a7edd991713df10d575ff3b0c21))
* fix decode kernels output for empty kv cache
([#363](https://github.com/flashinfer-ai/flashinfer/pull/363))
([ac72b1](https://github.com/flashinfer-ai/flashinfer/commit/ac72b1cc14a6474d601f371c8d69e2600ac28d2f))
* check gpu id in PyTorch APIs and use input tensor's gpu default stream
([#361](https://github.com/flashinfer-ai/flashinfer/pull/361))
([1b84fa](https://github.com/flashinfer-ai/flashinfer/commit/1b84fab3e4f53fb4fa26952fdb46fa8018634057))

### Performance Improvements

* accelerate alibi
([#365](#365))
([4f0a9f9](4f0a9f9))
* accelerate gqa performance
([#356](#356))
([e56ddad](e56ddad))
* optimize tensor conversions in C++ code to avoid unnecessary copies
([#366](#366))
([1116237](1116237))

### Acknowledgement

We thank [@Yard1](https://github.com/Yard1),
[@Ying1123](https://github.com/Ying1123) and
[@zhyncs](https://github.com/zhyncs) for their contributions.

---
This PR was generated with [Release
Please](https://github.com/googleapis/release-please). See
[documentation](https://github.com/googleapis/release-please#release-please).

---------

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Zihao Ye <expye@outlook.com>
@yzh119 deleted the check-device branch on July 24, 2024
Successfully merging this pull request may close these issues:
flashinfer.page.append_paged_kv_cache will cause an invalid memory access if device != 'cuda:0'