
perf: Optimize tensor conversions in C++ code to avoid unnecessary copies #366

Merged

Conversation

@Yard1 (Contributor) commented on Jul 10, 2024

Small tweak to avoid unnecessary copying by combining two calls into one. Discovered during profiling. A sketch of the general pattern is shown below.
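The actual diff is not reproduced here, but as a rough illustration of the kind of change (a hedged sketch, not the patched FlashInfer code): with libtorch/ATen, chaining two `.to()` calls can materialize an intermediate tensor, while a single `.to(device, dtype)` call performs the cast and device move in one step. The function name, tensor names, dtype, and device below are hypothetical.

```cpp
#include <torch/torch.h>

// Hypothetical example; not the actual code touched by this PR.
torch::Tensor normalize_indices(const torch::Tensor& indices,
                                const torch::Device& device) {
  // Before: two chained conversions, which can allocate an intermediate
  // tensor (cast first, then copy to the target device).
  // auto out = indices.to(torch::kInt32).to(device);

  // After: a single call that casts and moves the tensor in one step,
  // avoiding the intermediate copy.
  auto out = indices.to(device, torch::kInt32);
  return out;
}
```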

@yzh119 (Collaborator) left a comment:

LGTM

@yzh119 yzh119 changed the title Optimize tensor conversions in C++ code to avoid unnecessary copies perf: Optimize tensor conversions in C++ code to avoid unnecessary copies Jul 10, 2024
@yzh119 yzh119 merged commit 1116237 into flashinfer-ai:main Jul 10, 2024
@Yard1 Yard1 deleted the optimize_libtorch_tensor_conversions branch July 10, 2024 23:44
yzh119 added a commit that referenced this pull request Jul 12, 2024
🤖 I have created a release *beep* *boop*
---


## [0.0.9](v0.0.8...v0.0.9) (2024-07-12)

### Bug Fixes

* fix the decode kernel segfault in cudagraph mode
  ([#368](https://github.com/flashinfer-ai/flashinfer/pull/368)) ([c69cfa](https://github.com/flashinfer-ai/flashinfer/commit/c69cfabc540e4a7edd991713df10d575ff3b0c21))
* fix decode kernels output for empty kv cache
  ([#363](https://github.com/flashinfer-ai/flashinfer/pull/363)) ([ac72b1](https://github.com/flashinfer-ai/flashinfer/commit/ac72b1cc14a6474d601f371c8d69e2600ac28d2f))
* check gpu id in PyTorch APIs and use input tensor's gpu default stream
  ([#361](https://github.com/flashinfer-ai/flashinfer/pull/361)) ([1b84fa](https://github.com/flashinfer-ai/flashinfer/commit/1b84fab3e4f53fb4fa26952fdb46fa8018634057))

### Performance Improvements

* accelerate alibi
([#365](#365))
([4f0a9f9](4f0a9f9))
* accelerate gqa performance
([#356](#356))
([e56ddad](e56ddad))
* Optimize tensor conversions in C++ code to avoid unnecessary copies
([#366](#366))
([1116237](1116237))

### Acknowledgement

We thank [@Yard1](https://github.com/Yard1),
[@Ying1123](https://github.com/Ying1123) and
[@zhyncs](https://github.com/zhyncs) for their contributions.

---
This PR was generated with [Release
Please](https://github.com/googleapis/release-please). See
[documentation](https://github.com/googleapis/release-please#release-please).

---------

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Zihao Ye <expye@outlook.com>