feat: decouple float and int workspace buffer #442

Merged
merged 3 commits into from
Aug 13, 2024

@yzh119 (Collaborator) commented Aug 13, 2024

Before this PR, FlashInfer kept float and int data in a single workspace buffer, so different wrappers could not share a buffer.

This PR decouples the float and int workspace buffers. The large float workspace buffer can be shared across multiple wrappers, while the small int buffer is unique to each wrapper. This saves GPU memory when multiple wrappers are created (decode, prefill paged, prefill ragged) or when cascade inference is used.
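To make the memory argument concrete, here is a minimal sketch of the sharing pattern in plain Python. It does not use FlashInfer's API: `SketchWrapper`, `total_bytes`, and both buffer sizes are hypothetical names and values chosen only for illustration, and `bytearray` stands in for a GPU allocation.

```python
# Hypothetical sketch of the sharing pattern; these are NOT FlashInfer's
# classes, and the sizes below are assumed values for illustration only.
FLOAT_WORKSPACE_BYTES = 16 * 1024 * 1024  # assumed size of the large float buffer
INT_WORKSPACE_BYTES = 1 * 1024 * 1024     # assumed size of the small int buffer


class SketchWrapper:
    """Stand-in for a decode or prefill wrapper."""

    def __init__(self, float_workspace: bytearray):
        # The large float buffer is supplied by the caller, so several
        # wrappers can hold a reference to the same allocation.
        self.float_workspace = float_workspace
        # The small int buffer is allocated per wrapper and never shared.
        self.int_workspace = bytearray(INT_WORKSPACE_BYTES)


def total_bytes(wrappers):
    """Sum the sizes of all distinct buffers, counting shared ones once."""
    buffers = {}
    for w in wrappers:
        for buf in (w.float_workspace, w.int_workspace):
            buffers[id(buf)] = len(buf)
    return sum(buffers.values())


# Three wrappers (for example decode, prefill paged, prefill ragged)
# sharing one float buffer.
shared_float = bytearray(FLOAT_WORKSPACE_BYTES)
decoupled = [SketchWrapper(shared_float) for _ in range(3)]

# What three wrappers would need if each owned a combined buffer.
coupled_equivalent = 3 * (FLOAT_WORKSPACE_BYTES + INT_WORKSPACE_BYTES)

print(total_bytes(decoupled), coupled_equivalent)
```

Under these assumed sizes, the decoupled layout needs one float buffer plus three int buffers, rather than three of each. The saving grows with the number of wrappers, since each additional wrapper adds only a small int buffer.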

@yzh119 yzh119 merged commit a7ee566 into main Aug 13, 2024
yzh119 added a commit that referenced this pull request Aug 13, 2024
🤖 I have created a release *beep* *boop*
---


## [0.1.5](v0.1.4...v0.1.5) (2024-08-13)


### Bugfix

* Fix PagedPrefill python api and some typos ([#441](#441)) ([3fff008](3fff008))
* fix prefill kernels' lse result for empty kv-cache ([#440](#440)) ([6ac28f4](6ac28f4))

### Features

* decouple float and int workspace buffer ([#442](#442)) ([a7ee566](a7ee566))


### Performance Improvements

* faster fp8->fp16 dequantization for pre sm_90 arch ([#439](#439)) ([c93f647](c93f647))

### Acknowledgement

We thank the community for their contributions and feedback:
[@comaniac](https://github.com/comaniac),
[@hnyls2002](https://github.com/hnyls2002),
[@jianfei-wangg](https://github.com/jianfei-wangg),
[@Yard1](https://github.com/Yard1).


---
This PR was generated with [Release Please](https://github.com/googleapis/release-please). See [documentation](https://github.com/googleapis/release-please#release-please).

---------

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Zihao Ye <expye@outlook.com>
@yzh119 yzh119 deleted the separate-int-float-buffer branch August 23, 2024 21:22