refactor: reduce the binary size of batch decode kernels #343

Merged: 2 commits merged into main from reduce-decode-datatype on Jun 30, 2024


yzh119 (Collaborator) commented on Jun 30, 2024

This PR refactors the batch decode kernels and makes the following breaking changes:

  1. Remove the batch_decode_with_padded_kv_cache operator; we encourage users to use BatchDecodeWithPagedKVCacheWrapper instead.
  2. Delete redundant DTypeQ × DTypeKV combinations. Only the following cases are now supported:
     - DTypeQ == DTypeKV
     - DTypeQ is float16 and DTypeKV is float8
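Migrating off batch_decode_with_padded_kv_cache mostly means describing each request's KV cache as a list of fixed-size pages instead of one padded row. The sketch below computes that page table from per-request sequence lengths. It assumes the CSR-style layout commonly used for paged decode (an indptr array, a flat list of page indices, and the number of valid tokens in each request's last page) and assigns pages contiguously; the helper name padded_to_paged_metadata is hypothetical and not part of FlashInfer. Consult the BatchDecodeWithPagedKVCacheWrapper documentation for how these arrays are actually passed in.

```python
from typing import List, Tuple


def padded_to_paged_metadata(
    seq_lens: List[int], page_size: int
) -> Tuple[List[int], List[int], List[int]]:
    """Hypothetical helper: build CSR-style page tables for a batch.

    Request i owns pages kv_indices[kv_indptr[i]:kv_indptr[i + 1]], and its
    last page holds kv_last_page_len[i] valid tokens. Pages are assigned
    contiguously in request order (an assumption made for illustration).
    """
    if page_size <= 0:
        raise ValueError("page_size must be positive")
    kv_indptr = [0]
    kv_indices: List[int] = []
    kv_last_page_len: List[int] = []
    next_page = 0
    for n in seq_lens:
        if n <= 0:
            raise ValueError("each request needs at least one KV token")
        num_pages = (n + page_size - 1) // page_size  # ceiling division
        kv_indices.extend(range(next_page, next_page + num_pages))
        next_page += num_pages
        kv_indptr.append(len(kv_indices))
        # Tokens left over after filling all but the last page.
        kv_last_page_len.append(n - (num_pages - 1) * page_size)
    return kv_indptr, kv_indices, kv_last_page_len
```

For example, padded_to_paged_metadata([5, 16, 1], 4) returns ([0, 2, 6, 7], [0, 1, 2, 3, 4, 5, 6], [1, 4, 1]): the first request uses 2 pages with 1 token in the last one, the second fills 4 pages exactly, and the third uses a single page. Unlike a padded cache, no storage is spent on padding beyond the partially filled last page.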

The output data type follows the query data type.
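The dtype rules above can be expressed as a small dispatch check. This is a minimal sketch, not FlashInfer's actual dispatch code: the real kernels are C++ templates parameterized on DTypeQ and DTypeKV, and the dtype names below (including the two float8 variants) are assumed labels for illustration. It encodes the two supported cases, derives the output dtype from the query dtype, and enumerates which pairs of a hypothetical dtype list would still need a kernel instantiation.

```python
from itertools import product
from typing import List, Tuple

# Hypothetical dtype labels a build might consider (illustrative only).
DTYPES = ["float16", "bfloat16", "float8_e4m3", "float8_e5m2"]
FLOAT8 = {"float8_e4m3", "float8_e5m2"}


def is_supported(dtype_q: str, dtype_kv: str) -> bool:
    """Supported cases: same dtype, or float16 queries with a float8 KV cache."""
    return dtype_q == dtype_kv or (dtype_q == "float16" and dtype_kv in FLOAT8)


def output_dtype(dtype_q: str, dtype_kv: str) -> str:
    """The output dtype follows the query dtype; reject unsupported pairs."""
    if not is_supported(dtype_q, dtype_kv):
        raise TypeError(f"unsupported combination: DTypeQ={dtype_q}, DTypeKV={dtype_kv}")
    return dtype_q


def supported_combinations() -> List[Tuple[str, str]]:
    """(DTypeQ, DTypeKV) pairs from DTYPES that would still be instantiated."""
    return [(q, kv) for q, kv in product(DTYPES, repeat=2) if is_supported(q, kv)]
```

With this hypothetical four-dtype list, supported_combinations() keeps 6 of the 16 possible (DTypeQ, DTypeKV) pairs. Pruning the cross product this way is what reduces the number of template instantiations, and therefore the binary size; the actual savings depend on which dtypes a given build compiles.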

yzh119 merged commit 0d333ff into main on Jun 30, 2024
yzh119 deleted the reduce-decode-datatype branch on June 30, 2024 at 07:14