[Performance] Increase num_frags_z for GPUs with larger shared memory per SM #63

Merged on Jan 10, 2024 (1 commit)

yzh119 commented on Jan 10, 2024

In our previous design, `num_frags_z` was fixed to 2, which is not optimal for GPUs with large shared memory per SM. This PR increases `num_frags_z` on such GPUs.
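As a rough illustration of the idea, the sketch below picks the largest `num_frags_z` whose tiles fit in a per-block shared memory budget. It is not the code from this PR: the tile layout (one Q tile plus K and V tiles of 16-row fragments), the candidate values 4, 2, 1, and the names `EstimateSmemBytes` and `ChooseNumFragsZ` are all assumptions made for the example.

```cpp
#include <cstddef>
#include <cstdint>

// Illustrative sketch only: the tile layout and every name below are
// assumptions for this example, not FlashInfer's actual implementation.

// Rows covered by one fragment (assumed to be 16).
constexpr std::size_t kFragRows = 16;

// Estimated shared memory (bytes) needed by one thread block:
// one Q tile plus K and V tiles whose height scales with num_frags_z.
constexpr std::size_t EstimateSmemBytes(std::uint32_t num_frags_x,
                                        std::uint32_t num_frags_z,
                                        std::uint32_t num_warps,
                                        std::uint32_t head_dim,
                                        std::size_t elem_bytes) {
  const std::size_t q_bytes =
      num_warps * num_frags_x * kFragRows * head_dim * elem_bytes;
  const std::size_t kv_bytes =
      2 * num_frags_z * kFragRows * head_dim * elem_bytes;  // K and V
  return q_bytes + kv_bytes;
}

// Returns the largest candidate num_frags_z (4, 2, or 1) that still lets
// `blocks_per_sm` blocks share one SM, or 0 if no candidate fits.
// Assumes blocks_per_sm > 0.
constexpr std::uint32_t ChooseNumFragsZ(std::size_t smem_per_sm_bytes,
                                        std::uint32_t blocks_per_sm,
                                        std::uint32_t num_frags_x,
                                        std::uint32_t num_warps,
                                        std::uint32_t head_dim,
                                        std::size_t elem_bytes) {
  const std::size_t budget = smem_per_sm_bytes / blocks_per_sm;
  for (std::uint32_t z = 4; z >= 1; z /= 2) {
    if (EstimateSmemBytes(num_frags_x, z, num_warps, head_dim, elem_bytes) <=
        budget) {
      return z;
    }
  }
  return 0;
}
```

Under these assumptions, with 4 warps, `head_dim = 128`, 2-byte elements, and 2 blocks per SM, a 164 KB shared memory budget selects 4, while a 64 KB budget falls back to 2. In a real kernel the budget would come from a device query rather than a hard-coded constant.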

yzh119 merged commit 8320ebe into main on Jan 10, 2024
MasterJH5574 deleted the accelerate-batch-prefill branch on January 13, 2024
yzh119 added a commit that referenced this pull request Aug 13, 2024
Follow-up of #439: use `constexpr` in `if` conditions so that
`BIAS_OFFSET` won't exceed 32 at compile time.