Adjust mul_mat_f16 work memory #1226

Merged

ggerganov merged 3 commits into master from adjust-mul-mat-f16-work-memory on Apr 29, 2023

Conversation

@ggerganov (Member) commented Apr 29, 2023

Haven't tested this yet. The goal is to allocate just the needed amount of memory when not using cuBLAS.
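
A minimal sketch of the idea, assuming ggml's usual layout: choose the work-buffer size for mul_mat_f16 per build configuration instead of always reserving the worst case. The helper name `mul_mat_f16_work_size`, the simplified tensor struct, and the exact size formulas are hypothetical illustrations, not the code in this PR.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint16_t ggml_fp16_t;  // stand-in for ggml's half-precision type

// Simplified 2D tensor holding just the element counts needed here.
struct tensor {
    int64_t ne0;  // number of columns
    int64_t ne1;  // number of rows
};

// Hypothetical work-size helper: reserve a full f16 copy of src1 only on
// the code path that actually needs it, rather than sizing every build
// for the worst case.
static size_t mul_mat_f16_work_size(const struct tensor * src1) {
#if defined(GGML_USE_CUBLAS)
    // cuBLAS path (assumed): all of src1 is converted up front, so the
    // buffer must hold every element as f16.
    return sizeof(ggml_fp16_t) * (size_t)(src1->ne0 * src1->ne1);
#else
    // CPU path (assumed): only one row is converted at a time, so a
    // single row's worth of f16 values suffices.
    return sizeof(ggml_fp16_t) * (size_t)src1->ne0;
#endif
}
```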

@ggerganov requested review from slaren and 0cc4m on April 29, 2023 09:45
@ggerganov force-pushed the adjust-mul-mat-f16-work-memory branch from 638651a to 0ffcd89 on April 29, 2023 10:54
@slaren (Member) commented Apr 29, 2023

Looks good. I didn't realize that this could increase the maximum size of the work memory, so I had set it to the worst-case maximum to make testing easier.
In the future we shouldn't need any work memory at all for this with cuBLAS: I have been testing converting between f16 and f32 on the GPU, and it is faster that way.
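
A minimal sketch of the device-side conversion described here: widening f16 to f32 directly on the GPU, so the data never round-trips through a host work buffer. The kernel name and launch shape are illustrative assumptions, not the code that later landed in ggml.

```cuda
#include <cuda_fp16.h>

// One thread per element: read a half, widen it to float on the device.
__global__ void convert_fp16_to_fp32(const __half * src, float * dst, int n) {
    const int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        dst[i] = __half2float(src[i]);
    }
}

// Illustrative launch: 256 threads per block, enough blocks to cover n.
// convert_fp16_to_fp32<<<(n + 255) / 256, 256>>>(d_src, d_dst, n);
```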

@ggerganov merged commit 214b6a3 into master on Apr 29, 2023
@ggerganov deleted the adjust-mul-mat-f16-work-memory branch on April 29, 2023 15:43