
ggml : various fixes #1450


Merged: 1 commit, May 14, 2023

Conversation

ggerganov
Copy link
Member

The ggml_rope() fixes are irrelevant for LLaMA, since there n_rot == (n_embd / n_head), but they make a difference for other models like GPT-J and GPT-NeoX, where n_rot < (n_embd / n_head). I'm still not sure this is the correct implementation, especially for the GPT-NeoX mode, but the results seem slightly better than before.
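To illustrate why n_rot matters, here is a minimal NumPy sketch of a GPT-J-style partial rotary embedding, where only the first n_rot dimensions of each head vector are rotated and the rest pass through unchanged. The function name, pairing scheme, and theta base are assumptions for illustration, not ggml's actual layout or code.

```python
import numpy as np

def partial_rope(x, n_rot, pos, theta_base=10000.0):
    """Rotate only the first n_rot dims of a head vector x.
    Illustrative sketch: pairs are (0,1), (2,3), ... as in GPT-J;
    dims n_rot..head_dim-1 are left untouched."""
    out = x.astype(np.float64).copy()
    for i in range(0, n_rot, 2):
        theta = pos * theta_base ** (-i / n_rot)
        c, s = np.cos(theta), np.sin(theta)
        x0, x1 = x[i], x[i + 1]
        out[i]     = x0 * c - x1 * s
        out[i + 1] = x0 * s + x1 * c
    return out

head_dim, n_rot = 8, 4
v = np.arange(head_dim, dtype=np.float64)
r = partial_rope(v, n_rot, pos=3)
```

When n_rot == head_dim (the LLaMA case) every dimension is rotated, so a bug that only affects the unrotated tail would never show up there.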

The non-inplace, multi-threaded ggml_diag_mask_inf() was broken by #1428. Again, this is irrelevant for LLaMA, since the forward pass uses ggml_diag_mask_inf_inplace(). Might be relevant to @xaedes
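For context, a minimal NumPy sketch of what the op computes (assumed semantics based on its usage in causal attention): for each row i, columns beyond n_past + i are set to -inf so attention cannot look ahead of the current token.

```python
import numpy as np

def diag_mask_inf(scores, n_past):
    """Illustrative NumPy analogue of ggml_diag_mask_inf (assumed
    semantics, not the ggml implementation): mask entries above the
    shifted diagonal with -inf, leaving the rest unchanged."""
    out = scores.copy()
    for i in range(out.shape[0]):
        out[i, n_past + i + 1:] = -np.inf
    return out

m = diag_mask_inf(np.zeros((3, 5)), n_past=2)
```

The in-place variant writes the mask directly into the source tensor; the non-inplace variant must first copy the source into the destination for every row, which is where a multi-threaded split over rows can go wrong if some rows are never copied.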

The "scratch buffers" fix might be relevant for LLaMA. See the new ggml_scratch_save() and ggml_scratch_load() functions and their usage in ggml.c: https://github.com/ggerganov/llama.cpp/blob/fixes/ggml.c#LL3925C1-L3939C1
The scratch buffers are a mechanism for reusing memory from previous ops once it is no longer needed. The current way of using them is manual and very error-prone. I will hopefully come up with something better in the future.
More info here: ggml-org/whisper.cpp#431
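The save/load pattern can be sketched like this. This is a toy Python model of the idea, not ggml's C implementation: before allocating a small bookkeeping tensor (e.g. op parameters), the active scratch buffer is saved and disabled so the allocation comes from the main buffer and cannot be clobbered when the scratch is later reused; afterwards the scratch is restored.

```python
class Context:
    """Toy model of a ggml context with a scratch buffer.
    Names mirror ggml_scratch_save()/ggml_scratch_load(), but the
    mechanics are a deliberate simplification."""

    def __init__(self):
        # Active scratch buffer; None means "allocate from main buffer".
        self.scratch = {"data": bytearray(1024), "offs": 0}
        self._saved = None

    def scratch_save(self):
        # Remember the active scratch and disable it, so the next
        # allocations land in the main buffer and survive scratch reuse.
        self._saved = self.scratch
        self.scratch = None

    def scratch_load(self):
        # Restore whatever scratch was active before scratch_save().
        self.scratch = self._saved
        self._saved = None

ctx = Context()
ctx.scratch_save()
scratch_active_inside = ctx.scratch is not None  # False: main buffer in use
ctx.scratch_load()
```

The failure mode this guards against: without the save/restore, small tensors created mid-op would land in the scratch buffer and be silently overwritten by a later op that reuses the same region.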

- `ggml_rope()`
- `ggml_diag_mask_inf()` multi-threaded
- compatibility with scratch buffers