
opt(ggml): No need to allocate memory for ggml_new_tensor_impl for mmaped case #916


Closed
jon-chuang opened this issue Apr 12, 2023 · 1 comment

Comments

@jon-chuang
Contributor

The solution is simple. As in llama.cpp (https://github.com/ggerganov/llama.cpp/blob/f76cb3a34d6a6b03afb96650e39495f201eac042/llama.cpp#L933), set the context's no_alloc flag to true, so tensor creation records metadata only and the data pointers can later point into the mmapped file.

@jon-chuang
Contributor Author

Oops, wrong repo
