Skip to content

Multi-thread ggml_cpy() #782

New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

Closed
ggerganov opened this issue Apr 5, 2023 · 5 comments
Closed

Multi-thread ggml_cpy() #782

ggerganov opened this issue Apr 5, 2023 · 5 comments
Labels
enhancement New feature or request good first issue Good for newcomers performance Speed related topics

Comments

@ggerganov
Copy link
Member

This is a task suitable for new contributors

See how we multi-threaded the ggml_rope() operator.
Do the same for the ggml_cpy() operator and see if there is any benefit.

Use the ggml profiler (GGML_PERF) to measure the benefit of multi-threaded vs non-multi-threaded ggml_cpy()

@ggerganov ggerganov added enhancement New feature or request good first issue Good for newcomers performance Speed related topics labels Apr 5, 2023
@Fs77X
Copy link

Fs77X commented Apr 6, 2023

Hi, this is my first contribution to a large project so forgive me for being a newbie! I attempted to multithread cpy following the code from the rope commit but I started getting garbage output given an initial prompt. Would appreciate any guidance on what I'm doing wrong! Fs77X@3c8a304

@ggerganov
Copy link
Member Author

So I have updated ggml_cpy() in the latest commit (c3ac702) since I backported some changes from whisper.cpp - we can significantly improve the performance when dst is contiguous. The code is currently quite messy, but I think it can be easily simplified in the future.

While at it, I tried to multi-thread it and didn't observed any measurable improvements, so I guess there is no point in multi-threading it.

I will close this issue now

@slaren
Copy link
Member

slaren commented Apr 17, 2023

@ggerganov Do you still have your multi-threaded implementation of ggml_cpy? I would like to build on that to parallelize quantization in ggml_cpy for LoRA.

@ggerganov
Copy link
Member Author

No, but it was pretty much the same as: https://github.com/ggerganov/llama.cpp/pull/824/files
Just use some shorter variable names like:

    const int ith = params->ith;
    const int nth = params->nth;

    int ir = 0;

@shkim-emily
Copy link

@ggerganov Hello, Does [ggml profiler (GGML_PERF)] not support now?

# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
enhancement New feature or request good first issue Good for newcomers performance Speed related topics
Projects
None yet
Development

No branches or pull requests

4 participants