Multi-thread ggml_cpy() #782

ggerganov · 2023-04-05T16:24:00Z

This is a task suitable for new contributors.

See how we multi-threaded the ggml_rope() operator.
Do the same for the ggml_cpy() operator and see if there is any benefit.
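For readers new to the codebase, here is the general idea. In a multi-threaded ggml op, each of `nth` threads gets an index `ith` and processes a disjoint slice of rows. The sketch below shows that idea on a plain 2D float matrix using POSIX threads. It is a hypothetical illustration, not the ggml implementation, which runs ops on its own worker threads and reads `params->ith` / `params->nth`. The names `copy_task`, `copy_rows_worker` and `copy_rows_mt` are made up for this example.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical per-thread task descriptor (not a ggml type). */
typedef struct {
    const float *src;
    float       *dst;
    int          nrows;
    int          ncols;
    int          ith;   /* index of this thread */
    int          nth;   /* total number of threads */
} copy_task;

/* Each thread copies rows [ir0, ir1): the rows are split into nth
   contiguous chunks of at most dr rows each. */
static void *copy_rows_worker(void *arg) {
    const copy_task *t = (const copy_task *) arg;

    const int dr  = (t->nrows + t->nth - 1) / t->nth; /* rows per thread, rounded up */
    const int ir0 = dr * t->ith;
    const int ir1 = ir0 + dr < t->nrows ? ir0 + dr : t->nrows;

    for (int ir = ir0; ir < ir1; ir++) {
        memcpy(t->dst + (size_t) ir * t->ncols,
               t->src + (size_t) ir * t->ncols,
               (size_t) t->ncols * sizeof(float));
    }
    return NULL;
}

/* Spawn nth threads (at most 64 in this sketch) and wait for all of them. */
static void copy_rows_mt(const float *src, float *dst, int nrows, int ncols, int nth) {
    assert(nth > 0 && nth <= 64);
    pthread_t threads[64];
    copy_task tasks[64];
    int       started[64];

    for (int i = 0; i < nth; i++) {
        tasks[i] = (copy_task) { src, dst, nrows, ncols, i, nth };
        /* If a thread cannot be created, do that chunk on the calling thread. */
        started[i] = pthread_create(&threads[i], NULL, copy_rows_worker, &tasks[i]) == 0;
        if (!started[i]) {
            copy_rows_worker(&tasks[i]);
        }
    }
    for (int i = 0; i < nth; i++) {
        if (started[i]) {
            pthread_join(threads[i], NULL);
        }
    }
}
```

Because `dr` is rounded up, the last thread may get fewer rows, or none, when `nrows` is not divisible by `nth`. Every row is still copied exactly once, and no two threads write to the same row.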

Use the ggml profiler (GGML_PERF) to measure the benefit of a multi-threaded vs. a single-threaded ggml_cpy().


Fs77X · 2023-04-06T22:27:15Z

Hi, this is my first contribution to a large project, so forgive me for being a newbie! I attempted to multi-thread ggml_cpy() following the code from the rope commit, but I started getting garbage output for an initial prompt. I would appreciate any guidance on what I'm doing wrong! Fs77X@3c8a304

ggerganov · 2023-04-10T19:47:15Z

So I have updated ggml_cpy() in the latest commit (c3ac702), since I backported some changes from whisper.cpp: we can significantly improve the performance when dst is contiguous. The code is currently quite messy, but I think it can easily be simplified in the future.
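The following simplified sketch shows why a contiguous `dst` helps. It is not the code from c3ac702. When `dst` is contiguous, it can be filled through a single running index instead of computing a strided destination offset for every element. If `src` is also contiguous, the whole copy collapses to one `memcpy`. Strides are given in bytes, following ggml's `nb[]` convention. The function name and its restriction to 2D f32 data are assumptions made for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: copy a 2D f32 view src (ne0 x ne1 elements,
   byte strides nb0/nb1) into a contiguous dst of ne0*ne1 floats. */
static void cpy_f32_to_contiguous(const char *src, size_t nb0, size_t nb1,
                                  int64_t ne0, int64_t ne1, float *dst) {
    assert(ne0 >= 0 && ne1 >= 0);

    /* Fast path: src is contiguous too, so a single memcpy suffices. */
    if (nb0 == sizeof(float) && nb1 == (size_t) ne0 * sizeof(float)) {
        memcpy(dst, src, (size_t) (ne0 * ne1) * sizeof(float));
        return;
    }

    /* dst is contiguous: advance one running index through it and only
       do strided address arithmetic on the src side. */
    size_t id = 0;
    for (int64_t i1 = 0; i1 < ne1; i1++) {
        for (int64_t i0 = 0; i0 < ne0; i0++) {
            const float *s = (const float *) (src + i0 * nb0 + i1 * nb1);
            dst[id++] = *s;
        }
    }
}
```

For example, passing a row-major 2x3 matrix with `nb0 = 3 * sizeof(float)` and `nb1 = sizeof(float)` (a transposed view) produces its transpose in `dst`.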

While at it, I tried to multi-thread it and didn't observe any measurable improvement, so I guess there is no point in multi-threading it.

I will close this issue now.

slaren · 2023-04-17T20:23:23Z

@ggerganov Do you still have your multi-threaded implementation of ggml_cpy? I would like to build on that to parallelize quantization in ggml_cpy for LoRA.
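Here is a rough sketch of what splitting quantization across threads could look like. Each thread quantizes its own chunk of rows. Because rows are independent, no synchronization is needed beyond waiting for all threads to finish. The block format is a simplified stand-in chosen for illustration, not necessarily any ggml format: 32 floats become one float scale plus 32 int8 values, with scale = max|x| / 127. All names (`block_q8_demo`, `quantize_row_demo`, `quantize_rows_thread`) are hypothetical. Only the per-thread function is shown; a thread pool would call it with `ith = 0 .. nth-1`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define QK 32  /* values per block (illustrative choice) */

/* Hypothetical 8-bit block format: one float scale plus QK int8 values. */
typedef struct {
    float  d;
    int8_t q[QK];
} block_q8_demo;

/* Quantize one row of ncols floats; ncols must be a multiple of QK. */
static void quantize_row_demo(const float *x, block_q8_demo *y, int ncols) {
    assert(ncols % QK == 0);
    for (int b = 0; b < ncols / QK; b++) {
        const float *xb = x + b * QK;

        float amax = 0.0f;                    /* largest |value| in the block */
        for (int j = 0; j < QK; j++) {
            const float v = xb[j] < 0.0f ? -xb[j] : xb[j];
            if (v > amax) amax = v;
        }

        const float d  = amax / 127.0f;       /* maps [-amax, amax] onto [-127, 127] */
        const float id = d != 0.0f ? 1.0f / d : 0.0f;

        y[b].d = d;
        for (int j = 0; j < QK; j++) {
            const float v = xb[j] * id;
            /* round half away from zero */
            y[b].q[j] = (int8_t) (v >= 0.0f ? v + 0.5f : v - 0.5f);
        }
    }
}

/* Work done by thread ith of nth: quantize rows [ir0, ir1).
   Each thread writes only the blocks belonging to its own rows. */
static void quantize_rows_thread(const float *src, block_q8_demo *dst,
                                 int nrows, int ncols, int ith, int nth) {
    assert(nth > 0 && ith >= 0 && ith < nth);
    const int nb  = ncols / QK;               /* blocks per row */
    const int dr  = (nrows + nth - 1) / nth;  /* rows per thread, rounded up */
    const int ir0 = dr * ith;
    const int ir1 = ir0 + dr < nrows ? ir0 + dr : nrows;

    for (int ir = ir0; ir < ir1; ir++) {
        quantize_row_demo(src + (size_t) ir * ncols, dst + (size_t) ir * nb, ncols);
    }
}
```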

ggerganov · 2023-04-17T20:31:15Z

No, but it was pretty much the same as: https://github.com/ggerganov/llama.cpp/pull/824/files
Just use some shorter variable names like:

    const int ith = params->ith;
    const int nth = params->nth;

    int ir = 0;
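
In ggml's row-parallel ops, a variable like `ir` is likely a running row counter. The op walks all rows in its usual nested loops, increments `ir` once per row, and each thread skips the rows outside its own `[ir0, ir1)` range. The minimal sketch below applies that idea to a contiguous 4D f32 tensor. `ne0..ne3` follow ggml's naming for element counts per dimension. The function itself is hypothetical and only illustrates the counter technique.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical: work done by thread ith of nth when copying the
   ne1*ne2*ne3 rows (ne0 floats each) of a contiguous 4D tensor. */
static void cpy_rows_counter(const float *src, float *dst,
                             int64_t ne0, int64_t ne1, int64_t ne2, int64_t ne3,
                             int ith, int nth) {
    assert(nth > 0 && ith >= 0 && ith < nth);

    const int64_t nr  = ne1 * ne2 * ne3;        /* total number of rows */
    const int64_t dr  = (nr + nth - 1) / nth;   /* rows per thread, rounded up */
    const int64_t ir0 = dr * ith;
    const int64_t ir1 = ir0 + dr < nr ? ir0 + dr : nr;

    int64_t ir = 0;  /* running row counter across the nested loops */

    for (int64_t i3 = 0; i3 < ne3; i3++) {
        for (int64_t i2 = 0; i2 < ne2; i2++) {
            for (int64_t i1 = 0; i1 < ne1; i1++) {
                if (ir++ < ir0) continue;  /* row belongs to an earlier thread */
                if (ir > ir1) return;      /* past this thread's range: done */
                const size_t off = (size_t) (((i3 * ne2 + i2) * ne1 + i1) * ne0);
                memcpy(dst + off, src + off, (size_t) ne0 * sizeof(float));
            }
        }
    }
}
```

One advantage of the counter over computing `(i1, i2, i3)` from a flat row index is that an existing single-threaded loop nest can be parallelized with two extra lines.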

shkim-emily · 2024-11-01T06:52:51Z

@ggerganov Hello, is the ggml profiler (GGML_PERF) no longer supported?

ggerganov added the enhancement, good first issue, and performance labels on Apr 5, 2023

ironman5366 mentioned this issue on Apr 7, 2023: Multi-thread ggml_cpy() #824 (Closed)

ggerganov closed this as completed Apr 10, 2023

slaren mentioned this issue on Apr 15, 2023: Add LoRA support #820 (Merged)

Bearsaerker mentioned this issue on Mar 12, 2025: Eval bug: Gemma 3 extremly slow prompt processing when using quantized kv cache. #12352 (Open)
