
fix crash on non-AVX systems dynamically loading GGML CPU backends #11780

Merged
merged 1 commit into ggml-org:master on Feb 13, 2025

Conversation

jmorganca (Contributor)

Thanks for the awesome work by @slaren in #10469 (and a few follow-up PRs) to enable dynamic GGML backend loading. It has made supporting different CPU instruction sets in GGML much, much easier.

I noticed a small hitch in the llamafile code: a machine with a non-AVX CPU would crash when trying to dlopen CPU libraries built with GGML_LLAMAFILE=ON. This moves the AVX-dependent code to a member variable, fixing the crash on dlopen. I'm not sure how sgemm.cpp is vendored, so let me know the best way/place to suggest a change.
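For context, a minimal sketch of the failure mode and the fix. The identifiers here are illustrative, not the exact symbols in sgemm.cpp:

```cpp
// Illustrative sketch only -- names are hypothetical, not the actual
// sgemm.cpp symbols.
#include <immintrin.h>

// BEFORE: a namespace-scope constant with a dynamic initializer. The
// initializer executes an AVX load when the shared library is dlopen()ed,
// i.e. before any CPU-feature check can run, so a non-AVX machine hits
// SIGILL simply by loading the backend library:
//
//   static const __m256i kLookupTable =
//       _mm256_loadu_si256((const __m256i *)kTableBytes);

// AFTER: the constant becomes a member initialized in the constructor.
// The class is only instantiated after the backend has verified that AVX
// is available, so non-AVX machines never execute the load.
struct tinyblas_kernel {
    explicit tinyblas_kernel(const void *table)
        : lookup(_mm256_loadu_si256((const __m256i *)table)) {}

    const __m256i lookup;  // initialized lazily, inside AVX-guarded code
};
```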

@github-actions github-actions bot added the ggml label (changes relating to the ggml tensor library for machine learning) on Feb 10, 2025
@slaren slaren (Member) left a comment

Thanks, I missed this global. The fix looks ok, but if the code is not inlined it may add some overhead to the other types. I will leave this open for a while in case someone knowledgeable about llamafile/tinyblas wants to propose a better solution.
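To make the overhead concern concrete, a sketch of the situation (illustrative only, not the actual sgemm.cpp code): if the kernel class is instantiated for every supported quantization type, the constructor now performs the table load unconditionally, including for types that never read it. That cost disappears only if the compiler inlines the constructor and dead-code-eliminates the load.

```cpp
// Illustrative: why the member-variable fix can cost the other types.
#include <immintrin.h>
#include <cstdint>

template <typename Q>
struct kernel_sketch {
    // The table load runs for every instantiation, even for quantization
    // types whose kernels never touch `lookup` -- harmless if inlined and
    // eliminated, a small per-call cost otherwise.
    kernel_sketch()
        : lookup(_mm_loadu_si128((const __m128i *)table_bytes)) {}

    __m128i lookup;  // only the IQ4_NL-style path actually reads this
    static constexpr int8_t table_bytes[16] = {};  // placeholder data
};
```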

@slaren slaren merged commit 8a8c4ce into ggml-org:master Feb 13, 2025
46 checks passed
@jmorganca jmorganca (Contributor, Author)

Thanks for merging @slaren. I'm running some performance tests after noticing ollama/ollama#9087. I'm not sure if this PR is the root cause, but I haven't ruled it out yet. In any case, I'll keep you posted, and wanted to give you a heads up in case it turns out to be related.

@slaren slaren (Member) commented Feb 14, 2025

The llamafile tinyblas code should only be used for prompt processing, so if you are also observing a decrease in performance during generation, it is not very likely that it was caused by this change.
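(Side note for anyone measuring this: llama.cpp's `llama-bench` tool reports the two phases separately, by default as a `pp512` prompt-processing row and a `tg128` generation row, so running it on builds before and after this commit should show which phase, if any, regressed.)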

orca-zhang pushed a commit to orca-zhang/llama.cpp that referenced this pull request Feb 26, 2025
arthw pushed a commit to arthw/llama.cpp that referenced this pull request Feb 26, 2025
mglambda pushed a commit to mglambda/llama.cpp that referenced this pull request Mar 8, 2025