
fix crash on non-AVX systems dynamically loading GGML CPU backends #11780

Merged
merged 1 commit into ggml-org:master on Feb 13, 2025

Conversation

jmorganca (Contributor)

Thanks for the awesome work by @slaren in #10469 (and a few follow-up PRs) to enable dynamic GGML backend loading. It has made supporting different CPU instruction sets in GGML much, much easier.

I noticed a small hitch in the llamafile code: a machine with a non-AVX CPU would crash when trying to dlopen CPU libraries built with GGML_LLAMAFILE=ON. This moves the AVX-dependent code to a member variable, fixing the crash on dlopen. I'm not sure how sgemm.cpp is vendored, so let me know the best way/place to suggest a change.
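For context, a minimal sketch of the failure mode and the fix. The identifiers here are illustrative, not the exact symbols in sgemm.cpp:

```cpp
// Illustrative sketch only -- names are hypothetical, not the actual
// sgemm.cpp symbols.
#include <immintrin.h>

// BEFORE: a namespace-scope constant with a dynamic initializer. The
// initializer executes an AVX load when the shared library is dlopen()ed,
// i.e. before any CPU-feature check can run, so a non-AVX machine hits
// SIGILL simply by loading the backend library:
//
//   static const __m256i kLookupTable =
//       _mm256_loadu_si256((const __m256i *)kTableBytes);

// AFTER: the constant becomes a member initialized in the constructor.
// The class is only instantiated after the backend has verified that AVX
// is available, so non-AVX machines never execute the load.
struct tinyblas_kernel {
    explicit tinyblas_kernel(const void *table)
        : lookup(_mm256_loadu_si256((const __m256i *)table)) {}

    const __m256i lookup;  // initialized lazily, inside AVX-guarded code
};
```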

@github-actions github-actions bot added the ggml label (changes relating to the ggml tensor library for machine learning) on Feb 10, 2025
@slaren slaren (Member) left a comment

Thanks, I missed this global. The fix looks ok, but if the code is not inlined it may add some overhead to the other types. I will leave this open for a while in case someone knowledgeable about llamafile/tinyblas wants to propose a better solution.
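To make the overhead concern concrete, a sketch of the situation (illustrative only, not the actual sgemm.cpp code): if the kernel class is instantiated for every supported quantization type, the constructor now performs the table load unconditionally, including for types that never read it. That cost disappears only if the compiler inlines the constructor and dead-code-eliminates the load.

```cpp
// Illustrative: why the member-variable fix can cost the other types.
#include <immintrin.h>
#include <cstdint>

template <typename Q>
struct kernel_sketch {
    // The table load runs for every instantiation, even for quantization
    // types whose kernels never touch `lookup` -- harmless if inlined and
    // eliminated, a small per-call cost otherwise.
    kernel_sketch()
        : lookup(_mm_loadu_si128((const __m128i *)table_bytes)) {}

    __m128i lookup;  // only the IQ4_NL-style path actually reads this
    static constexpr int8_t table_bytes[16] = {};  // placeholder data
};
```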

@slaren slaren merged commit 8a8c4ce into ggml-org:master Feb 13, 2025
46 checks passed
@jmorganca jmorganca (Contributor, Author)

Thanks for merging @slaren. I'm running some performance tests after noticing ollama/ollama#9087. I'm not sure if this PR is the root cause, but I haven't ruled it out yet. In any case, I'll keep you posted, and wanted to give you a heads up in case it turns out to be related.

@slaren slaren (Member) commented Feb 14, 2025

The llamafile tinyblas code should only be used for prompt processing, so if you are also observing a decrease in performance during generation, it is not very likely that it was caused by this change.
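(Side note for anyone measuring this: llama.cpp's `llama-bench` tool reports the two phases separately, by default as a `pp512` prompt-processing row and a `tg128` generation row, so running it on builds before and after this commit should show which phase, if any, regressed.)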

orca-zhang pushed a commit to orca-zhang/llama.cpp that referenced this pull request Feb 26, 2025
arthw pushed a commit to arthw/llama.cpp that referenced this pull request Feb 26, 2025
mglambda pushed a commit to mglambda/llama.cpp that referenced this pull request Mar 8, 2025