Merging tensors of larger models #1
Labels: enhancement (New feature or request)

Comments
Thanks! The bigger problem now is that I am out of disk space, haha!
Leave a tip jar to get @ggerganov a bigger SSD and/or a MacBook :D
It's kinda pointless now, but I was able to merge the 30B and 65B models with this core bit of hackery added to the convert script.
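For context, here is a rough Python sketch of the kind of shard-merging logic being described. It is not the actual patch from this thread; the tensor-name patterns and split dimensions are assumptions about how the multi-part LLaMA `.pth` checkpoints are sharded, so treat it as illustrative only.

```python
# Illustrative sketch only (not the actual convert-script change): merge the
# per-part state dicts of a multi-part LLaMA checkpoint into a single dict.
import torch

# Assumption: these weights are sharded along columns (dim 1); other 2-D
# tensors are assumed sharded along rows (dim 0); 1-D tensors are replicated.
COLUMN_SPLIT = ("tok_embeddings", "attention.wo", "feed_forward.w2")

def merge_shards(shard_paths):
    shards = [torch.load(p, map_location="cpu") for p in shard_paths]
    merged = {}
    for name in shards[0]:
        tensors = [s[name] for s in shards]
        if tensors[0].dim() == 1:
            merged[name] = tensors[0]                 # replicated, keep one copy
        elif any(key in name for key in COLUMN_SPLIT):
            merged[name] = torch.cat(tensors, dim=1)  # column-sharded
        else:
            merged[name] = torch.cat(tensors, dim=0)  # row-sharded
    return merged

# e.g. merged = merge_shards(["consolidated.00.pth", "consolidated.01.pth"])
```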
Fixed with 007a8f6. On startup, we go through all the parts and merge them dynamically in the …
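The comment above describes doing the merge at load time instead of writing a merged checkpoint to disk. A minimal sketch of that idea, kept in Python for brevity even though the real loader is C++, and using the same assumed split dimensions as the snippet above: pre-allocate the full tensor, then copy each part's slice into place.

```python
# Illustrative only: rebuild one full tensor from its per-part slices at load
# time, without materializing a merged checkpoint on disk first.
import torch

def merge_tensor_dynamically(parts, split_dim):
    # parts: per-shard slices of one tensor; split_dim: 0 (rows) or 1 (columns)
    full_shape = list(parts[0].shape)
    full_shape[split_dim] = sum(p.shape[split_dim] for p in parts)
    full = torch.empty(full_shape, dtype=parts[0].dtype)

    offset = 0
    for part in parts:
        n = part.shape[split_dim]
        full.narrow(split_dim, offset, n).copy_(part)  # copy slice into place
        offset += n
    return full
```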
nemtos pushed a commit to nemtos/llama.cpp that referenced this issue on Apr 9, 2023:
Update command for downloading the weights to use `curl`. `curl` is preinstalled on macOS and the new command is equivalent to the `wget` version but avoids having to install `wget`. This should save people some time.
mqy added a commit to mqy/llama.cpp that referenced this issue on May 26, 2023.
mqy added a commit to mqy/llama.cpp that referenced this issue on May 26, 2023.
mqy added a commit to mqy/llama.cpp that referenced this issue on May 29, 2023.
mqy added a commit to mqy/llama.cpp that referenced this issue on May 31, 2023:
broken change: delete original profile ggml-org#1 from q_f32 profiles
mqy added a commit to mqy/llama.cpp that referenced this issue on Jun 4, 2023:
broken change: delete original profile ggml-org#1 from q_f32 profiles
funnbot pushed a commit to funnbot/llama.cpp that referenced this issue on Aug 8, 2023:
* kquants_iter for hipblas and add gfx803
* Update CMakeLists.txt with hipblas kquants_iter and DMMV_F16
* remove dmmv_f16 for now
HanClinto pushed a commit to HanClinto/llama.cpp that referenced this issue on Jun 10, 2024:
Nits found in binary renames
Oliver-Y added a commit to Oliver-Y/llama.cpp that referenced this issue on Jul 23, 2024:
* a chinese word formed of 3 chinese characters, but the first 2 are not a word
* tokenizer-fix
* E5 Pretokenizer bugfix
* whitespace fix
* remove extra wpm

---------

Co-authored-by: Mike Fan <60965742+mike-fzy@users.noreply.github.com>
Co-authored-by: Oliver Ye <OliverY@MacBook-Pro.local>
cunnie added a commit to cunnie/llama.cpp that referenced this issue on Aug 3, 2024:
When `llama-batched-bench` is invoked _without_ setting `-npl`, "number of parallel prompts", it segfaults.

The segfault is caused by invoking `max_element()` on a zero-length vector, `n_pl`.

This commit addresses that by first checking to see if the number of parallel prompts is zero, and if so sets the maximum sequence size to 1; otherwise, sets it to the original, the result of `max_element()`.

Fixes, when running `lldb build/bin/llama-batched-bench -- -m models/Meta-Llama-3-8B.gguf`:

```
* thread ggml-org#1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=1, address=0x0)
    frame #0: 0x000000010000366c llama-batched-bench`main(argc=3, argv=0x000000016fdff268) at batched-bench.cpp:72:28
   69       llama_context_params ctx_params = llama_context_params_from_gpt_params(params);
   70
   71       // ensure enough sequences are available
-> 72       ctx_params.n_seq_max = *std::max_element(n_pl.begin(), n_pl.end());
```
ggerganov added a commit that referenced this issue on Aug 4, 2024:
* [example] batched-bench "segmentation fault"

  When `llama-batched-bench` is invoked _without_ setting `-npl`, "number of parallel prompts", it segfaults. The segfault is caused by invoking `max_element()` on a zero-length vector, `n_pl`. This commit addresses that by first checking to see if the number of parallel prompts is zero, and if so sets the maximum sequence size to 1; otherwise, sets it to the original, the result of `max_element()`.

* Update examples/batched-bench/batched-bench.cpp

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
Co-authored-by: compilade <git@compilade.net>
ggerganov pushed a commit that referenced this issue on Aug 6, 2024.
jeroen-mostert pushed a commit to jeroen-mostert/llama.cpp that referenced this issue on Aug 30, 2024:
* fstring ggml-org#1
* fstring ggml-org#2
jeroen-mostert pushed a commit to jeroen-mostert/llama.cpp that referenced this issue on Aug 30, 2024:
* dictionary ggml-org#1
* dictionary ggml-org#2
ykhrustalev referenced this issue in ykhrustalev/llama.cpp on Sep 26, 2024:
* Fixed a bug where debug code was included in the release, resulting in an undefined function error.
* Change the path of the QNN library when building in termux environment
* Revert "Change the path of the QNN library when building in termux environment"
  This reverts commit c6e26a3.
* Changed so that GGML_QNN_DEFAULT_LIB_SEARCH_PATH can be set from command line arguments
It shouldn't be hard to merge tensors with my https://github.com/kir-gadjello/zipslicer library, but it's pure Python! If you want to keep the project pure C++, you might want to write a standalone gist script that uses zipslicer to unpack weight shards into binary files.
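A tiny sketch of the kind of standalone script being suggested is below. It deliberately avoids assuming anything about zipslicer's API and just uses plain `torch.load`, so unlike zipslicer it pulls the whole shard into memory; the one-file-per-tensor output layout is made up for illustration.

```python
# Hypothetical helper (not based on zipslicer's API): dump every tensor of one
# .pth weight shard to its own raw float32 binary file.
import os
import sys
import torch

def dump_shard(shard_path, out_dir):
    os.makedirs(out_dir, exist_ok=True)
    state = torch.load(shard_path, map_location="cpu")  # loads the full shard
    for name, tensor in state.items():
        out_path = os.path.join(out_dir, name + ".bin")
        tensor.to(torch.float32).numpy().tofile(out_path)
        print(f"{name} {list(tensor.shape)} -> {out_path}")

if __name__ == "__main__":
    dump_shard(sys.argv[1], sys.argv[2])
```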