
Releases: 3Simplex/llama.cpp

b3640

28 Aug 16:33
66b039a
docker : update CUDA images (#9213)

b3613

21 Aug 17:13
fc54ef0
server : support reading arguments from environment variables (#9105)

* server : support reading arguments from environment variables

* add -fa and -dt

* readme : specify non-arg env var
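
The server change follows the usual precedence rule: an explicit command-line flag wins, and an environment variable fills in when the flag is absent. Below is a minimal C++ sketch of that fallback pattern. The variable name `LLAMA_ARG_MODEL` is shown as an example of the `LLAMA_ARG_*` naming this PR introduces (treat the exact spelling as an assumption), and the helper `arg_or_env` is hypothetical, not the upstream API.

```cpp
// Minimal sketch of a CLI flag with an environment-variable fallback,
// in the spirit of the server change above. LLAMA_ARG_MODEL follows the
// LLAMA_ARG_* convention the PR introduces; treat it as illustrative
// rather than the exact upstream spelling.
#include <cstdlib>
#include <iostream>
#include <string>

// Return the flag's value if given on the command line, otherwise fall
// back to the named environment variable, otherwise a default.
static std::string arg_or_env(int argc, char ** argv, const std::string & flag,
                              const char * env_name, const std::string & def) {
    for (int i = 1; i + 1 < argc; i++) {
        if (flag == argv[i]) {
            return argv[i + 1]; // explicit CLI argument wins
        }
    }
    if (const char * v = std::getenv(env_name)) {
        return v; // environment variable as fallback
    }
    return def;
}

int main(int argc, char ** argv) {
    const std::string model = arg_or_env(argc, argv, "-m", "LLAMA_ARG_MODEL", "model.gguf");
    std::cout << "model path: " << model << "\n";
    return 0;
}
```

With this pattern, `LLAMA_ARG_MODEL=/models/foo.gguf ./server` behaves like passing `-m /models/foo.gguf`, while an explicit `-m` still takes precedence.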

b3576

12 Aug 14:24
84eb2f4
docs: introduce gpustack and gguf-parser (#8873)

* readme: introduce gpustack

GPUStack is an open-source GPU cluster manager for running large
language models; it uses llama.cpp as its backend.

Signed-off-by: thxCode <thxcode0824@gmail.com>

* readme: introduce gguf-parser

GGUF Parser is a tool for reviewing/checking GGUF files and estimating
their memory usage without downloading the whole model (a sketch of
header-only inspection follows below).

Signed-off-by: thxCode <thxcode0824@gmail.com>

---------

Signed-off-by: thxCode <thxcode0824@gmail.com>
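
Estimating "without downloading the whole model" works because GGUF front-loads its metadata: a small fixed header is followed by the metadata KVs and tensor descriptions, so shapes and types (and from them a memory estimate) can be derived from the first bytes alone. Here is a minimal sketch of such header-only inspection, assuming the GGUF v2/v3 layout (u32 magic, u32 version, u64 tensor count, u64 metadata-KV count, little-endian) and a local file rather than the remote case implied above:

```cpp
// Minimal sketch of header-only GGUF inspection, illustrating how a tool
// like gguf-parser can learn about a model from just the first bytes of
// the file. Assumes GGUF v2/v3 layout and a little-endian host.
#include <cstdint>
#include <fstream>
#include <iostream>

int main(int argc, char ** argv) {
    if (argc < 2) { std::cerr << "usage: gguf-peek <file.gguf>\n"; return 1; }
    std::ifstream f(argv[1], std::ios::binary);
    if (!f) { std::cerr << "cannot open " << argv[1] << "\n"; return 1; }

    uint32_t magic = 0, version = 0;
    uint64_t n_tensors = 0, n_kv = 0;
    f.read(reinterpret_cast<char *>(&magic),     sizeof magic);
    f.read(reinterpret_cast<char *>(&version),   sizeof version);
    f.read(reinterpret_cast<char *>(&n_tensors), sizeof n_tensors);
    f.read(reinterpret_cast<char *>(&n_kv),      sizeof n_kv);

    if (!f || magic != 0x46554747u) { // the bytes "GGUF", read little-endian
        std::cerr << "not a GGUF file\n";
        return 1;
    }
    std::cout << "GGUF v" << version
              << ", tensors: "      << n_tensors
              << ", metadata KVs: " << n_kv << "\n";
    return 0;
}
```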

b3569

11 Aug 16:41
8cd1bcf
flake.lock: Update (#8979)

b3549

08 Aug 13:57
afd27f0
scripts : sync cann files (#0)

b3531

06 Aug 14:04
efda90c
[Vulkan] Fix compilation of `vulkan-shaders-gen` on w64devkit after `…

b3504

02 Aug 13:01
e09a800
cann: Fix ggml_cann_im2col for 1D im2col (#8819)

* fix ggml_cann_im2col for 1D im2col

* fix build warning
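
For context: im2col unrolls each sliding convolution window into a row of a matrix, turning the convolution into a matrix multiply; the 1D case targeted by this fix differs from 2D only in having a single spatial axis. A backend-agnostic sketch of 1D im2col, purely illustrative and not the ggml/CANN kernel touched here:

```cpp
// Backend-agnostic sketch of 1D im2col: unroll each sliding window of a
// 1-D signal into a row, so the convolution becomes a matrix multiply.
// Illustrative only; not the ggml/CANN kernel fixed above.
#include <iostream>
#include <vector>

// input: 1-D signal; k: kernel size, s: stride, p: zero padding.
// Returns one row of k values per output position.
static std::vector<std::vector<float>> im2col_1d(const std::vector<float> & input,
                                                 int k, int s, int p) {
    const int n     = (int) input.size();
    const int n_out = (n + 2 * p - k) / s + 1;  // number of output positions
    std::vector<std::vector<float>> cols(n_out, std::vector<float>(k, 0.0f));
    for (int o = 0; o < n_out; o++) {
        for (int j = 0; j < k; j++) {
            const int idx = o * s + j - p;      // position in the unpadded input
            if (idx >= 0 && idx < n) {
                cols[o][j] = input[idx];
            }                                   // else: stays 0 (padding)
        }
    }
    return cols;
}

int main() {
    const auto cols = im2col_1d({1, 2, 3, 4, 5}, /*k=*/3, /*s=*/1, /*p=*/1);
    for (const auto & row : cols) {
        for (float v : row) std::cout << v << ' ';
        std::cout << '\n';
    }
}
```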

b3501

01 Aug 19:03
b7a08fd
Build: Only include execinfo.h on linux systems that support it (#8783)

* Only enable backtrace on GLIBC linux systems

* fix missing file from copy

* use glibc macro instead of defining a custom one
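
A minimal sketch of what such a guard can look like, using glibc's own `__GLIBC__` macro (defined by its `<features.h>`, which any standard header pulls in) rather than a hand-rolled one; the exact upstream condition may differ:

```cpp
// Minimal sketch of guarding <execinfo.h> with glibc's own macro, in the
// spirit of the fix above; the exact upstream condition may differ.
#include <cstdio>

#if defined(__GLIBC__) // defined by glibc's <features.h>; absent on musl, Windows, etc.
#include <execinfo.h>

static void print_backtrace(void) {
    void * frames[32];
    const int n = backtrace(frames, 32);       // capture return addresses
    backtrace_symbols_fd(frames, n, /*fd=*/2); // symbolize to stderr
}
#else
static void print_backtrace(void) {
    fprintf(stderr, "backtrace unavailable on this platform\n");
}
#endif

int main() {
    print_backtrace();
    return 0;
}
```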

b3494

31 Jul 13:13
268c566
nix: cuda: rely on propagatedBuildInputs (#8772)

Listing individual outputs is no longer necessary to reduce the runtime closure size after https://github.com/NixOS/nixpkgs/pull/323056.

b3472

27 Jul 14:12
b5e9546
llama : add support for llama 3.1 rope scaling factors (#8676)

* Add llama 3.1 rope scaling factors to llama conversion and inference

This commit generates the rope factors on conversion and adds them to the resulting model as a tensor. At inference time, these factors are passed to the `ggml_rope_ext` rope operation, improving results for context windows above 8192 tokens; a sketch of the factor computation follows below.

* Update convert_hf_to_gguf.py

Co-authored-by: compilade <git@compilade.net>

* address comments

* address comments

* Update src/llama.cpp

Co-authored-by: compilade <git@compilade.net>

* Update convert_hf_to_gguf.py

Co-authored-by: compilade <git@compilade.net>

---------

Co-authored-by: compilade <git@compilade.net>
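
For a sense of what "generates the rope factors on conversion" involves, here is a hedged C++ sketch of computing per-frequency scaling factors in the Llama 3.1 style. The constants (scale factor 8, low/high frequency factors 1 and 4, original context 8192, frequency base 500000) are the published Llama 3.1 defaults and should be read from the model config in practice; the function mirrors the approach, not the exact upstream conversion code.

```cpp
// Hedged sketch of per-dimension rope scaling factors in the Llama 3.1
// style, computed once at conversion time. Constants follow the published
// Llama 3.1 config; read them from the model config in practice.
#include <cmath>
#include <iostream>
#include <vector>

static std::vector<float> rope_factors(int head_dim,
                                       float freq_base        = 500000.0f,
                                       float scale_factor     = 8.0f,
                                       float low_freq_factor  = 1.0f,
                                       float high_freq_factor = 4.0f,
                                       float orig_ctx         = 8192.0f) {
    constexpr float PI = 3.14159265358979f;
    const float low_freq_wavelen  = orig_ctx / low_freq_factor;
    const float high_freq_wavelen = orig_ctx / high_freq_factor;
    std::vector<float> factors;
    for (int i = 0; i < head_dim; i += 2) {
        const float freq    = 1.0f / std::pow(freq_base, (float) i / head_dim);
        const float wavelen = 2.0f * PI / freq;
        if (wavelen < high_freq_wavelen) {
            factors.push_back(1.0f);         // high frequency: keep as-is
        } else if (wavelen > low_freq_wavelen) {
            factors.push_back(scale_factor); // low frequency: full scaling
        } else {                             // smooth interpolation in between
            const float smooth = (orig_ctx / wavelen - low_freq_factor)
                               / (high_freq_factor - low_freq_factor);
            factors.push_back(1.0f / ((1.0f - smooth) / scale_factor + smooth));
        }
    }
    return factors; // stored as a tensor, consumed by ggml_rope_ext at inference
}

int main() {
    for (float f : rope_factors(/*head_dim=*/128)) std::cout << f << ' ';
    std::cout << '\n';
}
```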