
Releases: 3Simplex/llama.cpp

b3640

28 Aug 16:33
66b039a
docker : update CUDA images (#9213)

b3613

21 Aug 17:13
fc54ef0
server : support reading arguments from environment variables (#9105)

* server : support reading arguments from environment variables

* add -fa and -dt

* readme : specify non-arg env var
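
The server change follows the usual precedence rule: an explicit command-line flag wins, and an environment variable fills in when the flag is absent. Below is a minimal C++ sketch of that fallback pattern. The variable name `LLAMA_ARG_MODEL` is shown as an example of the `LLAMA_ARG_*` naming this PR introduces (treat the exact spelling as an assumption), and the helper `arg_or_env` is hypothetical, not the upstream API.

```cpp
// Minimal sketch of a CLI flag with an environment-variable fallback,
// in the spirit of the server change above. LLAMA_ARG_MODEL follows the
// LLAMA_ARG_* convention the PR introduces; treat it as illustrative
// rather than the exact upstream spelling.
#include <cstdlib>
#include <iostream>
#include <string>

// Return the flag's value if given on the command line, otherwise fall
// back to the named environment variable, otherwise a default.
static std::string arg_or_env(int argc, char ** argv, const std::string & flag,
                              const char * env_name, const std::string & def) {
    for (int i = 1; i + 1 < argc; i++) {
        if (flag == argv[i]) {
            return argv[i + 1]; // explicit CLI argument wins
        }
    }
    if (const char * v = std::getenv(env_name)) {
        return v; // environment variable as fallback
    }
    return def;
}

int main(int argc, char ** argv) {
    const std::string model = arg_or_env(argc, argv, "-m", "LLAMA_ARG_MODEL", "model.gguf");
    std::cout << "model path: " << model << "\n";
    return 0;
}
```

With this pattern, `LLAMA_ARG_MODEL=/models/foo.gguf ./server` behaves like passing `-m /models/foo.gguf`, while an explicit `-m` still takes precedence.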

b3576

12 Aug 14:24
84eb2f4
docs: introduce gpustack and gguf-parser (#8873)

* readme: introduce gpustack

GPUStack is an open-source GPU cluster manager for running large
language models; it uses llama.cpp as its backend.

Signed-off-by: thxCode <thxcode0824@gmail.com>

* readme: introduce gguf-parser

GGUF Parser is a tool for reviewing/checking GGUF files and estimating
their memory usage without downloading the whole model (a sketch of
header-only inspection follows below).

Signed-off-by: thxCode <thxcode0824@gmail.com>

---------

Signed-off-by: thxCode <thxcode0824@gmail.com>
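
Estimating "without downloading the whole model" works because GGUF front-loads its metadata: a small fixed header is followed by the metadata KVs and tensor descriptions, so shapes and types (and from them a memory estimate) can be derived from the first bytes alone. Here is a minimal sketch of such header-only inspection, assuming the GGUF v2/v3 layout (u32 magic, u32 version, u64 tensor count, u64 metadata-KV count, little-endian) and a local file rather than the remote case implied above:

```cpp
// Minimal sketch of header-only GGUF inspection, illustrating how a tool
// like gguf-parser can learn about a model from just the first bytes of
// the file. Assumes GGUF v2/v3 layout and a little-endian host.
#include <cstdint>
#include <fstream>
#include <iostream>

int main(int argc, char ** argv) {
    if (argc < 2) { std::cerr << "usage: gguf-peek <file.gguf>\n"; return 1; }
    std::ifstream f(argv[1], std::ios::binary);
    if (!f) { std::cerr << "cannot open " << argv[1] << "\n"; return 1; }

    uint32_t magic = 0, version = 0;
    uint64_t n_tensors = 0, n_kv = 0;
    f.read(reinterpret_cast<char *>(&magic),     sizeof magic);
    f.read(reinterpret_cast<char *>(&version),   sizeof version);
    f.read(reinterpret_cast<char *>(&n_tensors), sizeof n_tensors);
    f.read(reinterpret_cast<char *>(&n_kv),      sizeof n_kv);

    if (!f || magic != 0x46554747u) { // the bytes "GGUF", read little-endian
        std::cerr << "not a GGUF file\n";
        return 1;
    }
    std::cout << "GGUF v" << version
              << ", tensors: "      << n_tensors
              << ", metadata KVs: " << n_kv << "\n";
    return 0;
}
```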

b3569

11 Aug 16:41
8cd1bcf
flake.lock: Update (#8979)

b3549

08 Aug 13:57
afd27f0
scripts : sync cann files (#0)

b3531

06 Aug 14:04
efda90c
[Vulkan] Fix compilation of `vulkan-shaders-gen` on w64devkit after `…

b3504

02 Aug 13:01
e09a800
cann: Fix ggml_cann_im2col for 1D im2col (#8819)

* fix ggml_cann_im2col for 1D im2col

* fix build warning
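
For context: im2col unrolls each sliding convolution window into a row of a matrix, turning the convolution into a matrix multiply; the 1D case targeted by this fix differs from 2D only in having a single spatial axis. A backend-agnostic sketch of 1D im2col, purely illustrative and not the ggml/CANN kernel touched here:

```cpp
// Backend-agnostic sketch of 1D im2col: unroll each sliding window of a
// 1-D signal into a row, so the convolution becomes a matrix multiply.
// Illustrative only; not the ggml/CANN kernel fixed above.
#include <iostream>
#include <vector>

// input: 1-D signal; k: kernel size, s: stride, p: zero padding.
// Returns one row of k values per output position.
static std::vector<std::vector<float>> im2col_1d(const std::vector<float> & input,
                                                 int k, int s, int p) {
    const int n     = (int) input.size();
    const int n_out = (n + 2 * p - k) / s + 1;  // number of output positions
    std::vector<std::vector<float>> cols(n_out, std::vector<float>(k, 0.0f));
    for (int o = 0; o < n_out; o++) {
        for (int j = 0; j < k; j++) {
            const int idx = o * s + j - p;      // position in the unpadded input
            if (idx >= 0 && idx < n) {
                cols[o][j] = input[idx];
            }                                   // else: stays 0 (padding)
        }
    }
    return cols;
}

int main() {
    const auto cols = im2col_1d({1, 2, 3, 4, 5}, /*k=*/3, /*s=*/1, /*p=*/1);
    for (const auto & row : cols) {
        for (float v : row) std::cout << v << ' ';
        std::cout << '\n';
    }
}
```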

b3501

01 Aug 19:03
b7a08fd
Build: Only include execinfo.h on linux systems that support it (#8783)

* Only enable backtrace on GLIBC linux systems

* fix missing file from copy

* use glibc macro instead of defining a custom one
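
A minimal sketch of what such a guard can look like, using glibc's own `__GLIBC__` macro (defined by its `<features.h>`, which any standard header pulls in) rather than a hand-rolled one; the exact upstream condition may differ:

```cpp
// Minimal sketch of guarding <execinfo.h> with glibc's own macro, in the
// spirit of the fix above; the exact upstream condition may differ.
#include <cstdio>

#if defined(__GLIBC__) // defined by glibc's <features.h>; absent on musl, Windows, etc.
#include <execinfo.h>

static void print_backtrace(void) {
    void * frames[32];
    const int n = backtrace(frames, 32);       // capture return addresses
    backtrace_symbols_fd(frames, n, /*fd=*/2); // symbolize to stderr
}
#else
static void print_backtrace(void) {
    fprintf(stderr, "backtrace unavailable on this platform\n");
}
#endif

int main() {
    print_backtrace();
    return 0;
}
```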

b3494

31 Jul 13:13
268c566
nix: cuda: rely on propagatedBuildInputs (#8772)

Listing individual outputs is no longer necessary to reduce the runtime closure size after https://github.com/NixOS/nixpkgs/pull/323056.

b3472

27 Jul 14:12
b5e9546
llama : add support for llama 3.1 rope scaling factors (#8676)

* Add llama 3.1 rope scaling factors to llama conversion and inference

This commit generates the rope factors on conversion and adds them to the resulting model as a tensor. At inference time, these factors are passed to the `ggml_rope_ext` rope operation, improving results for context windows above 8192 tokens; a sketch of the factor computation follows below.

* Update convert_hf_to_gguf.py

Co-authored-by: compilade <git@compilade.net>

* address comments

* address comments

* Update src/llama.cpp

Co-authored-by: compilade <git@compilade.net>

* Update convert_hf_to_gguf.py

Co-authored-by: compilade <git@compilade.net>

---------

Co-authored-by: compilade <git@compilade.net>
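
For a sense of what "generates the rope factors on conversion" involves, here is a hedged C++ sketch of computing per-frequency scaling factors in the Llama 3.1 style. The constants (scale factor 8, low/high frequency factors 1 and 4, original context 8192, frequency base 500000) are the published Llama 3.1 defaults and should be read from the model config in practice; the function mirrors the approach, not the exact upstream conversion code.

```cpp
// Hedged sketch of per-dimension rope scaling factors in the Llama 3.1
// style, computed once at conversion time. Constants follow the published
// Llama 3.1 config; read them from the model config in practice.
#include <cmath>
#include <iostream>
#include <vector>

static std::vector<float> rope_factors(int head_dim,
                                       float freq_base        = 500000.0f,
                                       float scale_factor     = 8.0f,
                                       float low_freq_factor  = 1.0f,
                                       float high_freq_factor = 4.0f,
                                       float orig_ctx         = 8192.0f) {
    constexpr float PI = 3.14159265358979f;
    const float low_freq_wavelen  = orig_ctx / low_freq_factor;
    const float high_freq_wavelen = orig_ctx / high_freq_factor;
    std::vector<float> factors;
    for (int i = 0; i < head_dim; i += 2) {
        const float freq    = 1.0f / std::pow(freq_base, (float) i / head_dim);
        const float wavelen = 2.0f * PI / freq;
        if (wavelen < high_freq_wavelen) {
            factors.push_back(1.0f);         // high frequency: keep as-is
        } else if (wavelen > low_freq_wavelen) {
            factors.push_back(scale_factor); // low frequency: full scaling
        } else {                             // smooth interpolation in between
            const float smooth = (orig_ctx / wavelen - low_freq_factor)
                               / (high_freq_factor - low_freq_factor);
            factors.push_back(1.0f / ((1.0f - smooth) / scale_factor + smooth));
        }
    }
    return factors; // stored as a tensor, consumed by ggml_rope_ext at inference
}

int main() {
    for (float f : rope_factors(/*head_dim=*/128)) std::cout << f << ' ';
    std::cout << '\n';
}
```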