Releases: 3Simplex/llama.cpp

b4248

03 Dec 16:27
3b4f2e3
llama : add missing LLAMA_API for llama_chat_builtin_templates (#10636)

b4164

25 Nov 16:58
9ca2e67
server : add speculative decoding support (#10455)

* server : add speculative decoding support

ggml-ci

* server : add helper function slot.can_speculate()

ggml-ci

b4153

22 Nov 14:32
6dfcfef
ci: Update oneAPI runtime dll packaging (#10428)

This is the minimum runtime dll dependencies for oneAPI 2025.0

b4145

20 Nov 20:51
9abe9ee
vulkan: predicate max operation in soft_max shaders/soft_max (#10437)

Fixes #10434

b4132

19 Nov 15:20
3ee6382
cuda : fix CUDA_FLAGS not being applied (#10403)

b4125

18 Nov 17:14
531cb1c
Skip searching root path for cross-compile builds (#10383)

b4100

16 Nov 18:29
bcdb7a2
server: (web UI) Add samplers sequence customization (#10255)

* Samplers sequence: simplified and input field.

* Removed unused function

* Modify and use `settings-modal-short-input`

* rename "name" --> "label"

---------

Co-authored-by: Xuan Son Nguyen <son@huggingface.co>

b4067

12 Nov 14:48
54ef9cf
vulkan: Throttle the number of shader compiles during the build step.…

b4061

09 Nov 16:16
6423c65
metal : reorder write loop in mul mat kernel + style (#10231)

* metal : reorder write loop

* metal : int -> short, style

ggml-ci

b4042

07 Nov 17:15
5107e8c
DRY: Fixes clone functionality (#10192)