Releases: 3Simplex/llama.cpp
b4248
llama : add missing LLAMA_API for llama_chat_builtin_templates (#10636)
b4164
server : add speculative decoding support (#10455)
* server : add speculative decoding support ggml-ci
* server : add helper function slot.can_speculate() ggml-ci
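The entry above adds speculative decoding to the server. As a rough illustration of the idea only, not llama.cpp's implementation: a minimal greedy sketch in which a cheap "draft" model proposes several tokens and the "target" model accepts the matching prefix. The toy model callables over integer tokens are invented for this example.

```python
# Hedged sketch of greedy speculative decoding; toy stand-ins, not llama.cpp APIs.

def speculative_step(prefix, draft_model, target_model, n_draft=4):
    # 1. The draft model proposes n_draft tokens autoregressively.
    draft = []
    ctx = list(prefix)
    for _ in range(n_draft):
        tok = draft_model(ctx)
        draft.append(tok)
        ctx.append(tok)
    # 2. The target model verifies each drafted position; with greedy decoding
    #    it would emit target_model(prefix + draft[:i]) at step i.
    accepted = []
    for i, tok in enumerate(draft):
        want = target_model(list(prefix) + draft[:i])
        if want != tok:
            accepted.append(want)  # target's correction ends the accepted run
            break
        accepted.append(tok)
    else:
        # every draft token matched; take one bonus token from the target
        accepted.append(target_model(list(prefix) + draft))
    return accepted

# Toy models over integer tokens: the target always emits last+1; the draft
# agrees except that it wrongly repeats even tokens.
target = lambda ctx: ctx[-1] + 1
drafty = lambda ctx: ctx[-1] if ctx[-1] % 2 == 0 else ctx[-1] + 1
```

When the draft disagrees immediately, only the target's correction is emitted; when it agrees throughout, the whole run plus a bonus token is accepted in a single verification pass, which is where the speed-up comes from.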
b4153
ci: Update oneAPI runtime dll packaging (#10428)
These are the minimum runtime DLL dependencies for oneAPI 2025.0.
b4145
vulkan: predicate max operation in soft_max shaders/soft_max (#10437)
Fixes #10434
b4132
cuda : fix CUDA_FLAGS not being applied (#10403)
b4125
Skip searching root path for cross-compile builds (#10383)
b4100
server: (web UI) Add samplers sequence customization (#10255)
* Samplers sequence: simplified and input field.
* Removed unused function
* Modify and use `settings-modal-short-input`
* rename "name" --> "label"
Co-authored-by: Xuan Son Nguyen <son@huggingface.co>
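The web-UI change above lets users customize the order of the sampler chain. A minimal sketch of why that order matters, using generic textbook temperature and top-p filters over a toy distribution; the function names and parameters are illustrative, not the UI's actual settings or llama.cpp's sampler implementations.

```python
# Illustrative sketch: samplers as functions over a token -> probability map,
# applied in a user-chosen order.

def temperature(probs, t=0.5):
    # sharpen (t < 1) or flatten (t > 1): p ** (1/t), renormalized
    w = {tok: p ** (1.0 / t) for tok, p in probs.items()}
    z = sum(w.values())
    return {tok: v / z for tok, v in w.items()}

def top_p(probs, p=0.8):
    # keep the smallest high-probability set whose cumulative mass reaches p
    out, total = {}, 0.0
    for tok in sorted(probs, key=probs.get, reverse=True):
        out[tok] = probs[tok]
        total += probs[tok]
        if total >= p:
            break
    return out

def apply_sequence(probs, samplers):
    # run the samplers in the configured order, then renormalize
    for s in samplers:
        probs = s(probs)
    z = sum(probs.values())
    return {tok: v / z for tok, v in probs.items()}

dist = {"a": 0.4, "b": 0.3, "c": 0.2, "d": 0.1}
# temperature-then-top_p keeps {a, b}; top_p-then-temperature keeps {a, b, c}
```

Sharpening first concentrates mass on the top tokens, so the subsequent top-p cutoff admits fewer of them; running top-p first fixes the candidate set before any reweighting, which is exactly the kind of difference a sequence-customization setting exposes.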
b4067
vulkan: Throttle the number of shader compiles during the build step.…
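The entry above limits how many shader compiles run at once during the build. A generic sketch of that throttling technique, bounding concurrency with a semaphore; the jobs here are stand-in sleeps, not actual shader compiles, and none of this is llama.cpp's Vulkan build code.

```python
# Generic concurrency-throttling sketch: a semaphore caps how many jobs
# run simultaneously.
import threading
import time

def run_throttled(jobs, max_concurrent=2):
    gate = threading.Semaphore(max_concurrent)
    lock = threading.Lock()
    active = 0  # jobs currently running
    peak = 0    # highest concurrency observed

    def worker(job):
        nonlocal active, peak
        with gate:  # blocks while max_concurrent jobs are already inside
            with lock:
                active += 1
                peak = max(peak, active)
            job()  # the throttled work, e.g. one "compile"
            with lock:
                active -= 1

    threads = [threading.Thread(target=worker, args=(j,)) for j in jobs]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return peak

# six "compiles", but never more than two in flight at a time
peak = run_throttled([lambda: time.sleep(0.02)] * 6, max_concurrent=2)
```

Capping concurrency this way trades a slightly longer wall-clock build for bounded peak memory and CPU load, which is the usual motivation for throttling compile jobs.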
b4061
metal : reorder write loop in mul mat kernel + style (#10231)
* metal : reorder write loop
* metal : int -> short, style ggml-ci
b4042
DRY: Fixes clone functionality (#10192)