Synchronize LLAMA_API with ggml-org/llama.cpp and update cuda workflow for windows #1966

JamePeng · 2025-03-09T01:57:40Z

Update llama.cpp from 794fe2 to f08f4b3
Use llama_sampler_init() instead of llama_sampler() for safer usage
Sync llama : add Phi-4-mini support
Sync llama : expose llama_model_n_head_kv in the API
Sync tool-call: fix Qwen 2.5 Coder support, add micro benchmarks, support trigger patterns for lazy grammars
class LlamaSampler: add the add_xtc(), add_top_n_sigma() and add_dry() methods
Remove Tail-Free sampling
Add the Top-N-Sigma, XTC and DRY sampler code to the sampler
Sync llama : Add Gemma 3 support
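
For readers unfamiliar with the Top-N-Sigma sampler mentioned above, here is a minimal pure-Python sketch of the idea: keep only the tokens whose logit lies within n standard deviations of the highest logit, then renormalize over the survivors. This is an illustration only, not the llama.cpp code and not what add_top_n_sigma() does internally; the function names top_n_sigma_filter and softmax_over are hypothetical, and the use of the population standard deviation is an assumption.

```python
import math


def top_n_sigma_filter(logits, n=1.0):
    """Return indices of tokens whose logit is >= max_logit - n * std.

    Hypothetical helper for illustration; not part of llama-cpp-python.
    """
    if not logits:
        return []
    max_logit = max(logits)
    mean = sum(logits) / len(logits)
    # Population standard deviation of the raw logits (an assumption here).
    std = math.sqrt(sum((x - mean) ** 2 for x in logits) / len(logits))
    threshold = max_logit - n * std
    return [i for i, x in enumerate(logits) if x >= threshold]


def softmax_over(logits, kept):
    """Softmax restricted to the kept token indices."""
    peak = max(logits[i] for i in kept)
    # Subtract the peak before exponentiating for numerical stability.
    exps = {i: math.exp(logits[i] - peak) for i in kept}
    total = sum(exps.values())
    return {i: e / total for i, e in exps.items()}


logits = [10.0, 9.0, 1.0, 0.0]
kept = top_n_sigma_filter(logits, n=1.0)
probs = softmax_over(logits, kept)
```

With these example logits the two low-scoring tokens fall below the threshold, so sampling would then draw only from the first two.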

JamePeng · 2025-03-09T20:51:59Z

I adjusted the workflow, based on VS2022, to build pip wheels. For your convenience it produces Windows builds for Python 3.10 to 3.12 against two CUDA versions, 12.4.1 and 12.6.3.
The builds should be finished by now: https://github.com/JamePeng/llama-cpp-python/releases

JamePeng · 2025-03-13T13:16:24Z

llama.cpp : refactor llama_context, llama_kv_cache, llm_build_context (ggml-org/llama.cpp#12181)
They changed the API names again :<

JamePeng · 2025-03-13T14:18:07Z

The adjusted code has been moved to https://github.com/JamePeng/llama-cpp-python/tree/1966-branch

JamePeng changed the title ~~Sync LLAMA_API names with ggml-org/llama.cpp 20250309, support LLAMA_VOCAB_PRE_TYPE_GPT4O~~ Sync LLAMA_API names with ggml-org/llama.cpp 20250309 Mar 9, 2025

JamePeng mentioned this pull request Mar 9, 2025

GPU Support Missing in Version >=0.3.5 on Windows with CUDA 12.4 and RTX 3090 #1967

Open

JamePeng force-pushed the main branch from d2dd3b0 to 7074f42 Compare March 9, 2025 16:50

JamePeng changed the title ~~Sync LLAMA_API names with ggml-org/llama.cpp 20250309~~ Synchronize LLAMA_API with ggml-org/llama.cpp and update cuda workflow for windows Mar 9, 2025

JamePeng closed this Mar 13, 2025

JamePeng force-pushed the main branch from 00bcbcc to 37eb5f0 Compare March 13, 2025 14:15
