llama-bench : add model sizes #2771
Conversation
Maybe change `model_size` to just `size` and `model_n_params` to `params`. Also `GiB` -> `G` to make the table a bit more compact.
Never mind, this is all calculated.
How about something like this? Tried to make it a bit more compact, while still keeping the units.
The size unit is incorrect. We want to report gibibytes, so it's better to use the shorter shorthand.
I just have never seen
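As an aside on the unit discussion above, the formatting in question amounts to dividing the byte count by 2^30 and keeping the GiB label. A minimal sketch; the helper name here is made up for illustration, not taken from the PR:

```cpp
#include <cstdint>
#include <cstdio>

// Hypothetical helper (not the PR's actual code): format a byte count
// as GiB, keeping the unit in the string as discussed above.
static void format_size_gib(uint64_t size_bytes, char * buf, size_t buf_size) {
    // GiB (gibibyte) = 2^30 bytes, as opposed to GB (gigabyte) = 10^9 bytes.
    snprintf(buf, buf_size, "%.2f GiB", size_bytes / (1024.0 * 1024.0 * 1024.0));
}
```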
* master: (773 commits)
  server : add `/detokenize` endpoint (ggml-org#2802)
  convert.py : advanced option (ggml-org#2753)
  llama : use Unicode Escape Sequence to replace encoded characters (ggml-org#2814)
  flake.nix : add rocm support and cleanup (ggml-org#2808)
  llama : move #includes out of _GNU_SOURCE conditional (ggml-org#2817)
  main : fix bug (penalize_nl=false doesn't work) + suppress warning on mingw (ggml-org#1528)
  llama : use std::abs in llama_sample_tail_free (ggml-org#2800)
  k-quants : remove unnecessary tensor shape restrictions (ggml-org#2811)
  Better perplexity for 2- and 3-bit quantization for LLaMA-v2-70B (ggml-org#2807)
  Fix HellaSwag (ggml-org#2805)
  flake : build llama.cpp on Intel with nix (ggml-org#2795)
  Handle null rope scaling value (ggml-org#2793)
  Fix spm whitespaces (ggml-org#2806)
  examples : skip unnecessary external lib in server README.md how-to (ggml-org#2804)
  llama : fix struct decl (ggml-org#2790)
  Faster perplexity computation (ggml-org#2786)
  llama : add llama_beam_search() (ggml-org#2267)
  convert.py : Get rope scale from HuggingFace models (ggml-org#2772)
  llama-bench : add model sizes (ggml-org#2771)
  convert.py : export rope freq_base when converting CodeLlama from an HF model (ggml-org#2773)
  ...
* llama-bench : add model sizes
* more compact markdown output
* back to GiB
* adjust column sizes
Renames the `llama_model_type` API to `llama_model_desc`, and adds `llama_model_size` and `llama_model_n_params` APIs to llama.cpp.

Currently, the sizes are always shown in the markdown output. I am ok with that, but if it adds too much clutter, I could make them optional.
Example output with markdown:
Device 0: NVIDIA GeForce RTX 3090 Ti, compute capability 8.6
build: d0f77b1 (1055)