
llama-bench : add model sizes #2771


Merged · 4 commits merged into master on Aug 25, 2023

Conversation

@slaren (Member) commented Aug 24, 2023

Renames the `llama_model_type` API to `llama_model_desc`, and adds `llama_model_size` and `llama_model_n_params` APIs to llama.cpp.
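
A rough sketch of what the affected declarations could look like in `llama.h`; the exact prototypes are assumptions, not copied from the PR:

```cpp
// Illustrative sketch only -- the signatures in the actual header may differ.

// Renamed from llama_model_type: writes a human-readable model description
// (e.g. "LLaMA 7B mostly Q4_0") into buf.
LLAMA_API int      llama_model_desc    (const struct llama_model * model, char * buf, size_t buf_size);

// Total size of all model tensors, in bytes.
LLAMA_API uint64_t llama_model_size    (const struct llama_model * model);

// Total number of model parameters.
LLAMA_API uint64_t llama_model_n_params(const struct llama_model * model);
```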

Currently, the sizes are always shown in the markdown output. I am ok with that, but if it adds too much clutter, I could make them optional.

Example output with markdown:

Device 0: NVIDIA GeForce RTX 3090 Ti, compute capability 8.6

| model | model_size | model_n_params | backend | n_gpu_layers | test | t/s |
| --- | --- | --- | --- | --- | --- | --- |
| LLaMA 7B mostly Q4_0 | 3.56 GiB | 6.74 B | CUDA | 99 | pp 512 | 2235.89 ± 34.61 |
| LLaMA 13B mostly Q4_0 | 6.86 GiB | 13.02 B | CUDA | 99 | pp 512 | 1326.61 ± 100.20 |
| LLaMA 30B mostly Q4_0 | 17.09 GiB | 32.53 B | CUDA | 99 | pp 512 | 619.07 ± 2.03 |

build: d0f77b1 (1055)

@ggerganov (Member) left a comment


Maybe change `model_size` to just `size` and `model_n_params` to `params`.
Also GiB -> G to make the table a bit more compact.

@SlyEcho (Collaborator) commented Aug 25, 2023

Is it possible to add this kind of metadata to GGUF?

Never mind, this is all calculated.
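
A minimal sketch of how both numbers can be derived by walking the loaded tensors (illustrative only, not necessarily the code this PR uses; assumes a `ggml_context * ctx` holding the weights and the ggml helpers `ggml_get_first_tensor` / `ggml_get_next_tensor`, `ggml_nbytes`, and `ggml_nelements`):

```cpp
// Illustrative sketch, not the PR's exact implementation: both values are
// derived from the loaded tensors, so no extra GGUF metadata is needed.
uint64_t model_size   = 0; // total bytes across all weight tensors
uint64_t model_params = 0; // total number of elements (parameters)

for (struct ggml_tensor * t = ggml_get_first_tensor(ctx); t != NULL;
     t = ggml_get_next_tensor(ctx, t)) {
    model_size   += ggml_nbytes(t);     // bytes occupied by this tensor
    model_params += ggml_nelements(t);  // number of elements in this tensor
}
```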

@slaren (Member, Author) commented Aug 25, 2023

How about something like this? Tried to make it a bit more compact, while still keeping the units.

| model | size | params | backend | ngl | test | t/s |
| --- | --- | --- | --- | --- | --- | --- |
| LLaMA 7B mostly Q4_0 | 3.56 GB | 6.74 B | CUDA | 99 | pp 512 | 2239.86 ± 22.42 |
| LLaMA 13B mostly Q4_0 | 6.86 GB | 13.02 B | CUDA | 99 | pp 512 | 1379.74 ± 2.01 |
| LLaMA 30B mostly Q4_0 | 17.09 GB | 32.53 B | CUDA | 99 | pp 512 | 614.50 ± 2.52 |

@ggerganov (Member) commented:

The size unit is incorrect.

- 1 Gibibyte is 1073741824 bytes and the shorthand is G or GiB
- 1 Gigabyte is 1000000000 bytes and the shorthand is GB

We want to report Gibibytes, so it's better to use the shorter shorthand G.

@slaren (Member, Author) commented Aug 25, 2023

I just have never seen G used to refer to GiB. I have been looking for references for the usage of G and I couldn't find anything. Anyway, it's just two characters, I have left it as GiB for now, it can be changed later if needed.
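
For what it's worth, the practical difference is just the divisor used when formatting the byte count. A hypothetical helper (not part of the PR) that formats bytes as GiB:

```cpp
#include <cstdint>
#include <cstdio>
#include <string>

// Hypothetical helper, not from the PR: format a byte count as binary
// gibibytes (divide by 2^30). A decimal "GB" would divide by 1e9 instead.
static std::string format_size_gib(uint64_t bytes) {
    char buf[32];
    snprintf(buf, sizeof(buf), "%.2f GiB", bytes / (1024.0 * 1024.0 * 1024.0));
    return buf;
}

// format_size_gib(3ull << 30) == "3.00 GiB"
```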

@slaren merged commit 154725c into master on Aug 25, 2023
@slaren deleted the llama-bench-model-size branch on August 25, 2023 at 13:16
@ggerganov (Member) commented:

`ls -lh` reports with G - that's where I picked it up:

$ ls -lh
total 41G
-rw-rw-r-- 1 ggerganov ggerganov  13G Jul 19 15:10 ggml-model-f16.bin
-rw-rw-r-- 1 ggerganov ggerganov  13G Aug 25 14:07 ggml-model-f16.gguf
-rw-rw-r-- 1 ggerganov ggerganov 3.6G Jul 24 16:47 ggml-model-q4_0.bin
-rw-rw-r-- 1 ggerganov ggerganov 3.6G Aug 25 14:08 ggml-model-q4_0.gguf
-rw-rw-r-- 1 ggerganov ggerganov 4.0G Aug 16 15:19 ggml-model-q4_1.gguf
-rw-rw-r-- 1 ggerganov ggerganov 4.8G Aug 14 10:56 ggml-model-q5_1.gguf

mattgauf added a commit to mattgauf/llama.cpp that referenced this pull request Aug 26, 2023
akawrykow pushed a commit to akawrykow/llama.cpp that referenced this pull request Aug 29, 2023
* llama-bench : add model sizes

* more compact markdown output

* back to GiB

* adjust column sizes