
llama-bench : add model sizes #2771


Merged · 4 commits merged into master on Aug 25, 2023

Conversation

@slaren (Member) commented Aug 24, 2023

Renames the `llama_model_type` API to `llama_model_desc`, and adds `llama_model_size` and `llama_model_n_params` APIs to llama.cpp.
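
A rough sketch of what the affected declarations could look like in `llama.h`; the exact prototypes are assumptions, not copied from the PR:

```cpp
// Illustrative sketch only -- the signatures in the actual header may differ.

// Renamed from llama_model_type: writes a human-readable model description
// (e.g. "LLaMA 7B mostly Q4_0") into buf.
LLAMA_API int      llama_model_desc    (const struct llama_model * model, char * buf, size_t buf_size);

// Total size of all model tensors, in bytes.
LLAMA_API uint64_t llama_model_size    (const struct llama_model * model);

// Total number of model parameters.
LLAMA_API uint64_t llama_model_n_params(const struct llama_model * model);
```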

Currently, the sizes are always shown in the markdown output. I am ok with that, but if it adds too much clutter, I could make them optional.

Example output with markdown:

Device 0: NVIDIA GeForce RTX 3090 Ti, compute capability 8.6

| model | model_size | model_n_params | backend | n_gpu_layers | test | t/s |
| --- | --- | --- | --- | --- | --- | --- |
| LLaMA 7B mostly Q4_0 | 3.56 GiB | 6.74 B | CUDA | 99 | pp 512 | 2235.89 ± 34.61 |
| LLaMA 13B mostly Q4_0 | 6.86 GiB | 13.02 B | CUDA | 99 | pp 512 | 1326.61 ± 100.20 |
| LLaMA 30B mostly Q4_0 | 17.09 GiB | 32.53 B | CUDA | 99 | pp 512 | 619.07 ± 2.03 |

build: d0f77b1 (1055)

@ggerganov (Member) left a comment


Maybe change `model_size` to just `size` and `model_n_params` to `params`.
Also GiB -> G to make the table a bit more compact.

@SlyEcho (Collaborator) commented Aug 25, 2023

Is it possible to add this kind of metadata to GGUF?

Never mind, this is all calculated.
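
A minimal sketch of how both numbers can be derived by walking the loaded tensors (illustrative only, not necessarily the code this PR uses; assumes a `ggml_context * ctx` holding the weights and the ggml helpers `ggml_get_first_tensor` / `ggml_get_next_tensor`, `ggml_nbytes`, and `ggml_nelements`):

```cpp
// Illustrative sketch, not the PR's exact implementation: both values are
// derived from the loaded tensors, so no extra GGUF metadata is needed.
uint64_t model_size   = 0; // total bytes across all weight tensors
uint64_t model_params = 0; // total number of elements (parameters)

for (struct ggml_tensor * t = ggml_get_first_tensor(ctx); t != NULL;
     t = ggml_get_next_tensor(ctx, t)) {
    model_size   += ggml_nbytes(t);     // bytes occupied by this tensor
    model_params += ggml_nelements(t);  // number of elements in this tensor
}
```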

@slaren (Member, Author) commented Aug 25, 2023

How about something like this? Tried to make it a bit more compact, while still keeping the units.

| model | size | params | backend | ngl | test | t/s |
| --- | --- | --- | --- | --- | --- | --- |
| LLaMA 7B mostly Q4_0 | 3.56 GB | 6.74 B | CUDA | 99 | pp 512 | 2239.86 ± 22.42 |
| LLaMA 13B mostly Q4_0 | 6.86 GB | 13.02 B | CUDA | 99 | pp 512 | 1379.74 ± 2.01 |
| LLaMA 30B mostly Q4_0 | 17.09 GB | 32.53 B | CUDA | 99 | pp 512 | 614.50 ± 2.52 |

@ggerganov (Member) commented:

The size unit is incorrect.

- 1 Gibibyte is 1073741824 bytes and the shorthand is G or GiB
- 1 Gigabyte is 1000000000 bytes and the shorthand is GB

We want to report Gibibytes, so it's better to use the shorter shorthand G.

@slaren (Member, Author) commented Aug 25, 2023

I just have never seen G used to refer to GiB. I have been looking for references for the usage of G and I couldn't find anything. Anyway, it's just two characters, I have left it as GiB for now, it can be changed later if needed.
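
For what it's worth, the practical difference is just the divisor used when formatting the byte count. A hypothetical helper (not part of the PR) that formats bytes as GiB:

```cpp
#include <cstdint>
#include <cstdio>
#include <string>

// Hypothetical helper, not from the PR: format a byte count as binary
// gibibytes (divide by 2^30). A decimal "GB" would divide by 1e9 instead.
static std::string format_size_gib(uint64_t bytes) {
    char buf[32];
    snprintf(buf, sizeof(buf), "%.2f GiB", bytes / (1024.0 * 1024.0 * 1024.0));
    return buf;
}

// format_size_gib(3ull << 30) == "3.00 GiB"
```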

@slaren merged commit 154725c into master on Aug 25, 2023
@slaren deleted the llama-bench-model-size branch on August 25, 2023 at 13:16
@ggerganov (Member) commented:

`ls -lh` reports with G - that's where I picked it up:

$ ls -lh
total 41G
-rw-rw-r-- 1 ggerganov ggerganov  13G Jul 19 15:10 ggml-model-f16.bin
-rw-rw-r-- 1 ggerganov ggerganov  13G Aug 25 14:07 ggml-model-f16.gguf
-rw-rw-r-- 1 ggerganov ggerganov 3.6G Jul 24 16:47 ggml-model-q4_0.bin
-rw-rw-r-- 1 ggerganov ggerganov 3.6G Aug 25 14:08 ggml-model-q4_0.gguf
-rw-rw-r-- 1 ggerganov ggerganov 4.0G Aug 16 15:19 ggml-model-q4_1.gguf
-rw-rw-r-- 1 ggerganov ggerganov 4.8G Aug 14 10:56 ggml-model-q5_1.gguf

mattgauf added a commit to mattgauf/llama.cpp that referenced this pull request Aug 26, 2023
akawrykow pushed a commit to akawrykow/llama.cpp that referenced this pull request Aug 29, 2023
* llama-bench : add model sizes

* more compact markdown output

* back to GiB

* adjust column sizes