
Faster perplexity computation #2786

Merged
merged 1 commit into from Aug 25, 2023

Conversation

ikawrakow
Contributor

@ikawrakow ikawrakow commented Aug 25, 2023

Time to compute 7B perplexity on Wikitext with a context of 512 on an RTX 4080 goes from 143 seconds to 128 seconds. Not a big deal for most people, I guess, but a nice speedup for someone like me who runs lots of perplexity calculations while trying different quantization techniques.

Also added output of the statistical uncertainty of the computed perplexity. With `--ppl-output-type 1` the perplexity tool now outputs 4 columns: number of evaluated tokens, perplexity, average negative log probability, and its uncertainty.
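For reference, here is a minimal sketch (not the code in this PR) of how those four columns can be derived from per-token negative log probabilities, assuming the reported uncertainty is the standard error of the mean NLL:

```cpp
#include <cmath>
#include <cstdio>
#include <vector>

// Sketch: given per-token negative log probabilities (base e), compute
// perplexity and a statistical uncertainty. Assumption: the uncertainty is
// the standard error of the mean NLL, propagated to the perplexity via
// ppl = exp(mean_nll) (first-order error propagation). Example values only.
int main() {
    std::vector<double> nll = {2.31, 1.87, 2.02, 2.45, 1.99};

    const size_t n = nll.size();
    double sum = 0.0, sum2 = 0.0;
    for (double x : nll) {
        sum  += x;
        sum2 += x * x;
    }

    const double mean = sum / n;                 // average negative log probability
    const double var  = sum2 / n - mean * mean;  // variance of the per-token NLLs
    const double sem  = std::sqrt(var / n);      // standard error of the mean

    const double ppl = std::exp(mean);           // perplexity

    // tokens, perplexity, mean NLL, uncertainty of the mean NLL
    printf("%zu %.4f %.4f %.4f\n", n, ppl, mean, sem);
    return 0;
}
```

Since ppl = exp(mean NLL), a small uncertainty σ in the mean NLL corresponds to roughly ppl·σ in the perplexity itself.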

@ikawrakow ikawrakow merged commit d046dce into master Aug 25, 2023
@ikawrakow ikawrakow deleted the ik/faster_ppl branch August 25, 2023 16:05
mattgauf added a commit to mattgauf/llama.cpp that referenced this pull request Aug 26, 2023
* master: (773 commits)
  server : add `/detokenize` endpoint (ggml-org#2802)
  convert.py : advanced option (ggml-org#2753)
  llama : use Unicode Escape Sequence to replace encoded characters (ggml-org#2814)
  flake.nix : add rocm support and cleanup (ggml-org#2808)
  llama : move #includes out of _GNU_SOURCE conditional (ggml-org#2817)
  main : fix bug (penalize_nl=false doesn't work) + suppress warning on mingw (ggml-org#1528)
  llama : use std::abs in llama_sample_tail_free (ggml-org#2800)
  k-quants : remove unnecessary tensor shape restrictions (ggml-org#2811)
  Better perplexity for 2- and 3-bit quantization for LLaMA-v2-70B (ggml-org#2807)
  Fix HellaSwag (ggml-org#2805)
  flake : build llama.cpp on Intel with nix (ggml-org#2795)
  Handle null rope scaling value (ggml-org#2793)
  Fix spm whitespaces (ggml-org#2806)
  examples : skip unnecessary external lib in server README.md how-to (ggml-org#2804)
  llama : fix struct decl (ggml-org#2790)
  Faster perplexity computation (ggml-org#2786)
  llama : add llama_beam_search() (ggml-org#2267)
  convert.py : Get rope scale from HuggingFace models (ggml-org#2772)
  llama-bench : add model sizes (ggml-org#2771)
  convert.py : export rope freq_base when converting CodeLlama from an HF model (ggml-org#2773)
  ...
akawrykow pushed a commit to akawrykow/llama.cpp that referenced this pull request Aug 29, 2023
Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>