
Faster perplexity computation #2786

Merged
merged 1 commit into from Aug 25, 2023

Conversation

ikawrakow
Contributor

@ikawrakow ikawrakow commented Aug 25, 2023

Time to compute 7B perplexity on Wikitext with a context of 512 on an RTX 4080 goes from 143 seconds to 128 seconds. Not a big deal for most people, I guess, but a nice speedup for someone like me who runs lots of perplexity calculations while trying different quantization techniques.

Also added output of the statistical uncertainty of the computed perplexity. With `--ppl-output-type 1` the perplexity tool now outputs 4 columns: number of evaluated tokens, perplexity, average negative log probability, and its uncertainty.
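For reference, here is a minimal sketch (not the code in this PR) of how those four columns can be derived from per-token negative log probabilities, assuming the reported uncertainty is the standard error of the mean NLL:

```cpp
#include <cmath>
#include <cstdio>
#include <vector>

// Sketch: given per-token negative log probabilities (base e), compute
// perplexity and a statistical uncertainty. Assumption: the uncertainty is
// the standard error of the mean NLL, propagated to the perplexity via
// ppl = exp(mean_nll) (first-order error propagation). Example values only.
int main() {
    std::vector<double> nll = {2.31, 1.87, 2.02, 2.45, 1.99};

    const size_t n = nll.size();
    double sum = 0.0, sum2 = 0.0;
    for (double x : nll) {
        sum  += x;
        sum2 += x * x;
    }

    const double mean = sum / n;                 // average negative log probability
    const double var  = sum2 / n - mean * mean;  // variance of the per-token NLLs
    const double sem  = std::sqrt(var / n);      // standard error of the mean

    const double ppl = std::exp(mean);           // perplexity

    // tokens, perplexity, mean NLL, uncertainty of the mean NLL
    printf("%zu %.4f %.4f %.4f\n", n, ppl, mean, sem);
    return 0;
}
```

Since ppl = exp(mean NLL), a small uncertainty σ in the mean NLL corresponds to roughly ppl·σ in the perplexity itself.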

@ikawrakow ikawrakow merged commit d046dce into master Aug 25, 2023
@ikawrakow ikawrakow deleted the ik/faster_ppl branch August 25, 2023 16:05
mattgauf added a commit to mattgauf/llama.cpp that referenced this pull request Aug 26, 2023
* master: (773 commits)
  server : add `/detokenize` endpoint (ggml-org#2802)
  convert.py : advanced option (ggml-org#2753)
  llama : use Unicode Escape Sequence to replace encoded characters (ggml-org#2814)
  flake.nix : add rocm support and cleanup (ggml-org#2808)
  llama : move #includes out of _GNU_SOURCE conditional (ggml-org#2817)
  main : fix bug (penalize_nl=false doesn't work) + suppress warning on mingw (ggml-org#1528)
  llama : use std::abs in llama_sample_tail_free (ggml-org#2800)
  k-quants : remove unnecessary tensor shape restrictions (ggml-org#2811)
  Better perplexity for 2- and 3-bit quantization for LLaMA-v2-70B (ggml-org#2807)
  Fix HellaSwag (ggml-org#2805)
  flake : build llama.cpp on Intel with nix (ggml-org#2795)
  Handle null rope scaling value (ggml-org#2793)
  Fix spm whitespaces (ggml-org#2806)
  examples : skip unnecessary external lib in server README.md how-to (ggml-org#2804)
  llama : fix struct decl (ggml-org#2790)
  Faster perplexity computation (ggml-org#2786)
  llama : add llama_beam_search() (ggml-org#2267)
  convert.py : Get rope scale from HuggingFace models (ggml-org#2772)
  llama-bench : add model sizes (ggml-org#2771)
  convert.py : export rope freq_base when converting CodeLlama from an HF model (ggml-org#2773)
  ...
akawrykow pushed a commit to akawrykow/llama.cpp that referenced this pull request Aug 29, 2023
Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>