Expected Behavior

I use the `--mtest` option and get a report on how much memory is used.

Current Behavior

The program crashes because the first token is not `BOS`.
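The failure log below shows `llama_eval_internal` rejecting the batch. A minimal sketch of what that check presumably looks like after f9a6364, reconstructed from the log message rather than from the commit itself (`llama_eval_internal` and `llama_token_bos` are real llama.cpp names; `check_first_token` is only an illustrative wrapper):

```cpp
#include <cstdio>
#include "llama.h"

// Illustrative wrapper around the kind of guard f9a6364 adds inside
// llama_eval_internal: an eval that starts from an empty context
// (n_past == 0) must begin with the BOS token. The dummy batch that
// --mtest evaluates does not, so the eval aborts.
static bool check_first_token(const llama_token * tokens, int n_past) {
    if (n_past == 0 && tokens[0] != llama_token_bos()) {
        fprintf(stderr, "llama_eval_internal: first token must be BOS\n");
        return false;
    }
    return true;
}
```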
Environment and Context

As expected, the bug was introduced with f9a6364.
```
$ lscpu
CPU op-mode(s):          32-bit, 64-bit
Address sizes:           43 bits physical, 48 bits virtual
Byte Order:              Little Endian
CPU(s):                  16
On-line CPU(s) list:     0-15
Vendor ID:               AuthenticAMD
Model name:              AMD Ryzen 7 3700X 8-Core Processor
CPU family:              23
Model:                   113
Thread(s) per core:      2
Core(s) per socket:      8
Socket(s):               1
Stepping:                0
Frequency boost:         enabled
CPU(s) scaling MHz:      77%
CPU max MHz:             4935.9370
CPU min MHz:             2200.0000
BogoMIPS:                7202.09
Flags:                   fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid aperfmperf rapl pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_llc mwaitx cpb cat_l3 cdp_l3 hw_pstate ssbd mba ibpb stibp vmmcall fsgsbase bmi1 avx2 smep bmi2 cqm rdt_a rdseed adx smap clflushopt clwb sha_ni xsaveopt xsavec xgetbv1 cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local clzero irperf xsaveerptr rdpru wbnoinvd arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold avic v_vmsave_vmload vgif v_spec_ctrl umip rdpid overflow_recov succor smca sev sev_es
Virtualization features:
  Virtualization:        AMD-V
Caches (sum of all):
  L1d:                   256 KiB (8 instances)
  L1i:                   256 KiB (8 instances)
  L2:                    4 MiB (8 instances)
  L3:                    32 MiB (2 instances)
NUMA:
  NUMA node(s):          1
  NUMA node0 CPU(s):     0-15
Vulnerabilities:
  Itlb multihit:         Not affected
  L1tf:                  Not affected
  Mds:                   Not affected
  Meltdown:              Not affected
  Mmio stale data:       Not affected
  Retbleed:              Mitigation; untrained return thunk; SMT enabled with STIBP protection
  Spec store bypass:     Mitigation; Speculative Store Bypass disabled via prctl
  Spectre v1:            Mitigation; usercopy/swapgs barriers and __user pointer sanitization
  Spectre v2:            Mitigation; Retpolines, IBPB conditional, STIBP always-on, RSB filling, PBRSB-eIBRS Not affected
  Srbds:                 Not affected
  Tsx async abort:       Not affected
```
```
$ uname -a
Linux johannes-pc 6.3.0-1-MANJARO #1 SMP PREEMPT_DYNAMIC Mon Apr 3 10:46:56 UTC 2023 x86_64 GNU/Linux
```

Python 3.10.10
GNU Make 4.4.1
g++ (GCC) 12.2.1 20230201
Steps to Reproduce

```
./main --model ./models/llama-7b-ggml-q4_0.bin --ignore-eos --n_predict 16 --ctx_size 2048 --batch_size 512 --threads 6 --seed 1337 --mtest | tee chat.txt
```
Failure Logs

```
/home/johannesg/Projects/llama.cpp [git::f9a6364 *] [johannesg@johannes-pc] [14:25]
> ./main --model ./models/llama-7b-ggml-q4_0.bin --ignore-eos --n_predict 16 --ctx_size 2048 --batch_size 512 --threads 6 --seed 1337 --mtest | tee chat.txt
WARNING: when using cuBLAS generation results are NOT guaranteed to be reproducible.
main: build = 520 (f9a6364)
main: seed  = 1337
llama.cpp: loading model from ./models/llama-7b-ggml-q4_0.bin
llama_model_load_internal: format     = ggjt v1 (latest)
llama_model_load_internal: n_vocab    = 32000
llama_model_load_internal: n_ctx      = 2048
llama_model_load_internal: n_embd     = 4096
llama_model_load_internal: n_mult     = 256
llama_model_load_internal: n_head     = 32
llama_model_load_internal: n_layer    = 32
llama_model_load_internal: n_rot      = 128
llama_model_load_internal: ftype      = 2 (mostly Q4_0)
llama_model_load_internal: n_ff       = 11008
llama_model_load_internal: n_parts    = 1
llama_model_load_internal: model size = 7B
llama_model_load_internal: ggml ctx size =  68.20 KB
llama_model_load_internal: mem required  = 5809.33 MB (+ 1026.00 MB per state)
llama_init_from_file: kv self size  = 1024.00 MB

system_info: n_threads = 6 / 16 | AVX = 1 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | VSX = 0 |
llama_eval_internal: first token must be BOS
llama_eval: failed to eval

llama_print_timings:        load time =  1193.45 ms
llama_print_timings:      sample time =     0.00 ms /     1 runs   (    0.00 ms per run)
llama_print_timings: prompt eval time =     0.00 ms /     1 tokens (    0.00 ms per token)
llama_print_timings:        eval time =   112.75 ms /     1 runs   (  112.75 ms per run)
llama_print_timings:       total time =  1193.45 ms
```
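Speculatively, a fix that keeps both the new BOS check and `--mtest` working would be to seed the memory-test batches with BOS instead of token 0. A sketch, assuming the warm-up in examples/main/main.cpp still builds its dummy batches from token 0 (`run_mem_test` is a hypothetical helper for illustration, not the upstream code):

```cpp
#include <cstdio>
#include <vector>
#include "llama.h"

// Hypothetical helper mirroring the --mtest warm-up in examples/main/main.cpp.
// Seeding the batches with llama_token_bos() instead of token 0 satisfies the
// first-token check introduced by f9a6364 while exercising the same paths.
static void run_mem_test(llama_context * ctx, int n_batch, int n_predict, int n_threads) {
    // Full-batch eval from an empty context: the first token must now be BOS.
    std::vector<llama_token> tmp(n_batch, llama_token_bos());
    if (llama_eval(ctx, tmp.data(), (int) tmp.size(), 0, n_threads) != 0) {
        fprintf(stderr, "mem test: full-batch eval failed\n");
        return;
    }

    // Single-token eval near the end of the requested prediction window.
    std::vector<llama_token> tail = { llama_token_bos() };
    if (llama_eval(ctx, tail.data(), (int) tail.size(), n_predict - 1, n_threads) != 0) {
        fprintf(stderr, "mem test: single-token eval failed\n");
    }
}
```

Filling the whole warm-up batch with BOS should still measure the same worst-case memory usage, since scratch requirements depend on batch size rather than on which tokens are evaluated.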