This repository has been archived by the owner on Dec 6, 2024. It is now read-only.

Qwen-7B-Q4_0 works well on Mac M1, but Qwen-7B-Q8_0 fails with a ggml-metal error. #42

Open
songkq opened this issue Nov 16, 2023 · 1 comment

songkq commented Nov 16, 2023

@simonJJJ Hi, could you please advise on this issue? Qwen-7B-Q4_0 works well on a Mac M1, but Qwen-7B-Q8_0 aborts with a GGML_ASSERT in ggml-metal.m. The build command and both runs are below.

cmake -B build -DGGML_METAL=ON && cmake --build build -j


./main -m ../../ggml_bins/qwen7b-chat-8k-ggml-q4_0.bin --tiktoken ../../assets/qwen.tiktoken -v -p 介绍下三国演义
system info: | AVX = 0 | AVX2 = 0 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 0 | NEON = 1 | ARM_FMA = 1 | F16C = 0 | FP16_VA = 1 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 0 | VSX = 0 |
inference config: | max_length = 2048 | max_context_length = 512 | top_k = 0 | top_p = 0.5 | temperature = 0.95 | num_threads = 0 |
loaded qwen model from ../../ggml_bins/qwen7b-chat-8k-ggml-q4_0.bin within: 88.669 ms

《三国演义》是中国古代四大名著之一,由罗贯中创作。它讲述了从东汉末年到西晋初年之间,中国历史上著名的三国时期的故事。三国时期是中国历史上一个非常重要的时期,它涉及到政治、军事、文化、经济等多个方面,也出现了许多著名的英雄人物,如曹操、刘备、孙权等。《三国演义》以三国时期的历史事件为基础,通过一系列精彩的故事,描述了当时的政治、军事、文化、经济等方面的情况,也展示了当时人们的思想、情感和行为。

prompt time: 5496.2 ms / 24 tokens (229.008 ms/token)
output time: 3756.11 ms / 117 tokens (32.103 ms/token)
total time: 9252.31 ms



./main -m ../../ggml_bins/qwen7b-chat-8k-ggml-q8_0.bin --tiktoken ../../assets/qwen.tiktoken -v -p 介绍下三国演义
system info: | AVX = 0 | AVX2 = 0 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 0 | NEON = 1 | ARM_FMA = 1 | F16C = 0 | FP16_VA = 1 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 0 | VSX = 0 |
inference config: | max_length = 2048 | max_context_length = 512 | top_k = 0 | top_p = 0.5 | temperature = 0.95 | num_threads = 0 |
loaded qwen model from ../../ggml_bins/qwen7b-chat-8k-ggml-q8_0.bin within: 87.001 ms

GGML_ASSERT: /workspace/qwen.cpp/third_party/ggml/src/ggml-metal.m:1453: false
[1]    12416 abort      ./main -m ../../ggml_bins/qwen7b-chat-8k-ggml-q8_0.bin --tiktoken  -v -p

fann1993814 commented

Hi @songkq, maybe you could try my PR #41, which includes some experimental data.
The assertion at /workspace/qwen.cpp/third_party/ggml/src/ggml-metal.m:1453: false is probably triggered by running out of memory (OOM).
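
The OOM explanation can be sanity-checked with a rough estimate of the weight memory each format needs. The sketch below is not taken from qwen.cpp. It assumes the ggml block layout in which Q4_0 and Q8_0 both store 32 weights per block with one fp16 scale, assumes about 7.7e9 parameters for Qwen-7B, treats every weight as quantized, and ignores the KV cache and scratch buffers. The function name `estimate_weight_gib` is hypothetical.

```python
# Rough weight-memory estimate for ggml block-quantized formats.
# Assumptions (not from the thread): ~7.7e9 parameters for Qwen-7B,
# every weight quantized, KV cache and scratch buffers ignored.

BLOCK_SIZE = 32  # weights per quantization block in Q4_0 and Q8_0

# Bytes per block: one fp16 scale (2 bytes) plus the packed weights.
BYTES_PER_BLOCK = {
    "q4_0": 2 + BLOCK_SIZE // 2,  # 4-bit weights, 18 bytes per block
    "q8_0": 2 + BLOCK_SIZE,       # 8-bit weights, 34 bytes per block
}


def estimate_weight_gib(n_params: float, fmt: str) -> float:
    """Return the approximate size of the quantized weights in GiB."""
    n_blocks = n_params / BLOCK_SIZE
    return n_blocks * BYTES_PER_BLOCK[fmt] / 2**30


if __name__ == "__main__":
    n_params = 7.7e9  # assumed parameter count, adjust for your model
    for fmt in ("q4_0", "q8_0"):
        print(f"{fmt}: ~{estimate_weight_gib(n_params, fmt):.1f} GiB")
```

Under these assumptions the Q8_0 weights are about 1.9 times the size of the Q4_0 weights (34 versus 18 bytes per block), so a machine that holds the Q4_0 model comfortably may still be unable to allocate the Metal buffers for Q8_0. The thread does not say how much unified memory the M1 has, so this only shows that the OOM explanation is plausible.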
