Qwen2.5-0.5B-Instruct: applying GPTQ parameters with apply_gptq.py fails #3094

Closed
jfduma opened this issue Nov 20, 2024 · 1 comment
Labels
duplicate This issue or pull request already exists

jfduma commented Nov 20, 2024

Development machine: Ubuntu 20.04, MNN 3.0.0

Models (from Hugging Face): Qwen2.5-0.5B-Instruct and Qwen2.5-0.5B-Instruct-GPTQ-Int8

Export the ONNX model

$ python mnn/transformers/llm/export/llmexport.py --path pretrained_model/Qwen2.5-0.5B-Instruct --export onnx --dst_path mnn-output/basemodel_0.5b_instruct_q88_gptq_onnx_mnn_v3

✅ Done load pretrained model pretrained_model/Qwen2.5-0.5B-Instruct [ 1.10 s]
⠋ export tokenizer to 2024-11-20 15:21:53.270750: I tensorflow/core/util/port.cc:153] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable TF_ENABLE_ONEDNN_OPTS=0.
2024-11-20 15:21:53.285959: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:477] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
WARNING: All log messages before absl::InitializeLog() is called are written to STDERR
E0000 00:00:1732087313.300938 1727776 cuda_dnn.cc:8322] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
E0000 00:00:1732087313.305363 1727776 cuda_blas.cc:1418] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2024-11-20 15:21:53.322212: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX512F AVX512_VNNI FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
✅ Done export tokenizer to mnn-output/basemodel_0.5b_instruct_q88_gptq_onnx_mnn_v3/tokenizer.txt[ 2.71 s]
✅ Done export embedding to mnn-output/basemodel_0.5b_instruct_q88_gptq_onnx_mnn_v3/embeddings_bf16.bin[ 0.12 s]
✅ Done export onnx model to mnn-output/basemodel_0.5b_instruct_q88_gptq_onnx_mnn_v3/onnx/llm.onnx[ 3.43 s]
✅ Done export model weight to mnn-output/basemodel_0.5b_instruct_q88_gptq_onnx_mnn_v3/onnx/llm.onnx.data[ 3.19 s]
✅ Done export config to mnn-output/basemodel_0.5b_instruct_q88_gptq_onnx_mnn_v3/llm_config.json[ 0.00 s]

Convert to an MNN model

$ mnn/build/MNNConvert --modelFile mnn-output/basemodel_0.5b_instruct_q88_gptq_onnx_mnn_v3/onnx/llm.onnx --framework ONNX --MNNModel mnn-output/basemodel_0.5b_instruct_q88_gptq_onnx_mnn_v3/llm.mnn --weightQuantBits 8 --weightQuantBlock 128 --weightQuantAsymmetric --saveExternalData --transformerFuse --allowCustomOp

The device supports: i8sdot:0, fp16:0, i8mm: 0, sve2: 0
Don't has bizCode, use MNNTest for default
Start to Convert Other Model Format To MNN Model..., target version: 3
[15:22:06] /work/mnn/tools/converter/source/onnx/onnxConverter.cpp:46: ONNX Model ir version: 8
[15:22:06] /work/mnn/tools/converter/source/onnx/onnxConverter.cpp:47: ONNX Model opset version: 15
Start to Optimize the MNN Net...
Fuse Attention as /Reshape_8_output_0
Fuse Attention as /Reshape_17_output_0
Fuse Attention as /Reshape_26_output_0
Fuse Attention as /Reshape_35_output_0
Fuse Attention as /Reshape_44_output_0
Fuse Attention as /Reshape_53_output_0
Fuse Attention as /Reshape_62_output_0
Fuse Attention as /Reshape_71_output_0
Fuse Attention as /Reshape_80_output_0
Fuse Attention as /Reshape_89_output_0
Fuse Attention as /Reshape_98_output_0
Fuse Attention as /Reshape_107_output_0
Fuse Attention as /Reshape_116_output_0
Fuse Attention as /Reshape_125_output_0
Fuse Attention as /Reshape_134_output_0
Fuse Attention as /Reshape_143_output_0
Fuse Attention as /Reshape_152_output_0
Fuse Attention as /Reshape_161_output_0
Fuse Attention as /Reshape_170_output_0
Fuse Attention as /Reshape_179_output_0
Fuse Attention as /Reshape_188_output_0
Fuse Attention as /Reshape_197_output_0
Fuse Attention as /Reshape_206_output_0
Fuse Attention as /Reshape_215_output_0
Remove past KV for presents
Save Weight to mnn-output/basemodel_0.5b_instruct_q88_gptq_onnx_mnn_v3/llm.mnn.weight
inputTensors : [ input_ids, position_ids, attention_mask, past_key_values, ]
outputTensors: [ logits, presents, ]
Converted Success!

Write the GPTQ weights

$ cp mnn-output/basemodel_0.5b_instruct_q88_gptq_onnx_mnn_v3/llm.mnn.weight mnn-output/basemodel_0.5b_instruct_q88_gptq_onnx_mnn_v3/gptq.mnn.weight && python mnn/tools/script/apply_gptq.py --mnn_graph mnn-output/basemodel_0.5b_instruct_q88_gptq_onnx_mnn_v3/base.json --mnn_weight mnn-output/basemodel_0.5b_instruct_q88_gptq_onnx_mnn_v3/gptq.mnn.weight --gptq_tensor pretrained_model/Qwen2.5-0.5B-Instruct-GPTQ-Int8/model.safetensors

/work/mnn/tools/script/apply_gptq.py(25)parse_name()
-> if len(parts) > 4:
(Pdb) p parts
['', 'FakeLinear_output_0__matmul_converted']
(Pdb) c
Traceback (most recent call last):
  File "/work/mnn/tools/script/apply_gptq.py", line 203, in <module>
    main(args)
  File "/work/mnn/tools/script/apply_gptq.py", line 193, in main
    mnn_model = MNNModel(args.mnn_graph, args.mnn_weight)
  File "/work/mnn/tools/script/apply_gptq.py", line 59, in __init__
    self.parse_conv()
  File "/work/mnn/tools/script/apply_gptq.py", line 68, in parse_conv
    self.weights.append(MNNWeight(name, external, weight_elements))
  File "/work/mnn/tools/script/apply_gptq.py", line 17, in __init__
    self.parse_name()
  File "/work/mnn/tools/script/apply_gptq.py", line 31, in parse_name
    self.op_id = parts[2]
IndexError: list index out of range

Debugging shows: self.name = '/FakeLinear_output_0__matmul_converted'
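
For reference, the failure can be reproduced outside the script. The sketch below is not the code from apply_gptq.py. It assumes the op name is split on '/', which matches the `parts` value printed in the debugger, and the helper names `split_name` and `safe_op_id` are hypothetical. It only shows why `parts[2]` is out of range for this name, and what a length check would look like.

```python
def split_name(name):
    """Split an op name on '/' (assumed to mirror what parse_name does)."""
    return name.split("/")


def safe_op_id(name):
    """Return the third path component, or None if the name has too few parts.

    Hypothetical guard for illustration only; not part of apply_gptq.py.
    """
    parts = split_name(name)
    return parts[2] if len(parts) > 2 else None


name = "/FakeLinear_output_0__matmul_converted"
parts = split_name(name)
print(parts)  # two elements only: the empty prefix and the op name

try:
    op_id = parts[2]  # same index access as line 31 in the traceback
except IndexError as exc:
    print("IndexError:", exc)

print(safe_op_id(name))  # the guard returns None instead of raising
```

Whether such a name should be skipped or mapped to a GPTQ tensor in some other way is a question for the maintainers; the guard above only avoids the crash.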

@jxt1234 jxt1234 added the duplicate This issue or pull request already exists label Nov 20, 2024
jxt1234 (Collaborator) commented Nov 20, 2024

Duplicate of #3095; closing this one for now.

@jxt1234 jxt1234 closed this as completed Nov 20, 2024