Qwen2.5-0.5B-Instruct: applying GPTQ parameters with apply_gptq.py fails #3094

jfduma · 2024-11-20T07:33:29Z

Development machine: Ubuntu 20.04, MNN 3.0.0

Models (Hugging Face): Qwen2.5-0.5B-Instruct and Qwen2.5-0.5B-Instruct-GPTQ-Int8

Export the ONNX model

$ python mnn/transformers/llm/export/llmexport.py --path pretrained_model/Qwen2.5-0.5B-Instruct --export onnx --dst_path mnn-output/basemodel_0.5b_instruct_q88_gptq_onnx_mnn_v3

✅ Done load pretrained model pretrained_model/Qwen2.5-0.5B-Instruct [ 1.10 s]
⠋ export tokenizer to 2024-11-20 15:21:53.270750: I tensorflow/core/util/port.cc:153] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable TF_ENABLE_ONEDNN_OPTS=0.
2024-11-20 15:21:53.285959: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:477] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
WARNING: All log messages before absl::InitializeLog() is called are written to STDERR
E0000 00:00:1732087313.300938 1727776 cuda_dnn.cc:8322] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
E0000 00:00:1732087313.305363 1727776 cuda_blas.cc:1418] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2024-11-20 15:21:53.322212: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX512F AVX512_VNNI FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
✅ Done export tokenizer to mnn-output/basemodel_0.5b_instruct_q88_gptq_onnx_mnn_v3/tokenizer.txt[ 2.71 s]
✅ Done export embedding to mnn-output/basemodel_0.5b_instruct_q88_gptq_onnx_mnn_v3/embeddings_bf16.bin[ 0.12 s]
✅ Done export onnx model to mnn-output/basemodel_0.5b_instruct_q88_gptq_onnx_mnn_v3/onnx/llm.onnx[ 3.43 s]
✅ Done export model weight to mnn-output/basemodel_0.5b_instruct_q88_gptq_onnx_mnn_v3/onnx/llm.onnx.data[ 3.19 s]
✅ Done export config to mnn-output/basemodel_0.5b_instruct_q88_gptq_onnx_mnn_v3/llm_config.json[ 0.00 s]

Export the MNN model

$ mnn/build/MNNConvert --modelFile mnn-output/basemodel_0.5b_instruct_q88_gptq_onnx_mnn_v3/onnx/llm.onnx --framework ONNX --MNNModel mnn-output/basemodel_0.5b_instruct_q88_gptq_onnx_mnn_v3/llm.mnn --weightQuantBits 8 --weightQuantBlock 128 --weightQuantAsymmetric --saveExternalData --transformerFuse --allowCustomOp

The device supports: i8sdot:0, fp16:0, i8mm: 0, sve2: 0
Don't has bizCode, use MNNTest for default
Start to Convert Other Model Format To MNN Model..., target version: 3
[15:22:06] /work/mnn/tools/converter/source/onnx/onnxConverter.cpp:46: ONNX Model ir version: 8
[15:22:06] /work/mnn/tools/converter/source/onnx/onnxConverter.cpp:47: ONNX Model opset version: 15
Start to Optimize the MNN Net...
Fuse Attention as /Reshape_8_output_0
Fuse Attention as /Reshape_17_output_0
Fuse Attention as /Reshape_26_output_0
Fuse Attention as /Reshape_35_output_0
Fuse Attention as /Reshape_44_output_0
Fuse Attention as /Reshape_53_output_0
Fuse Attention as /Reshape_62_output_0
Fuse Attention as /Reshape_71_output_0
Fuse Attention as /Reshape_80_output_0
Fuse Attention as /Reshape_89_output_0
Fuse Attention as /Reshape_98_output_0
Fuse Attention as /Reshape_107_output_0
Fuse Attention as /Reshape_116_output_0
Fuse Attention as /Reshape_125_output_0
Fuse Attention as /Reshape_134_output_0
Fuse Attention as /Reshape_143_output_0
Fuse Attention as /Reshape_152_output_0
Fuse Attention as /Reshape_161_output_0
Fuse Attention as /Reshape_170_output_0
Fuse Attention as /Reshape_179_output_0
Fuse Attention as /Reshape_188_output_0
Fuse Attention as /Reshape_197_output_0
Fuse Attention as /Reshape_206_output_0
Fuse Attention as /Reshape_215_output_0
Remove past KV for presents
Save Weight to mnn-output/basemodel_0.5b_instruct_q88_gptq_onnx_mnn_v3/llm.mnn.weight
inputTensors : [ input_ids, position_ids, attention_mask, past_key_values, ]
outputTensors: [ logits, presents, ]
Converted Success!

Write the GPTQ weights

$ cp mnn-output/basemodel_0.5b_instruct_q88_gptq_onnx_mnn_v3/llm.mnn.weight mnn-output/basemodel_0.5b_instruct_q88_gptq_onnx_mnn_v3/gptq.mnn.weight && python mnn/tools/script/apply_gptq.py --mnn_graph mnn-output/basemodel_0.5b_instruct_q88_gptq_onnx_mnn_v3/base.json --mnn_weight mnn-output/basemodel_0.5b_instruct_q88_gptq_onnx_mnn_v3/gptq.mnn.weight --gptq_tensor pretrained_model/Qwen2.5-0.5B-Instruct-GPTQ-Int8/model.safetensors

/work/mnn/tools/script/apply_gptq.py(25)parse_name()
-> if len(parts) > 4:
(Pdb) p parts
['', 'FakeLinear_output_0__matmul_converted']
(Pdb) c
Traceback (most recent call last):
  File "/work/mnn/tools/script/apply_gptq.py", line 203, in <module>
    main(args)
  File "/work/mnn/tools/script/apply_gptq.py", line 193, in main
    mnn_model = MNNModel(args.mnn_graph, args.mnn_weight)
  File "/work/mnn/tools/script/apply_gptq.py", line 59, in __init__
    self.parse_conv()
  File "/work/mnn/tools/script/apply_gptq.py", line 68, in parse_conv
    self.weights.append(MNNWeight(name, external, weight_elements))
  File "/work/mnn/tools/script/apply_gptq.py", line 17, in __init__
    self.parse_name()
  File "/work/mnn/tools/script/apply_gptq.py", line 31, in parse_name
    self.op_id = parts[2]
IndexError: list index out of range

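Debugging shows that self.name = '/FakeLinear_output_0__matmul_converted'. This name contains only one '/', so parts holds just two elements (['', 'FakeLinear_output_0__matmul_converted'] in the pdb session above) and parts[2] is out of range.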
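The failure can be reproduced in isolation. The sketch below is a hypothetical reconstruction, assuming only what the pdb output suggests: that parse_name() splits the weight name on '/' and reads parts[2] unconditionally. The function names parse_name_original and parse_name_safe are made up for illustration and do not reproduce the real apply_gptq.py. The safe variant returns None for names with fewer than three '/'-separated components, so a caller could skip such weights instead of crashing.

```python
from typing import Optional


def parse_name_original(name: str) -> str:
    # Hypothetical reconstruction of the failing logic: split on '/'
    # and assume the op id is always the third component.
    parts = name.split('/')
    return parts[2]  # raises IndexError when there are fewer than 3 parts


def parse_name_safe(name: str) -> Optional[str]:
    # Hypothetical defensive variant: return None instead of raising
    # when the name has fewer than three '/'-separated components.
    parts = name.split('/')
    if len(parts) < 3:
        return None
    return parts[2]


if __name__ == "__main__":
    bad = '/FakeLinear_output_0__matmul_converted'
    print(bad.split('/'))  # ['', 'FakeLinear_output_0__matmul_converted']
    try:
        parse_name_original(bad)
    except IndexError as exc:
        print("IndexError:", exc)  # IndexError: list index out of range
    print(parse_name_safe(bad))  # None
```

Skipping such names only avoids the crash; whether the skipped weight should receive GPTQ parameters is a separate question that this sketch does not answer.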


jxt1234 · 2024-11-20T09:40:06Z

Duplicate of #3095; closing for now.

jxt1234 added the "duplicate" label (This issue or pull request already exists) on Nov 20, 2024

jxt1234 closed this as completed Nov 20, 2024
