CANN Support Ascend310P to accelerate F32 and F16 LLM Model #10216

leo-pony · 2024-11-08T09:29:56Z

This PR adds CANN support for accelerating F32 and F16 model inference on the Ascend310P. The corresponding issue is #10160. Q8 and Q4 support will be implemented next.

I have read the contributing guidelines
Self-reported review complexity:
- Low
- Medium
- High

Function is normal:

feichenchina · 2024-11-11T06:31:30Z

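I am using the code from the ascend310Adaptor branch ( https://github.com/leo-pony/llama.cpp/blob/ascend310PAdapt/ggml/src/ggml-cann/kernels/quantize_f16_q8_0.cpp ) and running inference with the Qwen2.5-7b-fp16.guff model on a 310P. The output is garbled. Is this model not supported yet, or is there some other cause?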

leo-pony · 2024-11-11T06:41:24Z

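> I am using the code from the ascend310Adaptor branch ( https://github.com/leo-pony/llama.cpp/blob/ascend310PAdapt/ggml/src/ggml-cann/kernels/quantize_f16_q8_0.cpp ) and running inference with the Qwen2.5-7b-fp16.guff model on a 310P. The output is garbled. Is this model not supported yet, or is there some other cause?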

The build must be configured with the -DSOC_TYPE option, for example:
cmake -B build -DGGML_CANN=on -DCMAKE_BUILD_TYPE=debug -DSOC_TYPE=Ascend310P3
cmake --build build --config debug
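To confirm which SoC type the configure step actually recorded, the CMake cache can be inspected. The following is a minimal sketch, not part of this PR: the helper name `soc_type_from_cache` is hypothetical, and it assumes the build directory is `build` as in the commands above. It relies only on the standard `NAME:TYPE=VALUE` layout of `CMakeCache.txt` entries.

```shell
#!/bin/sh
# Hypothetical helper (not part of the PR): print the SOC_TYPE value recorded
# in a CMake cache file. Cache entries have the form NAME:TYPE=VALUE.
soc_type_from_cache() {
    cache_file="$1"
    # Fail if the cache file does not exist (for example, cmake was never run).
    [ -f "$cache_file" ] || return 1
    # Strip the "SOC_TYPE:<type>=" prefix and print only the value.
    sed -n 's/^SOC_TYPE:[^=]*=//p' "$cache_file"
}

# Usage: check the build directory created by the configure command above.
if value="$(soc_type_from_cache build/CMakeCache.txt)" && [ -n "$value" ]; then
    echo "SOC_TYPE recorded in cache: $value"
else
    echo "SOC_TYPE not found in build/CMakeCache.txt"
fi
```

A value passed with `-DSOC_TYPE=...` is stored in the cache, whereas a plain `set()` inside a `CMakeLists.txt` creates a normal variable that is not written to the cache, so an empty result here suggests the option was not given on the command line.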

feichenchina · 2024-11-11T07:19:23Z

I have already modified ggml/src/ggml-cann/kernels/CMakeLists.txt so that, when SOC_TYPE is not set, it is automatically set to ascend310P3:
if (NOT SOC_TYPE)
    set (SOC_TYPE "ascend310p3")
endif()
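An alternative to editing the CMake file is to apply the same fallback at the command line and always pass the value explicitly. This is only a sketch under stated assumptions: the function name `resolve_soc_type` and the use of a `SOC_TYPE` environment variable are hypothetical, and the default `Ascend310P3` is taken from the configure command quoted earlier in this thread.

```shell
#!/bin/sh
# Hypothetical helper: return the SoC type to build for. Uses the first
# argument if it is non-empty, otherwise falls back to Ascend310P3.
resolve_soc_type() {
    echo "${1:-Ascend310P3}"
}

# Take the value from the SOC_TYPE environment variable if it is set.
soc_type="$(resolve_soc_type "${SOC_TYPE:-}")"

# Print the configure command instead of running it, so the chosen value
# can be reviewed before building.
echo "cmake -B build -DGGML_CANN=on -DCMAKE_BUILD_TYPE=debug -DSOC_TYPE=$soc_type"
```

Passing the value on the command line keeps the choice visible in the build invocation and does not depend on which CMakeLists.txt sets the fallback.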

CANN Support run F16 model on Ascend310P

cd9a324

hipudding self-requested a review November 8, 2024 09:30

hipudding added the enhancement (New feature or request) and Ascend NPU (issues specific to Ascend NPUs) labels Nov 8, 2024

Delete the commented code

c4e8479

leo-pony mentioned this pull request Nov 11, 2024

Bug: CANN: Inference result garbled #10252

Open
