CANN Support Ascend310P to accelerate F32 and F16 LLM Model #10216

Open: wants to merge 2 commits into base: master

Conversation

Contributor

@leo-pony leo-pony commented Nov 8, 2024

This PR adds CANN support for Ascend310P to accelerate F32/F16 model inference. The corresponding issue is #10160. Q8 and Q4 support will be implemented next.

Inference works as expected:
[image]
[image]

@hipudding hipudding self-requested a review November 8, 2024 09:30
@hipudding hipudding added the enhancement (New feature or request) and Ascend NPU (issues specific to Ascend NPUs) labels Nov 8, 2024
@feichenchina

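I used the code from the ascend310Adaptor branch (https://github.com/leo-pony/llama.cpp/blob/ascend310PAdapt/ggml/src/ggml-cann/kernels/quantize_f16_q8_0.cpp) to run inference with the Qwen2.5-7b-fp16.guff model on a 310P, and the output is garbled. Is this model not supported yet, or is there some other cause?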

@leo-pony
Contributor Author

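I used the code from the ascend310Adaptor branch (https://github.com/leo-pony/llama.cpp/blob/ascend310PAdapt/ggml/src/ggml-cann/kernels/quantize_f16_q8_0.cpp) to run inference with the Qwen2.5-7b-fp16.guff model on a 310P, and the output is garbled. Is this model not supported yet, or is there some other cause?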

The build must be configured with the -DSOC_TYPE option, for example:
cmake -B build -DGGML_CANN=on -DCMAKE_BUILD_TYPE=debug -DSOC_TYPE=Ascend310P3
cmake --build build --config debug
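
To confirm that an existing build directory was actually configured with the SOC type, one option is to read the value back from the CMake cache. This is a minimal sketch, not part of the PR: the helper name `soc_type_from_cache` is hypothetical, and it assumes the value was passed with -DSOC_TYPE so that CMake recorded it in build/CMakeCache.txt.

```shell
# Hypothetical helper (not part of llama.cpp): print the SOC_TYPE value
# recorded in a CMake cache file, or nothing if the option was never set.
soc_type_from_cache() {
    cache_file="$1"
    # Cache entries have the form NAME:TYPE=VALUE; keep only the VALUE part.
    sed -n 's/^SOC_TYPE:[^=]*=//p' "$cache_file"
}

# Example: inspect the build directory used in the commands above, if present.
if [ -f build/CMakeCache.txt ]; then
    soc_type_from_cache build/CMakeCache.txt
fi
```

If nothing is printed for an existing cache file, the build was probably configured without -DSOC_TYPE and should be reconfigured.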

@feichenchina

feichenchina commented Nov 11, 2024 via email
