
[Question] About CPU performance #1666

Open
mingfeima opened this issue Sep 3, 2024 · 3 comments
Labels
enhancement New feature or request

Comments

@mingfeima

Hi, I am an engineer from Intel and I work mostly on performance optimization of PyTorch on Intel Xeon CPUs (I am also the PyTorch module maintainer for CPU performance). I just came across this amazing project, and the chart in the blog post fast-llama-2-on-cpus-with-sparse-fine-tuning-and-deepsparse says DeepSparse accelerates the sparse-quantized Llama models to 6-8x faster than the dense FP32 baseline.

[Chart from the blog post: DeepSparse sparse-quantized Llama throughput, 6-8x over the dense FP32 baseline]

The 6-8x speedup of the sparse model over the dense model is a fascinating result. My purpose is to check whether there is a chance to further improve the performance using our previous work on LLM optimizations.

I ran the script from https://github.com/neuralmagic/deepsparse?tab=readme-ov-file#try-it-now. However, the hardware profiler shows that hardware efficiency is still not very high: only ~12 cores are in use on average on a 40-core machine, leading to significant sync overhead and a very high CPI (cycles per instruction). Maybe I can do something to improve this, but I am not very familiar with this codebase, and I need some guidance here:

  • How can I reproduce the above results?
  • How is the model deployed? With ONNX Runtime?
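For context on the "~12 of 40 cores in use" observation above: average core utilization can be estimated from two samples of `/proc/stat`. This is a minimal Linux-only sketch using just the standard library; the function names and the sampling window are illustrative choices, not anything from DeepSparse itself.

```python
# Estimate how many cores are busy on average by sampling /proc/stat
# twice and summing the per-core utilization deltas (Linux only).
import time

def read_per_cpu_busy():
    """Return {cpu_name: (busy_jiffies, total_jiffies)} from /proc/stat."""
    stats = {}
    with open("/proc/stat") as f:
        for line in f:
            # Per-core lines look like "cpu0 ...", "cpu1 ..."; skip the
            # aggregate "cpu " line and unrelated counters.
            if line.startswith("cpu") and line[3].isdigit():
                parts = line.split()
                fields = [int(x) for x in parts[1:]]
                idle = fields[3] + fields[4]  # idle + iowait
                stats[parts[0]] = (sum(fields) - idle, sum(fields))
    return stats

def effective_cores(interval=1.0):
    """Sum of per-core utilization over the interval ~= cores in use."""
    before = read_per_cpu_busy()
    time.sleep(interval)
    after = read_per_cpu_busy()
    util = 0.0
    for cpu, (busy1, total1) in after.items():
        busy0, total0 = before.get(cpu, (busy1, total1))
        dt = total1 - total0
        if dt > 0:
            util += (busy1 - busy0) / dt
    return util

if __name__ == "__main__":
    # Run this while the inference workload is active to see how many
    # cores the engine actually keeps busy.
    print(f"~{effective_cores():.1f} cores busy on average")
```

Running this concurrently with the DeepSparse workload gives a rough cross-check of the profiler's core-utilization number; `perf stat -e cycles,instructions` on the same run would give the CPI figure mentioned above.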

Additionally, are you continuing this sparse fine-tuning work on other models, for example Llama 3? And how about int4 quantization?

@mingfeima mingfeima added the enhancement New feature or request label Sep 3, 2024
@Nafay-0

Nafay-0 commented Oct 21, 2024

Hey @mingfeima, I was curious whether you’ve found a solution to the core utilization issue or made any progress with optimizing performance. I’m tackling a similar challenge and would love to hear about any updates or insights you’ve gained!

@mingfeima
> Hey @mingfeima, I was curious if you’ve found a solution to the core utilization issue or made any progress with optimizing performance? I’m tackling a similar challenge and would love to hear about any updates or insights you’ve gained!

I need more information about how the model is being deployed before I can investigate how to optimize its performance.

@Nafay-0

Nafay-0 commented Oct 28, 2024

Currently I am just trying to run a model on CPU locally and optimize its performance.
