
[Question] About CPU performance #1666

Open
mingfeima opened this issue Sep 3, 2024 · 3 comments
Labels
enhancement New feature or request

Comments

@mingfeima

Hi, I am an engineer from Intel and I work mostly on performance optimization of PyTorch on Intel Xeon CPUs (I am also the PyTorch module maintainer for CPU performance). I just came across this amazing project, and the chart in the blog post fast-llama-2-on-cpus-with-sparse-fine-tuning-and-deepsparse says DeepSparse accelerates the sparse-quantized Llama models to 6-8x faster than the dense FP32 baseline.

[Chart from the blog post: DeepSparse sparse-quantized Llama throughput, 6-8x over the dense FP32 baseline]

The 6-8x speedup of the sparse model over the dense model is a fascinating result. My purpose is to check whether there is a chance to further improve the performance using our previous work on LLM optimizations.

I ran the script from https://github.com/neuralmagic/deepsparse?tab=readme-ov-file#try-it-now. However, the hardware profiler shows that hardware efficiency is still not very high: only ~12 cores are in use on average on a 40-core machine, leading to significant sync overhead and a very high CPI (cycles per instruction). Maybe I can do something to improve this, but I am not very familiar with this codebase, and I need some guidance here:

  • How can I reproduce the above results?
  • How is the model deployed? With ONNX Runtime?
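For context on the "~12 of 40 cores in use" observation above: average core utilization can be estimated from two samples of `/proc/stat`. This is a minimal Linux-only sketch using just the standard library; the function names and the sampling window are illustrative choices, not anything from DeepSparse itself.

```python
# Estimate how many cores are busy on average by sampling /proc/stat
# twice and summing the per-core utilization deltas (Linux only).
import time

def read_per_cpu_busy():
    """Return {cpu_name: (busy_jiffies, total_jiffies)} from /proc/stat."""
    stats = {}
    with open("/proc/stat") as f:
        for line in f:
            # Per-core lines look like "cpu0 ...", "cpu1 ..."; skip the
            # aggregate "cpu " line and unrelated counters.
            if line.startswith("cpu") and line[3].isdigit():
                parts = line.split()
                fields = [int(x) for x in parts[1:]]
                idle = fields[3] + fields[4]  # idle + iowait
                stats[parts[0]] = (sum(fields) - idle, sum(fields))
    return stats

def effective_cores(interval=1.0):
    """Sum of per-core utilization over the interval ~= cores in use."""
    before = read_per_cpu_busy()
    time.sleep(interval)
    after = read_per_cpu_busy()
    util = 0.0
    for cpu, (busy1, total1) in after.items():
        busy0, total0 = before.get(cpu, (busy1, total1))
        dt = total1 - total0
        if dt > 0:
            util += (busy1 - busy0) / dt
    return util

if __name__ == "__main__":
    # Run this while the inference workload is active to see how many
    # cores the engine actually keeps busy.
    print(f"~{effective_cores():.1f} cores busy on average")
```

Running this concurrently with the DeepSparse workload gives a rough cross-check of the profiler's core-utilization number; `perf stat -e cycles,instructions` on the same run would give the CPI figure mentioned above.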

Additionally, are you continuing this sparse fine-tuning work on other models, for example Llama 3? And how about int4 quantization?

@mingfeima mingfeima added the enhancement New feature or request label Sep 3, 2024
@Nafay-0

Nafay-0 commented Oct 21, 2024

Hey @mingfeima, I was curious whether you’ve found a solution to the core utilization issue or made any progress with optimizing performance. I’m tackling a similar challenge and would love to hear about any updates or insights you’ve gained!

@mingfeima
> Hey @mingfeima, I was curious if you’ve found a solution to the core utilization issue or made any progress with optimizing performance? I’m tackling a similar challenge and would love to hear about any updates or insights you’ve gained!

I need more information about how the model is being deployed before I can investigate how to optimize its performance.

@Nafay-0

Nafay-0 commented Oct 28, 2024

Currently I am just trying to run a model on CPU locally and optimize its performance.
