Discussion: Investigate Perf Boosts Through Pruning (DeepSparse) #931

Closed · MillionthOdin16 opened this issue Apr 13, 2023 · 3 comments

MillionthOdin16 commented Apr 13, 2023

Just saw this and it seems pretty crazy. I don't know exactly where to put it, but figured it's worth discussing. They claim significant performance gains and pretty crazy model compression capabilities. A lot of the interesting information is right on the README page I linked.

Neural Magic Repo Link

Our MLPerf Inference v3.0 submission contains the following results for the BERT-Large SQuAD v1.1 question answering task:

| Benchmark | Engine | Precision | Compressed File Size | SQuAD v1.1 F1 Score (R=X% of Base Accuracy) | Offline Throughput [samples/sec] |
| --- | --- | --- | --- | --- | --- |
| BERT-Large Baseline | ONNXRuntime | FP32 | 1.3 GB | 90.874 (R=100.00%) | 4.60 |
| oBERT-Large 99% | DeepSparse | INT8 | 38.2 MB | 90.03 (R=99.07%) | 1367.14 |
| oBERT-MobileBERT 99.9% | DeepSparse | INT8 | 19.45 MB | 90.80 (R=99.92%) | 3275.62 |
| oBERT-MobileBERT 99% | DeepSparse | INT8 | 9.56 MB | 90.41 (R=99.49%) | 5578.73 |

https://github.com/mlcommons/inference_results_v3.0/blob/main/open/NeuralMagic/README.md

jon-chuang (Contributor) commented

From the linked repo:

> unstructured gradual pruning, quantization-aware training, and structural distillation

I think the resulting model layout would be very different and, further, not directly comparable to LLaMA. But definitely interesting.
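For context, "unstructured gradual pruning" means repeatedly zeroing the smallest-magnitude individual weights, fine-tuning in between so accuracy recovers; the resulting mask pattern is irregular, which is what DeepSparse's sparsity-aware kernels exploit. A minimal sketch using PyTorch's built-in `torch.nn.utils.prune` (the toy model and the 30%-per-step schedule are illustrative assumptions, not the Neural Magic recipe):

```python
import torch.nn as nn
import torch.nn.utils.prune as prune

# Toy stand-in for a stack of transformer linear layers.
model = nn.Sequential(nn.Linear(512, 512), nn.ReLU(), nn.Linear(512, 512))

# Gradual schedule: each l1_unstructured call prunes 30% of the *remaining*
# weights by magnitude, so cumulative sparsity reaches 1 - 0.7**4 ≈ 76%.
for step in range(4):
    for module in model.modules():
        if isinstance(module, nn.Linear):
            prune.l1_unstructured(module, name="weight", amount=0.3)
    # A real recipe fine-tunes here between steps so accuracy can recover.

# Inspect achieved sparsity, then bake the masks into the weights.
for module in model.modules():
    if isinstance(module, nn.Linear):
        sparsity = float((module.weight == 0).sum()) / module.weight.numel()
        print(f"sparsity: {sparsity:.1%}")
        prune.remove(module, "weight")
```

Note the zeroed weights still occupy memory in dense tensors; the size and speed wins in the table above come from a runtime and storage format (DeepSparse, sparse INT8) that actually skips them.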

slaren commented Apr 15, 2023

This may be interesting: https://github.com/horseee/LLaMA-Pruning

> Pruning: The following script globally removes 50% of the dimensions of the LLaMA-7B model, resulting in a lightweight model with 1.72B parameters.
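This is a different axis from the unstructured sparsity above: structured pruning removes whole dimensions, which is why halving every dimension takes 7B parameters down to roughly 1.7B (linear-layer parameter counts scale with width squared, and 7B × 0.5² ≈ 1.75B). A hedged single-layer sketch of the idea (`shrink_linear` is a hypothetical helper; LLaMA-Pruning itself uses the Torch-Pruning library, which also fixes up dependent layers' input dimensions across the whole network, something this toy omits):

```python
import torch
import torch.nn as nn

def shrink_linear(layer: nn.Linear, keep_ratio: float) -> nn.Linear:
    """Keep the keep_ratio fraction of output features with the largest L2 norm."""
    n_keep = max(1, int(layer.out_features * keep_ratio))
    norms = layer.weight.norm(p=2, dim=1)          # one norm per output row
    keep = torch.topk(norms, n_keep).indices.sort().values
    new = nn.Linear(layer.in_features, n_keep, bias=layer.bias is not None)
    with torch.no_grad():
        new.weight.copy_(layer.weight[keep])       # physically drop pruned rows
        if layer.bias is not None:
            new.bias.copy_(layer.bias[keep])
    return new

layer = nn.Linear(4096, 4096)
print(shrink_linear(layer, keep_ratio=0.5))  # out_features becomes 2048
```

Unlike masking, the resulting model is genuinely smaller and runs on ordinary dense kernels; the trade-off is how much accuracy survives the cut.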

github-actions bot added the stale label Mar 25, 2024
This issue was closed because it has been inactive for 14 days since being marked as stale.
