This folder contains example code for quantizing the LLaMA model.
Install the required packages:

```bash
SKIP_RUNTIME=True pip install -r requirements.txt
```
Note that this README.md uses meta-llama/Llama-2-7b-hf as an example. Other models can also be used for INT4 weight-only quantization; the following table lists the configurations of a few of them:
Model | Num Hidden Layers | Num Attention Heads | Hidden Size |
---|---|---|---|
meta-llama/Llama-2-7b | 32 | 32 | 4096 |
meta-llama/Llama-2-7b-chat-hf | 32 | 32 | 4096 |
meta-llama/Llama-2-13b | 40 | 40 | 5120 |
meta-llama/Llama-2-13b-chat-hf | 40 | 40 | 5120 |
meta-llama/Llama-2-70b | 80 | 64 | 8192 |
meta-llama/Llama-2-70b-chat-hf | 80 | 64 | 8192 |
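To double-check these values for a given checkpoint, the fields can be read directly from the Hugging Face model config. A minimal sketch, assuming `transformers` is installed and you have access to the gated meta-llama repositories:

```python
# Read the architecture fields listed in the table above from the model config.
from transformers import AutoConfig

config = AutoConfig.from_pretrained("meta-llama/Llama-2-7b-hf")
print(config.num_hidden_layers)    # 32
print(config.num_attention_heads)  # 32
print(config.hidden_size)          # 4096
```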
Export the model to ONNX:

```bash
optimum-cli export onnx --model meta-llama/Llama-2-7b-hf --task text-generation-with-past ./Llama-2-7b-hf
```
Note: requires `optimum>=1.14.0`.
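As a quick sanity check of the export, the ONNX model can be loaded back with `optimum` and used for a short generation. A minimal sketch, assuming the export above wrote its files (including the tokenizer) to `./Llama-2-7b-hf`:

```python
# Smoke-test the exported ONNX model with optimum's ONNX Runtime wrapper.
from optimum.onnxruntime import ORTModelForCausalLM
from transformers import AutoTokenizer

model_dir = "./Llama-2-7b-hf"
tokenizer = AutoTokenizer.from_pretrained(model_dir)
model = ORTModelForCausalLM.from_pretrained(model_dir, use_cache=True)

inputs = tokenizer("The capital of France is", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=8)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```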
Quantize the model:

```bash
# model_input: folder with the ONNX model, config and tokenizer; model_output: folder to save the quantized model
# batch_size is optional; quant_format can be "QOperator" or "QDQ" (optional)
bash run_quant.sh --model_input=/folder/of/model \
                  --model_output=/folder/of/quantized/model \
                  --batch_size=batch_size \
                  --dataset NeelNanda/pile-10k \
                  --alpha 0.75 \
                  --quant_format="QOperator"
```
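The `--alpha` flag is interpreted here as the SmoothQuant migration strength (an assumption based on the flag name and the 0.75 value): per-input-channel scales move activation outliers into the weights before quantization. A toy NumPy sketch of that idea, not the tool's implementation:

```python
# Illustrative sketch of SmoothQuant-style smoothing behind --alpha (not the tool's code).
import numpy as np

def smooth(X, W, alpha=0.75):
    # X: (tokens, in_features) calibration activations; W: (in_features, out_features) weights.
    act_max = np.abs(X).max(axis=0)                        # per-input-channel activation range
    w_max = np.abs(W).max(axis=1)                          # per-input-channel weight range
    s = act_max**alpha / np.maximum(w_max, 1e-5)**(1 - alpha)
    s = np.maximum(s, 1e-5)
    # (X / s) @ (s * W) == X @ W, but activations are easier to quantize.
    return X / s, W * s[:, None]

X = np.random.randn(16, 8).astype(np.float32)
W = np.random.randn(8, 4).astype(np.float32)
X_s, W_s = smooth(X, W, alpha=0.75)
assert np.allclose(X @ W, X_s @ W_s, atol=1e-4)
```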
Evaluate the accuracy:

```bash
# model_input: folder with the ONNX model, config and tokenizer; batch_size is optional
bash run_benchmark.sh --model_input=/folder/of/model \
                      --tasks=lambada_openai \
                      --batch_size=batch_size
```
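`lambada_openai` measures last-word prediction accuracy. For intuition only, a rough sketch of that kind of check run by hand against the quantized folder; it assumes the tokenizer files are present alongside the quantized model, and `run_benchmark.sh` itself relies on a proper evaluation harness rather than this loop:

```python
# Hand-rolled last-token accuracy check in the spirit of lambada_openai (illustrative only).
from optimum.onnxruntime import ORTModelForCausalLM
from transformers import AutoTokenizer

model_dir = "/folder/of/quantized/model"   # placeholder path from the commands above
tokenizer = AutoTokenizer.from_pretrained(model_dir)
model = ORTModelForCausalLM.from_pretrained(model_dir, use_cache=True)

def last_token_correct(text):
    enc = tokenizer(text, return_tensors="pt")
    ids, mask = enc.input_ids, enc.attention_mask
    # Predict the final token from all preceding tokens.
    logits = model(input_ids=ids[:, :-1], attention_mask=mask[:, :-1]).logits
    return logits[0, -1].argmax().item() == ids[0, -1].item()

samples = ["The quick brown fox jumps over the lazy dog"]
print(sum(last_token_correct(s) for s in samples) / len(samples))
```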