Add quantize script for batch quantization #92


Merged
merged 5 commits into from
Mar 13, 2023

Conversation

prusnak
Copy link
Collaborator

@prusnak prusnak commented Mar 13, 2023

Alternative to #17 suggested in #17 (comment)

@ggerganov ggerganov merged commit d1f2247 into master Mar 13, 2023
@ggerganov ggerganov deleted the quantize-sh branch March 13, 2023 16:15
@Jettford
Copy link

I wrote up a basic Python implementation of the same script for Windows users. Would it be worth making a pull request to replace the batch script?

import os
import re
import subprocess
import sys

def print_usage():
    print("Usage: llama-quantize.py 7B|13B|30B|65B [--remove-f16]")
    sys.exit(0)

if len(sys.argv) < 2:
    print_usage()

# The model-size argument must look like "7B", "13B", "30B", or "65B"
regex_test = re.compile(r"^[0-9]{1,2}B$")

if not regex_test.match(sys.argv[1]):
    print_usage()

model_directory = f"./models/{sys.argv[1]}/"

if not os.path.exists(model_directory):
    print("Failed to find model directory")
    print_usage()

for file in os.listdir(model_directory):
    if "ggml-model-f16.bin" not in file:
        continue

    file = os.path.join(model_directory, file)

    # Derive the output name by swapping the precision tag in the filename
    new_name = file.replace("f16", "q4_0")

    binary_name = "./quantize"
    if sys.platform == "win32":
        # On Windows the binary is quantize.exe, without the "./" prefix
        binary_name = "quantize.exe"

    # Pass arguments as a list so paths containing spaces are handled safely
    subprocess.run([binary_name, file, new_name, "2"], check=True)

    if len(sys.argv) > 2 and sys.argv[2] == "--remove-f16":
        os.remove(file)
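The argument validation and output-naming logic in the script above can be exercised in isolation (a minimal sketch; the example strings below are illustrative, not files the script requires):

```python
import re

# Same pattern the script uses to validate the model-size argument
regex_test = re.compile(r"^[0-9]{1,2}B$")

print(bool(regex_test.match("7B")))     # → True: sizes like 7B/13B/30B/65B match
print(bool(regex_test.match("7000B")))  # → False: more than two digits is rejected

# Output name is derived by swapping the precision tag in the filename
name = "ggml-model-f16.bin"
print(name.replace("f16", "q4_0"))      # → ggml-model-q4_0.bin
```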


@tmzncty
Copy link

tmzncty commented Mar 16, 2023

I think you can just compile quantize.exe directly and then run it in CMD:

quantize.exe ggml-model-f16.bin ggml-model-q4.bin 2

[screenshots of the build and quantization run]
