
How can I quantize a custom model with TensorRT-LLM? Do I need to write C++ code? Any examples? Thank you for your time and help. #2718

Open
DelongYang666 opened this issue Jan 24, 2025 · 3 comments
Labels
triaged Issue has been triaged by maintainers

Comments

@DelongYang666
No description provided.

@nv-guomingz
Collaborator

Step 0. https://nvidia.github.io/TensorRT-LLM/architecture/add-model.html
Step 1. Implement your private op if needed (for C++, refer to the plugin implementation; this is not a must-have).
Step 2. Quantize the model with a recipe such as int4/int8 weight-only or int8 SmoothQuant.
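For context on Step 2, the int8 weight-only recipe boils down to per-channel symmetric quantization of the weight matrix. Below is a minimal pure-Python sketch of that arithmetic only; it is not TensorRT-LLM code (the real flow goes through the library's quantization tooling and fused kernels), just an illustration of what the recipe computes:

```python
# Illustrative sketch of int8 weight-only (per-channel, symmetric) quantization.
# This mirrors the math behind the "int8 weights only" recipe; it is NOT the
# TensorRT-LLM implementation.

def quantize_int8_weight_only(weights):
    """Quantize a 2-D weight matrix (list of rows) per output channel.

    Returns (int8 rows, per-row float scales).
    """
    q_rows, scales = [], []
    for row in weights:
        amax = max(abs(w) for w in row) or 1.0   # guard against an all-zero row
        scale = amax / 127.0                     # symmetric int8 range
        q_rows.append([max(-128, min(127, round(w / scale))) for w in row])
        scales.append(scale)
    return q_rows, scales

def dequantize(q_rows, scales):
    """Recover approximate float weights from int8 values and scales."""
    return [[q * s for q in row] for row, s in zip(q_rows, scales)]

W = [[0.5, -1.27, 0.02], [2.0, 0.1, -0.4]]
Q, S = quantize_int8_weight_only(W)
W_hat = dequantize(Q, S)
```

The point of the per-channel scale is that each output channel's dynamic range is captured independently, so one large weight does not crush the resolution of every other channel.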

@nv-guomingz nv-guomingz added the triaged Issue has been triaged by maintainers label Jan 25, 2025
@DelongYang666
Author

Thank you! I saw the document [https://nvidia.github.io/TensorRT-LLM/architecture/add-model.html] 3 hours ago, and I will try it. Also, a video on Bilibili said that fp8 may perform better than other quantization methods, but that approach depends on ModelOpt, and ModelOpt does not support quantizing my custom model. Is there any way to implement fp8?

@nv-guomingz
Collaborator

> Thank you! I saw the document [https://nvidia.github.io/TensorRT-LLM/architecture/add-model.html] 3 hours ago, and I will try it. Also, a video on Bilibili said that fp8 may perform better than other quantization methods, but that approach depends on ModelOpt, and ModelOpt does not support quantizing my custom model. Is there any way to implement fp8?

@RalphMao do you have any comments on this question?
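While waiting for a maintainer answer, it may help to note what fp8 post-training quantization computes at its core: a per-tensor scaling factor derived from a calibrated absolute maximum, since the E4M3 format's largest finite value is 448. The sketch below is a framework-free illustration of that scale-and-clamp step only (it omits the 3-bit mantissa rounding and is not ModelOpt's or TensorRT-LLM's implementation):

```python
# Sketch of the per-tensor scaling at the heart of fp8 (E4M3) quantization.
# Values are scaled so the calibrated amax maps onto E4M3's representable
# range, then clamped. Illustrative only -- mantissa rounding is omitted,
# and this is not the ModelOpt implementation.

E4M3_MAX = 448.0  # largest finite E4M3 value

def fp8_scale(amax):
    """Scale mapping the observed absolute maximum onto E4M3's range."""
    return amax / E4M3_MAX if amax > 0 else 1.0

def fake_quant_fp8(values, scale):
    """Simulate the fp8 round trip: scale down, clamp, scale back up."""
    out = []
    for v in values:
        x = max(-E4M3_MAX, min(E4M3_MAX, v / scale))
        out.append(x * scale)
    return out

acts = [0.03, -7.5, 112.0, -448.0, 600.0]
scale = fp8_scale(max(abs(a) for a in acts))  # calibration amax = 600.0
q = fake_quant_fp8(acts, scale)
```

If the calibration amax underestimates the true range (e.g. a scale computed from amax = 448 applied to a value of 1000), the out-of-range value saturates at 448 * scale, which is why calibration quality matters so much for fp8.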
