
Fix QAT model converting #2190

Open · wants to merge 2 commits into main

Conversation

veralauee

Converting a quantization-aware trained (QAT) model from TF to ONNX has several issues:

  1. QuantizeLinear and DequantizeLinear are fused into the conv layer, but the downstream compiler (e.g., TensorRT) needs the Q/DQ layers to decide whether to run a layer in int8. See issue QDQ node for weight tensor of Conv2D undergoes Constant folding (enabled for node using tf type=FakeQuantWithMinMaxVarsPerChannel) #1972. We need to keep the Q/DQ layers unfused. QuantizeLinear and DequantizeLinear correspond to FakeQuantWithMinMaxVars in TensorFlow, so excluding that op from can_fold in tf_utils.py solves it; see the first sketch after this list.
  2. Need to allow narrow_range in quantized nodes. TensorRT maps [min, max] to [-127, 127] (see Page 12), which requires 0 in fp32 to be mapped exactly to 0 in int8; the second sketch below illustrates this. Also see narrow_range=True in TensorRT/tools/tensorflow-quantization here.
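
For item 1, a minimal sketch of the intended change, assuming the constant-folding check in tf2onnx/tf_utils.py is a predicate roughly like the helper below (the name can_fold comes from the description above; the node attribute and the rest of the body are assumptions for illustration):

```python
# Sketch only: keep FakeQuant* ops out of constant folding so the
# QuantizeLinear/DequantizeLinear pair they become in ONNX survives
# for TensorRT to read. Set name and helper body are illustrative.
_QUANT_OPS_TO_KEEP = {
    "FakeQuantWithMinMaxVars",
    "FakeQuantWithMinMaxVarsPerChannel",
}

def can_fold(node):
    """Return False for FakeQuant* nodes so they are not folded away."""
    if node.type in _QUANT_OPS_TO_KEEP:  # attribute name is an assumption
        return False
    # ... existing constant-folding rules for all other op types ...
    return True
```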
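For item 2, a small standalone illustration (not code from this PR) of why narrow_range matters: with narrow_range=True the int8 range is the symmetric [-127, 127], so the scale is the same on both sides of zero and 0.0 in fp32 lands exactly on 0 in int8, which is what TensorRT expects.

```python
import numpy as np

def quantize_narrow_range(x, amax):
    """Symmetric int8 quantization, narrow_range=True: [-amax, amax] -> [-127, 127].

    Using 127 steps on both sides (instead of -128..127) keeps the scale
    symmetric, so 0.0 in fp32 maps exactly to 0 in int8.
    """
    scale = amax / 127.0
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

x = np.array([-1.0, -0.5, 0.0, 0.5, 1.0], dtype=np.float32)
q, scale = quantize_narrow_range(x, amax=1.0)
print(q)          # [-127  -64    0   64  127]
print(q * scale)  # dequantized values; 0.0 stays exactly 0.0
```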
