
support quantized models #812

Open
tharvik opened this issue Oct 24, 2024 · 0 comments
Labels
feature New feature or request

Comments

@tharvik (Collaborator) commented Oct 24, 2024

Currently, we use float32 tensors pretty much everywhere, which yields quite large models. After discussion with @martinjaggi: training is hard to do without float32, but inference can probably use uint8 tensors, shrinking trained models by up to 4x.

Note: check that the model still behaves correctly after quantization.
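For reference, the float32-to-uint8 conversion described above is usually done with affine (scale + zero-point) quantization. The sketch below is purely illustrative and not tied to this project's codebase; the function names `quantize_uint8` and `dequantize` are made up for the example. It shows where the 4x size reduction comes from (4-byte floats become 1-byte ints) and how to measure the round-trip error mentioned in the note:

```python
import numpy as np

def quantize_uint8(w: np.ndarray) -> tuple[np.ndarray, float, int]:
    # Affine (asymmetric) quantization: map [min, max] onto [0, 255].
    lo, hi = float(w.min()), float(w.max())
    scale = (hi - lo) / 255.0 or 1.0  # guard against constant tensors
    zero_point = int(round(-lo / scale))
    q = np.clip(np.round(w / scale) + zero_point, 0, 255).astype(np.uint8)
    return q, scale, zero_point

def dequantize(q: np.ndarray, scale: float, zero_point: int) -> np.ndarray:
    # Recover approximate float32 values for inference-time use.
    return (q.astype(np.float32) - zero_point) * scale

# Illustrative check on random "weights":
rng = np.random.default_rng(0)
w = rng.normal(size=(64, 64)).astype(np.float32)
q, scale, zp = quantize_uint8(w)
ratio = w.nbytes // q.nbytes          # 4: float32 is 4 bytes, uint8 is 1
err = np.max(np.abs(dequantize(q, scale, zp) - w))  # bounded by ~the scale
```

The behavioral check from the note would then compare model outputs (e.g. accuracy or logits) before and after replacing weights with their dequantized counterparts, rather than only the per-weight error.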

@tharvik tharvik added the feature New feature or request label Oct 24, 2024