
support quantized models #812

Open
tharvik opened this issue Oct 24, 2024 · 0 comments
Labels
feature New feature or request

Comments

@tharvik (Collaborator) commented Oct 24, 2024

Currently, we use float32 tensors pretty much everywhere, which yields quite large models. After discussion with @martinjaggi: training is hard to do without float32, but inference can probably use uint8 tensors, shrinking trained models by up to 4x.

Note: check that the model still behaves correctly after quantization.
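For reference, the float32-to-uint8 conversion described above is usually done with affine (scale + zero-point) quantization. The sketch below is purely illustrative and not tied to this project's codebase; the function names `quantize_uint8` and `dequantize` are made up for the example. It shows where the 4x size reduction comes from (4-byte floats become 1-byte ints) and how to measure the round-trip error mentioned in the note:

```python
import numpy as np

def quantize_uint8(w: np.ndarray) -> tuple[np.ndarray, float, int]:
    # Affine (asymmetric) quantization: map [min, max] onto [0, 255].
    lo, hi = float(w.min()), float(w.max())
    scale = (hi - lo) / 255.0 or 1.0  # guard against constant tensors
    zero_point = int(round(-lo / scale))
    q = np.clip(np.round(w / scale) + zero_point, 0, 255).astype(np.uint8)
    return q, scale, zero_point

def dequantize(q: np.ndarray, scale: float, zero_point: int) -> np.ndarray:
    # Recover approximate float32 values for inference-time use.
    return (q.astype(np.float32) - zero_point) * scale

# Illustrative check on random "weights":
rng = np.random.default_rng(0)
w = rng.normal(size=(64, 64)).astype(np.float32)
q, scale, zp = quantize_uint8(w)
ratio = w.nbytes // q.nbytes          # 4: float32 is 4 bytes, uint8 is 1
err = np.max(np.abs(dequantize(q, scale, zp) - w))  # bounded by ~the scale
```

The behavioral check from the note would then compare model outputs (e.g. accuracy or logits) before and after replacing weights with their dequantized counterparts, rather than only the per-weight error.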

@tharvik tharvik added the feature New feature or request label Oct 24, 2024