Add convenience Functions to save/load quantized Model to/from Disk #411
This probably belongs more to Axon than Bumblebee, since we need a way to store
There is the model, and the model state. Quantized tensors can be serialized with Nx and Safetensors.

Is it just `model_state.data` that needs to be serialized, or is there more? Loading and quantization took around 3 minutes, and starting/loading it on the GPU again took over 3 minutes. I would like to speed that up…
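As a side note on the serialization question: `Nx.serialize/2` accepts not just a single tensor but a container of tensors (maps, tuples), so a nested params map can be round-tripped as a whole. A minimal sketch (the layer names here are made up for illustration; whether Axon's quantized tensor representations survive the round-trip unchanged is the open question in this thread):

```elixir
# Nx.serialize/2 works on containers of tensors, not only raw tensor data,
# so an entire nested params map can be written and read back in one call.
params = %{
  "layer_0" => %{
    "kernel" => Nx.iota({2, 2}, type: :f32),
    "bias" => Nx.tensor([0.0, 0.0])
  }
}

binary = Nx.serialize(params)
restored = Nx.deserialize(binary)
# restored is structurally equal to params
```

If the quantized state carries extra metadata beyond the tensors themselves, that metadata would need to be part of the serialized container as well.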
Oh, I missed that. For the model state you can actually do:

```elixir
# Serialize
File.write!("state.nx", Nx.serialize(model_info.params))

# Load
{:ok, spec} = Bumblebee.load_spec({:hf, "..."})
model = spec |> Bumblebee.build_model() |> Axon.Quantization.quantize_model()
params = File.read!("state.nx") |> Nx.deserialize()
model_info = %{spec: spec, model: model, params: params}
```

This may work for your use case if you have enough RAM to serialize and deserialize. There are two issues with
Since loading quantized models from HF is not possible yet, I was looking for an easy way to save a model after quantization, as easy as loading it from HF, and then a function to load the file again.
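The requested convenience functions could plausibly be thin wrappers around the `Nx.serialize`/`Nx.deserialize` approach discussed above. A hypothetical sketch (the module `QuantizedModelStore` and its functions are assumptions for illustration, not part of Bumblebee's or Axon's API, and it assumes the quantized params serialize cleanly):

```elixir
defmodule QuantizedModelStore do
  @moduledoc """
  Hypothetical helpers for persisting a quantized model's parameters
  to disk and rebuilding the model on load. Not part of Bumblebee or
  Axon; a sketch built on Nx.serialize/Nx.deserialize.
  """

  # Save only the parameters; the model graph itself is cheap to
  # rebuild from the spec on load.
  def save(model_info, path) do
    File.write!(path, Nx.serialize(model_info.params))
  end

  # Rebuild the quantized model structure from the HF spec and attach
  # the deserialized parameters, skipping the expensive download and
  # tensor quantization on subsequent starts.
  def load(repo, path) do
    {:ok, spec} = Bumblebee.load_spec({:hf, repo})

    model =
      spec
      |> Bumblebee.build_model()
      |> Axon.Quantization.quantize_model()

    params = path |> File.read!() |> Nx.deserialize()
    %{spec: spec, model: model, params: params}
  end
end
```

Usage would be `QuantizedModelStore.save(model_info, "state.nx")` once after quantizing, then `QuantizedModelStore.load(repo, "state.nx")` on later starts.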