Skip to content

[BUG] serializable decorator produces wrong package for serialization #451

New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

Open
vpratz opened this issue Apr 27, 2025 · 3 comments
Open

Comments

@vpratz
Copy link
Collaborator

vpratz commented Apr 27, 2025

To Reproduce
On dev or main, run

import bayesflow as bf
import keras

keras.saving.get_registered_name(bf.Adapter)  # or any other class without "package" set in the serializable decorator

Output:

'bayesflow.utils.decorators>Adapter'

Expected behavior
The path should be the path to the module, so

'bayesflow.adapters.adapter>Adapter'

Solution
In utils.serialization.serializable, increase the depth in the sys._getframe call to 2.

Additional context
I came across this for #449 and #450, and first assumed my changes to the allow_args decorator had caused it, so I already fixed it in a1b4d19, which is included in those PRs. Upon further testing, I noticed the problem already existed in dev and main.

The fix will break loading of serialized files created while the bug was present, but I think fixing it quickly and putting out an update is probably the best way forward. We might want to combine this with the deprecation of our old serialization helper functions in #450.
I think we should reflect on whether we want to keep the functionality of serializable like this. In my opinion, it brings a certain fragility, because moving classes to a different file or folder breaks deserialization, even when the class itself did not change. Requiring package to be set manually will remove this footgun, and thereby remove one cause for accidentally breaking serializability.

@LarsKue
Copy link
Contributor

LarsKue commented Apr 29, 2025

I think we have 3 options here:

1. package=sys._getframe(2)

  • Highly unlikely to produce name clashes
  • Easy to use
  • High consistency across packages
  • Makes moving classes harder

2. package="bayesflow" everywhere

  • Much more likely to produce name clashes
  • Easy to use
  • High consistency
  • Makes moving classes trivial

3. Manually supplied package

  • Unlikely to produce name clashes
  • Might be confusing for users
  • Low consistency, even within BayesFlow (historically)
  • Makes moving classes easy

I like option 1 the best, but I am open to have my mind changed.

@vpratz
Copy link
Collaborator Author

vpratz commented Apr 29, 2025

Thanks for listing them explicitly. As all have pros and cons, I think it comes down to priorities, which are a bit ambiguous. I think for me the order (for somewhat arbitrary priorities) is:

  1. Not breaking things accidentally/stability (Favors 3, no naming clashes, no accidental changes of packages by moving stuff around)
  2. Putting no unnecessary restrictions on how we structure or change internal/private parts of the library (Favors 3; also 2 but there we can not have classes with identical names, not sure if we want to have those anyway though)
  3. Being understandable by users: A bit of a mixed bag. I believe identification goes mostly by class name and the package is secondary here. 2 is best in this regard (no "noise" by a potentially misleading package, only the class name). 3 does not necessarily match the module, which can be confusing. 1 represents the internal module, which might also be confusing (e.g. for bayesflow.network.MLP, the resulting package is bayesflow.networks.mlp.mlp).
  4. Ease of use: Depends on how much consistency we aim for. 3 is worse here because one has to manually make a decision. This is linked to the previous point, because how hard this decision is depends on how consistent you want to be. But you could always default to option 1 and use the path to the file the class lives in, and the only additional effort would be typing it out once.

This is why I'm in favor of option 3. Is this something you could get behind? Or is there an aspect missing or something you would prioritize differently?

@stefanradev93
Copy link
Contributor

It will be 30 minutes of work to manually provide package names, so I am definitely not against 3.

# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants