Add warning for varying simulator output sizes #370

Closed

LarsKue opened this issue Mar 25, 2025 · 2 comments
Labels
efficiency Some code needs to be optimized user interface Changes to the user interface and improvements in usability

Comments

@LarsKue
Contributor

LarsKue commented Mar 25, 2025

Varying simulator output sizes commonly occur when the number of samples differs between calls to simulator.sample():

def context(batch_size):
    n = np.random.randint(10, 101)
    return dict(n=n)

def prior():
    mu = np.random.normal()
    sigma = np.random.gamma(shape=2)
    return dict(mu=mu, sigma=sigma)

def likelihood(n, mu, sigma):
    y = np.random.normal(mu, sigma, size=n)
    return dict(y=y)

simulator = bf.make_simulator([prior, likelihood], meta_fn=context)

However, varying output shapes can cause excessive compile times in JAX, where each new value of n triggers a recompilation. For a wide range of n, compilation can come to dominate training time.
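To see the shape issue concretely, here is a minimal sketch in plain NumPy (no BayesFlow or JAX required) showing that two draws produce arrays of different shapes:

```python
import numpy as np

def likelihood(n, mu, sigma):
    y = np.random.normal(mu, sigma, size=n)
    return dict(y=y)

# Two draws with different n yield arrays of different shapes;
# under jax.jit, each new shape would trigger a fresh trace and compile.
a = likelihood(10, 0.0, 1.0)
b = likelihood(37, 0.0, 1.0)
print(a["y"].shape, b["y"].shape)  # (10,) (37,)
```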

The current best-practice fix for users is to use padded tensors:

def likelihood(n, mu, sigma):
    y = np.random.normal(mu, sigma, size=100)  # uses fixed maximum size
    y[n:] = 0  # set unused entries to zero, or some other placeholder value
    return dict(y=y)

When we detect that compile times dominate, we should emit a warning to the user with a suggested fix. We could also improve support for padded simulator output in general. Further, we could investigate whether there are better ways to mask out unused values than setting them to placeholder values as above.
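One possible shape for such mask support, as a hedged sketch (the MAX_N constant and the extra mask output are illustrative, not current API): the simulator returns a boolean mask alongside the padded values, so downstream code can ignore the padding rather than rely on a placeholder value:

```python
import numpy as np

MAX_N = 100  # illustrative fixed maximum size

def likelihood(n, mu, sigma):
    # Sample at the fixed maximum size so the output shape never varies.
    y = np.random.normal(mu, sigma, size=MAX_N)
    # Boolean mask marking which entries are real observations.
    mask = np.arange(MAX_N) < n
    y = np.where(mask, y, 0.0)  # zero the padded tail
    return dict(y=y, mask=mask)
```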

@LarsKue LarsKue added efficiency Some code needs to be optimized user interface Changes to the user interface and improvements in usability labels Mar 25, 2025
@paul-buerkner
Contributor

paul-buerkner commented Mar 26, 2025

It sounds as if padding could be a great adapter feature, something like

adapter.pad(variable_dict, len = 100, axis = 1, value = 0)

I wouldn't want to burden the simulator with padding, since the simulator describes the probabilistic program, which we would ideally keep free of deep-learning-specific technicalities. Sure, a user could code padding within the simulator, but I would prefer an adapter functionality that is easier to get right and doesn't interfere with the probabilistic program.
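A minimal standalone sketch of what such an adapter transform could look like (the function name `pad`, its signature, and the dict-of-arrays convention are assumptions for illustration, not the actual BayesFlow adapter API):

```python
import numpy as np

def pad(data, keys, length, axis=-1, value=0.0):
    # Pad the named arrays in `data` to a fixed length along `axis`.
    # Hypothetical adapter-style transform; not the real BayesFlow API.
    out = dict(data)
    for key in keys:
        arr = np.asarray(out[key])
        deficit = length - arr.shape[axis]
        if deficit < 0:
            raise ValueError(f"{key!r} exceeds length {length} along axis {axis}")
        pad_width = [(0, 0)] * arr.ndim
        pad_width[axis] = (0, deficit)
        out[key] = np.pad(arr, pad_width, constant_values=value)
    return out

padded = pad({"y": np.ones(7)}, keys=["y"], length=10)
print(padded["y"].shape)  # (10,)
```

Keeping this in the adapter leaves the simulator's probabilistic program untouched, as suggested above.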

@LarsKue
Contributor Author

LarsKue commented Apr 7, 2025

Closing this, since affected users can simply switch backends; padding will be tracked as a separate feature request.

@LarsKue LarsKue closed this as not planned Apr 7, 2025