Cannot pickle Map or System object (incompatibility with emcee) #292

Closed
iancrossfield opened this issue Sep 19, 2021 · 8 comments
Labels: bug (Something isn't working)

@iancrossfield

iancrossfield commented Sep 19, 2021

Describe the bug
When I try to use starry in the MCMC sampler `emcee` (via either the Map object directly or via the System object), emcee fails in both cases because the Map object cannot be pickled. Specifically:
"AttributeError: Can't pickle local object 'Map.<locals>.Map'"

To Reproduce

import starry, pickle
map = starry.Map(udeg=2, rv=True)
f = open('spam.pickle', 'wb')  # pickle requires a binary-mode file handle
pickle.dump(map, f)

Results in the error listed above.

Expected behavior
I expect to be able to successfully use starry Map or System objects in Bayesian inference calculations (specifically, MCMC; very specifically, via 'emcee').

Your setup (please complete the following information):

  • Version of starry: 1.1.2
  • Operating system: Ubuntu 20.04
  • Python version & installation method (pip, conda, etc.): Python 3.7.6, installed via pip
@iancrossfield iancrossfield added the bug Something isn't working label Sep 19, 2021
@rodluger
Owner

You must be passing the starry.Map instance as an argument to the log probability function. Currently, the Map class is generated from a class factory, so it's not directly picklable. I think it would be straightforward to fix this, but I would recommend a different, likely much faster approach than what you're currently doing: switch to lazy evaluation mode and explicitly compile the theano function yourself. There's a section in the docs showing how to do this.

The reason it's faster is that you're compiling all the operations (instantiating the map, setting the attributes, computing the flux, computing the likelihood) into a single executable function. You're also baking the dataset into the compiled routine, so there's much, much less data transfer across cores.

I usually recommend using pymc3 instead to do sampling, since that's what starry is optimized for (and it can be significantly faster for many problems). But following the steps above should work for you. Let me know if you run into issues.
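
A minimal sketch of that compile-it-yourself pattern, assuming starry's default lazy mode (the same pattern appears in the MWE further down this thread; theta_grid, data, and log_prob_graph are placeholder names and arrays, not starry API):

import numpy as np
import starry
import theano
import theano.tensor as tt

theta_grid = np.linspace(0, 360, 100)   # placeholder phase grid (degrees)
data = np.zeros_like(theta_grid)        # placeholder observed RV data

def log_prob_graph(x):
    # Build the whole graph (map setup, model evaluation, likelihood) with the
    # dataset baked in as a constant, so only the parameter vector x moves around.
    map = starry.Map(udeg=2, rv=True)
    map.inc = x[0]
    map.veq = x[1]
    model = map.rv(theta=theta_grid)
    return -0.5 * tt.sum((model - data) ** 2)

x = tt.dvector()
log_prob = theano.function([x], log_prob_graph(x))  # one compiled callable to hand to emcee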

@iancrossfield
Author

Thanks @rodluger. After wrestling with theano (entirely new to me!) for a while and rewriting my model function, I was able to get the function to compile successfully in theano. The new 'fast', 'non-lazy' version runs successfully, but emcee still bombs out with a long error trace, ending in:

MemoryError: std::bad_alloc
Apply node that caused the error: <starry._core.ops.filter.FOp object at 0x7ff4b2a6dfd0>(AdvancedIncSubtensor1{inplace,set}.0, IncSubtensor{InplaceSet;int64}.0)
Toposort index: 52
Inputs types: [TensorType(float64, vector), TensorType(float64, vector)]
Inputs shapes: [(3,), (16,)]
Inputs strides: [(8,), (8,)]
Inputs values: [array([-1. , 0.3409965 , 0.29015569]), 'not shown']
Outputs clients: [[SparseDot(SparseConstant{csc,float64,shape=(36, 36),nnz=124}, <starry._core.ops.filter.FOp object at 0x7ff4b2a6dfd0>.0), InplaceDimShuffle{1,0}(<starry._core.ops.filter.FOp object at 0x7ff4b2a6dfd0>.0)]]

Backtrace when the node is created (use Theano flag traceback__limit=N to make it longer):
File "/home/ianc/anaconda3/lib/python3.7/site-packages/starry/_core/utils.py", line 104, in wrapper
return func(instance, *args)
File "/home/ianc/anaconda3/lib/python3.7/site-packages/starry/_core/core.py", line 204, in flux
return tt.dot(self.X(theta, xo, yo, zo, ro, inc, obl, u, f), y)
File "/home/ianc/anaconda3/lib/python3.7/site-packages/starry/_core/utils.py", line 104, in wrapper
return func(instance, *args)
File "/home/ianc/anaconda3/lib/python3.7/site-packages/starry/_core/core.py", line 175, in X
F = self.F(u, f)
File "/home/ianc/anaconda3/lib/python3.7/site-packages/starry/_core/utils.py", line 104, in wrapper
return func(instance, *args)
File "/home/ianc/anaconda3/lib/python3.7/site-packages/starry/_core/core.py", line 137, in F
return self._F(u, f)
File "/home/ianc/anaconda3/lib/python3.7/site-packages/theano/graph/op.py", line 250, in call
node = self.make_node(*inputs, **kwargs)
File "/home/ianc/anaconda3/lib/python3.7/site-packages/starry/_core/ops/filter.py", line 25, in make_node
outputs = [tt.TensorType(inputs[-1].dtype, (False, False))()]

HINT: Use the Theano flag 'exception_verbosity=high' for a debugprint and storage map footprint of this apply node.

System monitoring tools confirm that I'm nowhere near running out of system memory, and searching online for "theano" "MemoryError: std::bad_alloc" turns up zero results.

I don't think this is a problem with starry; more likely it's some issue arising from the unholy fusion of theano with my byzantine, rather antiquated error functions (specifically, I'm using https://crossfield.ku.edu/python/phasecurves.html#phasecurves.lnprobfunc). I'll try to explore the PyMC3 options mentioned above, though even the 'Tutorials' seem rather opaque to this frequentist-at-heart astronomer.

@rodluger
Owner

Oh, quite the opposite -- I think that's a starry segfault. I believe you're instantiating the starry map like

map = starry.Map(udeg=2, rv=True)

If so, can you try doing

map = starry.Map(ydeg=1, udeg=2, rv=True)

and let me know if that fixes the issue? This could be an old bug popping up again when the degree of the Ylm expansion is zero.

@iancrossfield
Author

Adding ydeg=1 to the initialization results in essentially the same error trace as in my previous comment above.

But I CAN successfully run starry's RM calculation wrapped inside emcee IF I set pool=None. So no multiprocessing, but at least it runs -- probably in less time than it would take me to learn PyMC3, too ;)
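
For concreteness, the serial workaround just means not handing emcee a pool; a minimal sketch, reusing the compiled log_prob from the sketch earlier in this thread with illustrative walker settings:

import numpy as np
import emcee

nwalkers, ndim, niter = 4, 2, 100        # illustrative settings
guess = np.random.randn(nwalkers, ndim)  # initial walker positions

# pool=None (the default) keeps everything in one process, so the compiled
# theano function never has to be pickled and shipped to worker processes.
sampler = emcee.EnsembleSampler(nwalkers, ndim, log_prob, pool=None)
sampler.run_mcmc(guess, niter, progress=True)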

@rodluger
Owner

OK, I can confirm the issue is in starry. Here's an MWE that triggers the error you're seeing:

import starry
import numpy as np
import emcee
from multiprocessing import Pool
import theano
import theano.tensor as tt


if __name__ == "__main__":

    # Generate dataset
    np.random.seed(0)
    theta = np.linspace(0, 360, 100)
    data = 0.1 * np.random.randn(len(theta))

    def log_prob_(x):
        map = starry.Map(rv=True)
        map.inc = x[0]
        map.veq = x[1]
        model = map.rv(theta=theta)
        return -0.5 * tt.sum((model - data) ** 2)

    x = tt.dvector()
    log_prob = theano.function([x], log_prob_(x))

    nwalkers = 4
    ndim = 2
    niter = 100
    guess = np.random.randn(nwalkers, ndim)

    with Pool() as pool:
        sampler = emcee.EnsembleSampler(nwalkers, ndim, log_prob, pool=pool)
        sampler.run_mcmc(guess, niter, progress=True)

I'll look into it.

@rodluger
Owner

rodluger commented Sep 20, 2021

I can also confirm this seems to only happen during multiprocessing. Setting pool=None fixes the issue. When compiling starry in debug mode, I get the following Eigen error somewhere within the call to computeF():

Assertion failed: (lhs.cols() == rhs.rows() && "invalid matrix product" && "if you wanted a coeff-wise or a dot product use the respective explicit functions"), function Product, file starry/_core/ops/lib/vendor/eigen_3.3.5/Eigen/src/Core/Product.h, line 97.

Looks like B.U1 is uninitialized when it gets used here. If I print its shape I get (145, 140669174152478), which is bonkers!

@rodluger
Owner

I figured it out. The issue was on this line. For some reason I had declared B as a reference. Reverting that fixes the issue for multiprocessing.

@rodluger
Owner

@iancrossfield Feel free to re-open if you run into more trouble or have any other questions. The code is now working on master, and I released a patch that should be available on PyPI within the hour (v1.1.3).
