StackOverflow error on v0.2.5 but not v0.2.4 #7

Closed
Denis-Titov opened this issue Dec 5, 2022 · 6 comments

@Denis-Titov

Hi,

Apologies in advance, but I couldn't find an MWE.

I used version 0.2.3 for some time and everything was great, but after upgrading to 0.2.5 I started getting a StackOverflow error.
The whole error message is only ERROR: StackOverflowError:
I tried to reproduce it with rosenbrock but that works fine.
My optimization is the least squares fitting of kinetic rate equation with 26 kinetic constants to ~500 data points that I repeat 20 times to make sure I'm close to the global minimum.
I looked through the commits and it seems there was only one change outside the tests, and it was about stability.
I'm happy to try a few fixes if you have ideas, but since I don't have an MWE, I understand this might be difficult.
Also happy to send you my code but it's about 300 lines so might be a lot of work to look through.
I'll close this if you feel it'll be hard to fix without MWE.
I'm using 0.2.4 and everything works well.
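
For anyone who wants to stay on the release reported here as unaffected, a minimal sketch using the standard `Pkg` API follows. The version number comes from this report; whether to pin it is a judgment call, not a recommendation from the maintainer.

```julia
using Pkg

# Install the specific registered release reported above as unaffected.
Pkg.add(name = "CMAEvolutionStrategy", version = "0.2.4")

# Optionally pin it so that later `Pkg.update()` calls leave it alone.
Pkg.pin("CMAEvolutionStrategy")

# To follow the latest release again later:
# Pkg.free("CMAEvolutionStrategy")
```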

@jbrea
Owner

jbrea commented Dec 21, 2022

Thanks for reporting!

I suspect the problem is the new version of MEigen.
Does the problem persist with the newest commit?
You can load the newest version like this:

using Pkg
Pkg.add(name = "CMAEvolutionStrategy", rev = "376fa68")

If you prefer you can also send me your code and I could try myself.

@Denis-Titov
Author

StackOverflow error is gone with the new commit but unfortunately I cannot reproduce it on v0.2.5 anymore either 🤦‍♂️
The error was very reproducible before, appearing every single time I ran that optimization, and as far as I can tell I used exactly the same code.
Sorry for wasting your time... maybe something else was going on with my computer that in some weird way caused this error before, or maybe some other package got updated, causing the error to go away.

@Denis-Titov
Author

UPDATE:

I could reproduce the StackOverflow on a cluster with a larger number of optimization runs.
One of my runs will have about 100x10x20=20,000 optimizations.
With v0.2.5: 2 out of 4 runs had StackOverflow error
With 376fa68: 1 out of 4 runs had StackOverflow error
With v0.2.4: I've never seen StackOverflow error in 100+ runs

Not clear if the difference between v0.2.5 and 376fa68 is significant due to how rare this error is.
But definitely, v0.2.4 doesn't exhibit the error.
Sorry, I can't be more helpful here.

If you want to try other fixes, I can run them on my code but due to how rare the error is, I'm not sure if it's worth it as I'll have to run for a long time to be confident.
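
Because the error is this rare, one way to make long runs more informative is to record which run failed instead of letting the whole batch abort. The following is a minimal sketch, not code from this thread: `run_restarts`, `toy_optimize`, and the idea of identifying each run by an integer seed are all hypothetical. It assumes the real optimizer can be wrapped in a function that takes a seed (so a failing run can be replayed) and that the `StackOverflowError` can be caught, which is usually but not always the case in Julia.

```julia
# `optimize` is a hypothetical stand-in: any function mapping a seed to a result.
function run_restarts(optimize, seeds)
    results = Dict{Int,Any}()   # seed => result of the successful runs
    failed = Int[]              # seeds whose run hit a StackOverflowError
    for seed in seeds
        try
            results[seed] = optimize(seed)
        catch err
            # Only swallow the rare error under investigation; rethrow the rest.
            err isa StackOverflowError || rethrow()
            push!(failed, seed)
        end
    end
    return results, failed
end

# Toy optimizer for illustration only: fails deterministically on seed 3.
toy_optimize(seed) = seed == 3 ? throw(StackOverflowError()) : seed^2

results, failed = run_restarts(toy_optimize, 1:5)
println("failed seeds: ", failed)   # prints "failed seeds: [3]"
```

The list of failed seeds could then serve as a starting point for a reproducible example, provided the wrapped optimizer is deterministic for a given seed.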

@jbrea
Owner

jbrea commented Dec 22, 2022

Thanks a lot for the update! This is very helpful. Do you still get StackOverflow errors with 19eae17?

@Denis-Titov
Author

19eae17 seems to have done something.
I reran the same analysis 10 times (~100,000 optimizations each), and I did not get any StackOverflow errors.
I'll let you know if I encounter this error again in the future, but it seems to have been fixed (or at least improved) by 19eae17.

Out of curiosity, what was the rationale for c26ee4f, which presumably led to this rare error? What "stability" did it improve?

@jbrea
Owner

jbrea commented Dec 22, 2022

Great, thanks for the feedback!

> Out of curiosity, what was the rationale for c26ee4f, which presumably led to this rare error? What "stability" did it improve?

CMA-ES assumes a positive definite covariance matrix, but in rare cases this isn't satisfied. To prevent failures caused by non-positive-definiteness, I used an unjustified heuristic prior to c26ee4f, but I noticed that it had a negative (but small) effect on some results. Starting with c26ee4f, the covariance matrix is changed only when it isn't positive definite: the identity matrix multiplied by some small constant is added until the covariance matrix is positive definite. This is done recursively, which can result in a StackOverflow error. With 19eae17 the small constant is multiplied by 10 in each recursion, which should be sufficient in all reasonable cases to prevent a stack overflow.
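
The scheme described above can be sketched roughly as follows. This is an illustration only, not the package's actual code: the name `make_posdef`, the starting constant `1e-12`, and the iteration cap are assumptions, and the sketch uses a loop instead of recursion so that the stack depth stays constant.

```julia
using LinearAlgebra

# Hypothetical helper: add ϵ·I to C until it is positive definite,
# multiplying ϵ by `factor` after every attempt (tenfold, as described for 19eae17).
function make_posdef(C::AbstractMatrix; ϵ = 1e-12, factor = 10, maxiter = 100)
    A = Matrix(Symmetric(Matrix{Float64}(C)))   # work on an exactly symmetric copy
    for _ in 1:maxiter
        isposdef(A) && return A
        A = A + ϵ * I    # shifts every eigenvalue up by ϵ
        ϵ *= factor      # grow the shift geometrically for the next attempt
    end
    error("matrix is not positive definite after $maxiter attempts")
end

C = [1.0 2.0; 2.0 1.0]        # eigenvalues 3 and -1, so not positive definite
C_fixed = make_posdef(C)
println(isposdef(C_fixed))    # prints "true"
```

With these assumed values, the example matrix becomes positive definite after 13 additions. A fixed shift of `1e-12` would instead need on the order of 10^12 additions to lift an eigenvalue of -1 above zero, which is the kind of depth that would overflow the stack in a recursive implementation.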
