Skip to content
New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

Resolve #1842 by setting OMP_NUM_THREADS=2 in toolkit showcase #1846

Merged
merged 8 commits into from
Apr 9, 2024

Conversation

j-wags
Copy link
Member

@j-wags j-wags commented Mar 25, 2024

Copy link

codecov bot commented Mar 26, 2024

Codecov Report

Merging #1846 (629a65d) into main (7c22c1e) will decrease coverage by 0.02%.
Report is 5 commits behind head on main.
The diff coverage is n/a.

Additional details and impacted files

@j-wags
Copy link
Member Author

j-wags commented Mar 26, 2024

Oh goodness, the rabbit hole goes deeper.

image

@j-wags
Copy link
Member Author

j-wags commented Mar 29, 2024

Copy link

Check out this pull request on  ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.


Powered by ReviewNB

@j-wags
Copy link
Member Author

j-wags commented Mar 29, 2024

The original error was "charge assignment fails in sqm in linux/rdkit CI runs with the original atom ordering of the ligand and OMP_NUM_THREADS=1". @Yoshanuikabundi also told me that he was also able to reproduce this locally on his linux desktop.

I've tried reproducing this error in CI in this PR, locally on my M1 mac, and in a linux/amd64 docker container on the same mac. The only sqm failures I've seen are in macos/rdkit CI builds on this PR:

  • FAILED: reorder the ligand atoms
  • PASSED: running identical CI again
  • FAILED: the same as above, but also forced OMP_NUM_THREADS=1
  • FAILED: running identical CI again
  • PASSED: resetting the ligand atom order and keeping OMP_NUM_THREADS=1

(note that the failures in the openff-docs builds were linux/rdkit, NOT the macos/rdkit ones that I'm getting here)

I think this means the error is stochastic and may not actually be based on atom ordering or OS.

I'm toying with the idea of having the AmberTools charge assignment pathway generate two conformers and try the second one if the first fails. On one hand, the behavior change would seem acceptable here, since it changes a random error into a valid outcome. On the other, it takes a relatively long time to fail on one conformer (~5-10 mins) and if there were a legitimate problem with the input then this change would double the the amount of time before the user got an error message.

So, I'm unsure here. I'll talk with @Yoshanuikabundi on Monday to determine if there's some way to more consistently reproduce the error so we can have a better grasp on options moving forward.

@j-wags j-wags changed the title Resolve #1842 by canonically ordering showcase ligand [DNM] Resolve #1842 by canonically ordering showcase ligand Mar 29, 2024
@mattwthompson
Copy link
Member

If the number of threads/cores/solar flares is important, it would be useful to work that into the error message. The current failures are heavily obfuscated by the ValueError that eventually gets to the user

@j-wags j-wags changed the title [DNM] Resolve #1842 by canonically ordering showcase ligand Resolve #1842 by setting OMP_NUM_THREADS=2 in toolkit showcase Apr 9, 2024
@j-wags j-wags merged commit 13bebaf into main Apr 9, 2024
18 checks passed
@j-wags j-wags deleted the bandaid-1842 branch April 9, 2024 19:56
# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

The toolkit showcase stochastically fails with OMP_NUM_THREADS=1 and AmberToolsToolkitWrapper
2 participants