Overhaul protomer enumeration #1779
This is great! I think this is a slightly more intuitive behavior.
Blocking: Works great for molecules with a small number of protomers, but I found that most of the tetracarboxylic acids I tried did not return the original molecule (including the only apparently real tetracarboxylic acid I could find in a quick search, SMILES "C(C(=O)O)(C(=O)O)=C(C(=O)O)(C(=O)O)").
>>> from openff.toolkit import Molecule
>>> molecule = Molecule.from_smiles("C(C(=O)O)(C(=O)O)=C(C(=O)O)(C(=O)O)")
>>> protomers = molecule.enumerate_protomers()
>>> molecule in protomers
False
I tracked this down to the fact that, by default, we only ask for the first 11 protomers:
>>> protomers2 = molecule.enumerate_protomers(max_states=999)
>>> molecule in protomers2
True
I think this means that we can't claim comprehensively in the docstring that the input is or isn't included. I can think of a few ways to solve this:
1. Ask for all protomers by default (this assumes OpenEye doesn't have some built-in limit)
2. Inject the original molecule into the returned protomers if it's missing
3. Don't claim to always return the input molecule
I think 3 is the best way forward, 1 may be worth doing (11 is certainly too low a default), and 2 is probably a waste of time.
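For reference, option 2 could be sketched roughly as follows. This is a minimal illustration, not the toolkit's implementation: `with_input_included` is a hypothetical helper, and it assumes only that states support `==` comparison (which the `molecule in protomers` checks above already rely on), so plain strings stand in for molecules here.

```python
from typing import List, TypeVar

T = TypeVar("T")


def with_input_included(original: T, states: List[T]) -> List[T]:
    """Return `states`, prepending `original` if it is not already present.

    Hypothetical helper, not part of openff-toolkit. It relies only on `==`
    (via `in`), so any objects with meaningful equality can be used.
    """
    if original in states:
        # Already found by the underlying enumeration; return an unchanged copy.
        return list(states)
    # Missing (e.g. cut off by a max_states limit): inject it at the front.
    return [original, *states]


# Strings stand in for molecules in this sketch.
truncated = ["state_a", "state_b"]
print(with_input_included("input", truncated))    # ['input', 'state_a', 'state_b']
print(with_input_included("state_a", truncated))  # ['state_a', 'state_b']
```

Note that the result can be one element longer than what the underlying toolkit returned, so the length of the output depends on whether the input happened to be among the enumerated states.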
Apart from that, I just found a few typos in the original docstrings and noticed that `SetMaxCount` no longer needs a +1 - those changes are in suggestions below.
Non-blocking: We may also want a test with more protomers than `max_states`.
docs/releasehistory.md
Outdated
@@ -12,6 +12,8 @@ Releases follow the `major.minor.micro` scheme recommended by [PEP440](https://w

### Behavior changes

- [PR #17XX](https://github.com/openforcefield/openff-toolkit/pull/17XX): `Molecule.enumerate_protomers` now includes the input molecule in the returned list.
Suggested change:
- [PR #17XX](https://github.com/openforcefield/openff-toolkit/pull/17XX): `Molecule.enumerate_protomers` now includes the input molecule in the returned list.
- [PR #1779](https://github.com/openforcefield/openff-toolkit/pull/1779): `Molecule.enumerate_protomers` now includes the input molecule in the returned list.
The notebook I was messing around in: protomers.zip. The info in the review is probably more legible, but it's here for posterity.
Could you elaborate on why 2 is a waste of time? It's what I intuited the objective to be here. Is the issue that it might return a different number of states if the input molecule happens to be in what was returned by the underlying toolkit?
I guess the issue is that if the input molecule isn't an important protomer, then when the user asks for the top n protomers and we give them those n plus the original molecule, that might be a surprise (just like if they ask for the top n and we give them the top n minus the original). We need to decide what the desired behaviour is.
Talked a bit at my check-in with @mattwthompson about this. To move this forward, let's do the following:
Excellent - Thanks @mattwthompson!
assert acid in protomers

@requires_openeye
def test_bad_state_not_necessarily_in_output(self):
👍 This is a clever test
- Suggestion #1 in #1464 (comment)
- `max_states` finds as many states as it can
- `max_states` just returns whatever OpenEye finds
- `0 < max_states < len(openeye_protomers)` returns `max_states` number of molecules