Skip to content
New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

Overhaul protomer enumeration #1779

Merged
merged 20 commits into from
Feb 5, 2024
Merged

Overhaul protomer enumeration #1779

merged 20 commits into from
Feb 5, 2024

Conversation

mattwthompson
Copy link
Member

@mattwthompson mattwthompson commented Nov 28, 2023

Suggestion #1 in #1464 (comment)

  • Tag issue being addressed
  • Add tests
    • Test a molecule which is itself not a strong protomer is not returned
    • Test a molecule which is a sensible protomer is returned
    • Test default max_states finds as many states as it can
    • Test too-large max_states just returns whatever OpenEye finds
    • Test that 0 < max_states < len(openeye_protomers) returns max_states number of molecules
    • Test some large molecule returns a bunch of protomers
  • Update docstrings/documentation, if applicable
  • Lint codebase
  • Update changelog
    • Document that input molecule cannot be counted on as returned

Copy link

codecov bot commented Nov 28, 2023

Codecov Report

Merging #1779 (84a5818) into main (34f666c) will decrease coverage by 0.21%.
Report is 3 commits behind head on main.
The diff coverage is 100.00%.

Additional details and impacted files

@mattwthompson mattwthompson marked this pull request as ready for review November 30, 2023 15:59
Copy link
Collaborator

@Yoshanuikabundi Yoshanuikabundi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is great! I think this is a slightly more intuitive behavior.

Blocking: Works great for molecules with a small number of protomers, but I found that most of the tetracarboxylic acids I tried did not return the original molecule (including the only apparently real tetracarboxylic acid I could find in a quick search - SMILES "C(C(=O)O)(C(=O)O)=C(C(=O)O)(C(=O)O)").

>>> from openff.toolkit import Molecule
>>> molecule = Molecule.from_smiles("C(C(=O)O)(C(=O)O)=C(C(=O)O)(C(=O)O)")
>>> protomers = molecule.enumerate_protomers()
>>> molecule in protomers
False

I tracked this down to being because we by default only ask for the first 11 protomers:

>>> protomers2 = molecule.enumerate_protomers(max_states=999)
>>> molecule in protomers2
True

I think this means that we can't claim comprehensively in the docstring that the input is or isn't included. I can think of a few ways to solve this:

  1. Ask for all protomers by default (this assumes OpenEye doesn't have some built in limit)
  2. Inject the original molecule into the returned protomers if it's missing
  3. Don't claim to always return the input molecule

I think 3 is the best way forward, 1 may be worth doing (11 is certainly too low a default), and 2 is probably a waste of time.

Apart from that, I just found a few typos in the original docstrings and noticed that SetMaxCount no longer needs a +1 - those changes are in suggestions below.

Non-blocking: We may also want a test with more protomers than max_states.

@@ -12,6 +12,8 @@ Releases follow the `major.minor.micro` scheme recommended by [PEP440](https://w

### Behavior changes

- [PR #17XX](https://github.com/openforcefield/openff-toolkit/pull/17XX): `Molecule.enumerate_protomers` now includes the input molecule in the returned list.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
- [PR #17XX](https://github.com/openforcefield/openff-toolkit/pull/17XX): `Molecule.enumerate_protomers` now includes the input molecule in the returned list.
- [PR #1779](https://github.com/openforcefield/openff-toolkit/pull/1779): `Molecule.enumerate_protomers` now includes the input molecule in the returned list.

@Yoshanuikabundi
Copy link
Collaborator

Yoshanuikabundi commented Dec 20, 2023

The notebook I was messing around in: protomers.zip

The info in the review is probably more legible but it's here for posterity.

@mattwthompson
Copy link
Member Author

Could you elaborate on why 2 is a waste of time? It's what I intuited the objective to be here. Is the issue that it might return a different number of states if the input molecule happens to be in what was returned by the underlying toolkit?

@mattwthompson mattwthompson marked this pull request as draft December 21, 2023 20:37
@mattwthompson mattwthompson changed the title Enumerate input molecule with protomers Overhaul protomer enumeration Jan 2, 2024
@Yoshanuikabundi
Copy link
Collaborator

I guess the issue is that if the input molecule isn't an important protomer, then when the user asks for the top n protomers and we give them those n plus the original molecule that might be a surprise (just like if they ask for the top n and we give them the top n minus the original). We need to decide what the desired behaviour is I guess.

@j-wags
Copy link
Member

j-wags commented Jan 5, 2024

Talked a bit at my check-in with @mattwthompson about this. To move this forward, let's do the following:

  • Set the default max_states to 0
  • Not manually re-add the input to the list of outputs
  • Change the docstring to say that, if max_states is limited (set to something other than 0) the input molecule may not be returned

@mattwthompson mattwthompson marked this pull request as ready for review January 17, 2024 22:45
@j-wags j-wags requested review from j-wags and removed request for Yoshanuikabundi January 24, 2024 21:02
Copy link
Member

@j-wags j-wags left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Excellent - Thanks @mattwthompson!

assert acid in protomers

@requires_openeye
def test_bad_state_not_necessarily_in_output(self):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 This is a clever test

@j-wags j-wags merged commit b08243e into main Feb 5, 2024
18 checks passed
@j-wags j-wags deleted the enumerate-all-protomers branch February 5, 2024 21:16
# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants