Skip to content
New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

Cmiles options #535

Merged
merged 11 commits into from
Mar 31, 2020
Merged

Cmiles options #535

merged 11 commits into from
Mar 31, 2020

Conversation

jthorton
Copy link
Collaborator

@jthorton jthorton commented Mar 4, 2020

This PR will expand the to_smiles API options to allow the creation of cmiles identifiers.

@codecov-io
Copy link

codecov-io commented Mar 4, 2020

Codecov Report

Merging #535 into master will increase coverage by 0.01%.
The diff coverage is 96.87%.

@jthorton
Copy link
Collaborator Author

jthorton commented Mar 9, 2020

In order to be able to generate the unique index for a molecule who will be submitted for torsion driving, I also need to add a way to generate a mapped smile for just the atoms involved in the torsion as the current method will only generate the mapped smiles for the whole molecule.

@jthorton jthorton requested a review from j-wags March 11, 2020 20:57
…rder. Also removed the ordering operation from the to_qcschema method.
@jthorton
Copy link
Collaborator Author

jthorton commented Mar 12, 2020

In testing with the automated qcarchive submission pipeline, I found that the canonical ordering would often fail on molecules with undefined stereo even after users have allowed this so I have also fixed this issue.

I have also removed the automatic ordering of a molecule when using the to_qcschema method as this makes it difficult to generate the cmiles information in the correct order. Now the method just gives a schema representation of the exact molecule it is called on, to get the molecule in an canonical order for schema the canonical order method must be called first like in the follwoing example which will be part of the automated submission pipeline:

ethanol = Molecule.from_smiles('CCO')
ethanol.generate_conformers()

# canonical order the atoms
ordered_ethanol = ethanol.canonical_order_atoms()

# now make the schema and the cmiles tags for this ordering
schema = ordered_ethanol.to_qcschema()
mapped_hydrogen_smiles = ordered_ethanol.to_smiles(mapped=True)

Copy link
Member

@j-wags j-wags left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks pretty good. Requested a few changes, mostly spelling things but some larger requests. Overall I'm excited to bring this functionality in.

I think we're doing the right thing with the atom mapping -- We support it for the general case (all atoms mapped), and have semi-hidden functionality for our "expert case" while we work on our final spec for atom maps. I'd just add the "hidden functionality" to the to_smiles docstrings for now (basically,a quick primer on how offmol.properties['atom_map'] is interpreted if present).

And, of course, update the release notes :-)


if toolkit_class.is_available():
toolkit = toolkit_class()
mol = Molecule.from_smiles(data['molecule_input'], toolkit_registry=toolkit)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(Blocking) Generating the molecules from SMILES gives this test two ways to fail. Either:

  1. The test actually fails for some reason that we're testing for
  2. RDKitToolkitWrapper.from_smiles indexes molecules differently one day, and we have to do a lot of work overhauling all of the tests that include it

I'd like to reduce the number of tests that could hit option 2. So, I know it's a pain, but could we make the molecules for this test using the Molecule API, to ensure they use a fixed indexing system?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand what you mean but I am not sure that would fix the problem as if the canonicalisation method changes all tests will fail regardless of input ordering and the expected results will have to be updated., I think this is always going to be a problem when explicitly testing the output smiles strings. I think with a little more work I can be cleaver about the testing and get around this Ill draft it up and see what you think.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like your new solution. API-produced molecules are ugly to implement, but will save us a ton of time in the long run.

@jthorton jthorton requested a review from j-wags March 26, 2020 18:23
Copy link
Member

@j-wags j-wags left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great, Josh. I like the changes you made, and think this is good to merge. The only thing that I might still change is changing the manually-provided atom map to always expect to start indexing at 1. Though, I haven't thought about this as much as you, so feel free to got forward as-is if there's an underlying reason that it might start at 0.

@jthorton
Copy link
Collaborator Author

Thanks Jeff Ill get it merged in and yeah I know what you mean that is an awkward point, for now, they should be able to start from 0 or 1 but if we finish the spec and say 1 only we can make that change. My other thought is that to be consistent with the atom map returned from Molecule.are_isomorphic which starts from 0 so if we do change we should change all of these functions to make sure they take the same types of atom maps.

@jthorton jthorton merged commit af7c2f8 into master Mar 31, 2020
@jthorton jthorton deleted the more_smiles_options branch March 31, 2020 16:05
# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Expand to_smiles API
4 participants