-
Notifications
You must be signed in to change notification settings - Fork 95
New issue
Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? # to your account
Cmiles options #535
Cmiles options #535
Conversation
In order to be able to generate the unique index for a molecule who will be submitted for torsion driving, I also need to add a way to generate a mapped smile for just the atoms involved in the torsion as the current method will only generate the mapped smiles for the whole molecule. |
…rder. Also removed the ordering operation from the to_qcschema method.
In testing with the automated qcarchive submission pipeline, I found that the canonical ordering would often fail on molecules with undefined stereo even after users have allowed this so I have also fixed this issue. I have also removed the automatic ordering of a molecule when using the ethanol = Molecule.from_smiles('CCO')
ethanol.generate_conformers()
# canonical order the atoms
ordered_ethanol = ethanol.canonical_order_atoms()
# now make the schema and the cmiles tags for this ordering
schema = ordered_ethanol.to_qcschema()
mapped_hydrogen_smiles = ordered_ethanol.to_smiles(mapped=True) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks pretty good. Requested a few changes, mostly spelling things but some larger requests. Overall I'm excited to bring this functionality in.
I think we're doing the right thing with the atom mapping -- We support it for the general case (all atoms mapped), and have semi-hidden functionality for our "expert case" while we work on our final spec for atom maps. I'd just add the "hidden functionality" to the to_smiles
docstrings for now (basically,a quick primer on how offmol.properties['atom_map']
is interpreted if present).
And, of course, update the release notes :-)
|
||
if toolkit_class.is_available(): | ||
toolkit = toolkit_class() | ||
mol = Molecule.from_smiles(data['molecule_input'], toolkit_registry=toolkit) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(Blocking) Generating the molecules from SMILES gives this test two ways to fail. Either:
- The test actually fails for some reason that we're testing for
RDKitToolkitWrapper.from_smiles
indexes molecules differently one day, and we have to do a lot of work overhauling all of the tests that include it
I'd like to reduce the number of tests that could hit option 2. So, I know it's a pain, but could we make the molecules for this test using the Molecule API, to ensure they use a fixed indexing system?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I understand what you mean but I am not sure that would fix the problem as if the canonicalisation method changes all tests will fail regardless of input ordering and the expected results will have to be updated., I think this is always going to be a problem when explicitly testing the output smiles strings. I think with a little more work I can be cleaver about the testing and get around this Ill draft it up and see what you think.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like your new solution. API-produced molecules are ugly to implement, but will save us a ton of time in the long run.
…matching, added atom map description to doc strings.
…ield into more_smiles_options
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great, Josh. I like the changes you made, and think this is good to merge. The only thing that I might still change is changing the manually-provided atom map to always expect to start indexing at 1. Though, I haven't thought about this as much as you, so feel free to got forward as-is if there's an underlying reason that it might start at 0.
…ield into more_smiles_options
Thanks Jeff Ill get it merged in and yeah I know what you mean that is an awkward point, for now, they should be able to start from 0 or 1 but if we finish the spec and say 1 only we can make that change. My other thought is that to be consistent with the atom map returned from |
This PR will expand the
to_smiles
API options to allow the creation of cmiles identifiers.