mobley_3323117 (sulfolane) has non-standard SMILES #51

jchodera · 2022-12-10T00:00:51Z

Molecule mobley_3323117 (sulfolane) is written with the non-standard SMILES C1CC[S+2](C1)([O-])[O-], rather than the more standard C1CCS(=O)(=O)C1.

Although the two forms have the same total charge, they are not equivalent: the FreeSolv SMILES carries explicit formal charges (+2 on S, -1 on each O), whereas every atom in the standard SMILES has a formal charge of 0. As a result, the OpenFF toolkit (with the OpenEye backend) produces different molecular representations for the two:

>>> from openff.toolkit.topology import Molecule
>>> freesolv_molecule = Molecule.from_smiles('C1CC[S+2](C1)([O-])[O-]')
>>> standard_molecule = Molecule.from_smiles('C1CCS(=O)(=O)C1')
>>> freesolv_molecule.generate_unique_atom_names()
>>> standard_molecule.generate_unique_atom_names()
>>> [(atom.name, atom.formal_charge.m) for atom in freesolv_molecule.atoms]
[('C1x', 0), ('C2x', 0), ('C3x', 0), ('S1x', 2), ('C4x', 0), ('O1x', -1), ('O2x', -1), ('H1x', 0), ('H2x', 0), ('H3x', 0), ('H4x', 0), ('H5x', 0), ('H6x', 0), ('H7x', 0), ('H8x', 0)]
>>> [(atom.name, atom.formal_charge.m) for atom in standard_molecule.atoms]
[('C1x', 0), ('C2x', 0), ('C3x', 0), ('S1x', 0), ('O1x', 0), ('O2x', 0), ('C4x', 0), ('H1x', 0), ('H2x', 0), ('H3x', 0), ('H4x', 0), ('H5x', 0), ('H6x', 0), ('H7x', 0), ('H8x', 0)]
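
The same difference is visible at the string level, without any toolkit. The following is a minimal sketch only: `bracket_charges` and `BRACKET_ATOM` are hypothetical names introduced here for illustration (they are not part of FreeSolv or the OpenFF toolkit), and the regular expression assumes charges are written as `+`, `-`, `+n`, or `-n`. Atoms written outside brackets carry a formal charge of 0 in SMILES, so only bracket atoms need to be inspected.

```python
import re

# Hypothetical helper for illustration: matches a bracket atom such as
# [S+2], [O-], or [NH4+] and captures the element symbol and the charge.
# Assumes charges are written as "+", "-", "+n", or "-n" (not "++").
BRACKET_ATOM = re.compile(r"\[\d*([A-Za-z][a-z]?)[^\]+-]*([+-]\d*)?[^\]]*\]")


def bracket_charges(smiles):
    """Return (symbol, formal_charge) for each bracket atom in a SMILES string."""
    charges = []
    for symbol, charge in BRACKET_ATOM.findall(smiles):
        if not charge:
            value = 0          # bracket atom without an explicit charge
        elif charge in ("+", "-"):
            value = 1 if charge == "+" else -1
        else:
            value = int(charge)  # e.g. "+2" -> 2, "-1" -> -1
        charges.append((symbol, value))
    return charges


freesolv_charges = bracket_charges("C1CC[S+2](C1)([O-])[O-]")
standard_charges = bracket_charges("C1CCS(=O)(=O)C1")

# Per-atom formal charges differ between the two strings ...
print(freesolv_charges)
print(standard_charges)
# ... but the total charge is the same.
print(sum(q for _, q in freesolv_charges), sum(q for _, q in standard_charges))
```

This only compares the written charges; it does not parse SMILES in general and is no substitute for a cheminformatics toolkit.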

Would it be reasonable to correct the non-standard SMILES string and re-generate the database?
Or are there ways to automatically standardize the formal charges?
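
One possible route to automatic standardization, sketched here with RDKit rather than the OpenFF/OpenEye stack used above: RDKit's `rdMolStandardize.Normalize` applies a set of default normalization transforms, and this sketch assumes that set includes a rule rewriting charge-separated sulfones to the `S(=O)(=O)` form. The helper name `normalize_smiles` is hypothetical, and whether the result is acceptable for FreeSolv would need to be checked per molecule.

```python
from rdkit import Chem
from rdkit.Chem.MolStandardize import rdMolStandardize


def normalize_smiles(smiles):
    """Return a canonical SMILES after applying RDKit's default normalizations.

    Hypothetical helper for illustration; assumes the default transform set
    covers the functional group in question (here, a sulfone).
    """
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError(f"Could not parse SMILES: {smiles!r}")
    normalized = rdMolStandardize.Normalize(mol)
    return Chem.MolToSmiles(normalized)


freesolv_smiles = "C1CC[S+2](C1)([O-])[O-]"
standard_smiles = "C1CCS(=O)(=O)C1"

# If the sulfone rule applies, both inputs should map to the same string.
print(normalize_smiles(freesolv_smiles))
print(normalize_smiles(standard_smiles))
print(normalize_smiles(freesolv_smiles) == normalize_smiles(standard_smiles))
```

Running something like this over every entry and flagging only those whose SMILES change would identify candidates for manual correction without overwriting the rest of the database.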

davidlmobley · 2022-12-10T01:08:14Z

This SMILES (or all of them) could be canonicalized. I think the best option, however, is to fix only THIS SMILES. Otherwise we run into the problem of "what should be considered the authoritative identifier for a molecule, from which everything else can be generated?" We've moved to treating the SMILES as the authoritative source data, so re-generating all SMILES from the existing SMILES is probably unwise: we would be overwriting our authoritative source data.

So, perhaps we should correct only this one?

lilyminium mentioned this issue Dec 11, 2022

Add normalization code openforcefield/openff-toolkit#1490

Open
