Skip to content
New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

Writing to MOL error #523

Closed
jthorton opened this issue Feb 19, 2020 · 5 comments · Fixed by #529
Closed

Writing to MOL error #523

jthorton opened this issue Feb 19, 2020 · 5 comments · Fixed by #529

Comments

@jthorton
Copy link
Collaborator

Describe the bug
When trying to write molecules to MOL files with RDKit I get an error despite this being one of the supported file formats listed by the wrapper.
To Reproduce
Steps to reproduce the behavior. A minimal reproducing set of python commands is ideal.

from openforcefield.topology import Molecule
from openforcefield.utils.toolkits import RDKitToolkitWrapper

mol = Molecule.from_smiles('CC')
mol.generate_conformers(n_conformers=1)
# try and write to mol file with rdkit
mol.to_file('test.mol', 'mol', RDKitToolkitWrapper())

KeyError: 'MOL'

I think the writer dictionaries here and here just need to be updated. It may also be better to store the reference to these objects in one variable to save repetition of code and make updating writers easier for example

from rdkit import Chem
_toolkit_file_write_formats = {'SDF':Chem.SDWriter , 'MOL':Chem.SDWriter , 'SMI':Chem.SmilesWriter , 'PDB':Chem.PDBWriter, 'TDT': Chem.TDTWriter }

the function to write the files could then be something like

def to_file(self, molecule, file_path, file_format):
        """
        Writes an OpenFF Molecule to a file-like object
        Parameters
        ----------
        molecule : an OpenFF Molecule
            The molecule to write
        file_path
            The file path to write to
        file_format
            The format for writing the molecule data
        Returns
        ------
        """
        from rdkit import Chem
        file_format = file_format.upper()
        with open(file_path, 'w') as file_obj:
            rdmol = self.to_rdkit(molecule)
            try:
                writer = self._toolkit_file_write_formats[file_format](file_obj)
                writer.write(rdmol)
                writer.close()
            except KeyError:
                rasie InvalidFileTypeError(f'The file type {file_format} is not currently supported to be wrote to by RDKit.')

the readers could also be stored this way

from rdkit import Chem
_toolkit_file_read_formats = {'SDF': Chem.SupplierFromFilename, 'MOL': Chem.SupplierFromFilename, 'SMI': Chem.SmilesMolSupplier, 'TDT':Chem.TDTMolSupplier}

Not sure how useful this second option is however as depending on the input different actions are performed this would only be simpler if the functions could be unified in some way.

Computing environment (please complete the following information):

  • Operating system linux
@j-wags
Copy link
Member

j-wags commented Feb 19, 2020

It may also be better to store the reference to these objects in one variable to save repetition of code and make updating writers easier for example

I love the idea of defining the reader/writer dict at the class level, rather than twice in the to_file and to_file_obj functions. Part of the reason we never caught the absence of a MOL writer is because of code cleanliness/duplication issues like this.

I'm not sure that I like the try/except logic for finding a writer. For the same level of indentation, we could do if file_format in _toolkit_file_write_formats, and not have to deal with an exception at all.

@jaimergp
Copy link
Contributor

The try/except construct will save a key hashing and dict lookup, and most of the time the key will exist. Pythonistas usually prefer EAFP than LBYL. Not that it matters in practice if this is not a tight loop.

@jthorton
Copy link
Collaborator Author

Glad you like the idea I think it would definitely help keep the functions clean and make adding more in future very easy as you would only have to extend the dictionary.

I would also vote for the try/except block here as asking for permission in most use cases for this function will be a waste of time.

@j-wags
Copy link
Member

j-wags commented Feb 20, 2020

Wow, I just read a little into LBYL and EAFP and this is a serious debate. Even Guido chimed in.

I don't think a 2x speedup is of serious concern in this specific situation. Between now and the heat death of the universe, using LBYL throughout the OFFTK ecosystem may cumulatively require several additional cpu-seconds.

On the other hand, using try/excepts liberally throughout OFFTK runs the risk of diverting developer attention from ToolkitWrapper try/excepts (which will become complex and nasty as we find more failure modes). Developer time is much more expensive than CPU seconds, and if misinterpreting a try/except causes an OFF developer to take 15 more minutes to debug a problem, that's a net loss for the universe.

Anyway, that's just my two cents. I see the points of view on the other side, and would be happy to take a vote among OFF devs and agree to use one convention or another throughout the ecosystem.

@jthorton
Copy link
Collaborator Author

Yeah, this is quite a big thing in python letting the flow beeing controlled with exceptions since they are designed to be cheap. My personal preference is to use exceptions but like you say this can be an issue for bugs if not used precisely, happy to do what the general consensus is, however. For now, shall I close this with my PR for XYZ file support, ill only change the writers for now as the readers are a little more complicated due to the extra functions called.

# for free to join this conversation on GitHub. Already have an account? # to comment
Projects
None yet
3 participants