Writing to MOL error #523

jthorton · 2020-02-19T19:23:31Z

Describe the bug
When trying to write molecules to MOL files with RDKit I get an error despite this being one of the supported file formats listed by the wrapper.
To Reproduce
Steps to reproduce the behavior. A minimal reproducing set of python commands is ideal.

from openforcefield.topology import Molecule
from openforcefield.utils.toolkits import RDKitToolkitWrapper

mol = Molecule.from_smiles('CC')
mol.generate_conformers(n_conformers=1)
# try and write to mol file with rdkit
mol.to_file('test.mol', 'mol', RDKitToolkitWrapper())

KeyError: 'MOL'

I think the writer dictionaries here and here just need to be updated. It may also be better to store the reference to these objects in one variable to save repetition of code and make updating writers easier for example

from rdkit import Chem
_toolkit_file_write_formats = {'SDF':Chem.SDWriter , 'MOL':Chem.SDWriter , 'SMI':Chem.SmilesWriter , 'PDB':Chem.PDBWriter, 'TDT': Chem.TDTWriter }

the function to write the files could then be something like

def to_file(self, molecule, file_path, file_format):
        """
        Writes an OpenFF Molecule to a file-like object
        Parameters
        ----------
        molecule : an OpenFF Molecule
            The molecule to write
        file_path
            The file path to write to
        file_format
            The format for writing the molecule data
        Returns
        ------
        """
        from rdkit import Chem
        file_format = file_format.upper()
        with open(file_path, 'w') as file_obj:
            rdmol = self.to_rdkit(molecule)
            try:
                writer = self._toolkit_file_write_formats[file_format](file_obj)
                writer.write(rdmol)
                writer.close()
            except KeyError:
                rasie InvalidFileTypeError(f'The file type {file_format} is not currently supported to be wrote to by RDKit.')

the readers could also be stored this way

from rdkit import Chem
_toolkit_file_read_formats = {'SDF': Chem.SupplierFromFilename, 'MOL': Chem.SupplierFromFilename, 'SMI': Chem.SmilesMolSupplier, 'TDT':Chem.TDTMolSupplier}

Not sure how useful this second option is however as depending on the input different actions are performed this would only be simpler if the functions could be unified in some way.

Computing environment (please complete the following information):

Operating system linux

The text was updated successfully, but these errors were encountered:

j-wags · 2020-02-19T19:29:52Z

It may also be better to store the reference to these objects in one variable to save repetition of code and make updating writers easier for example

I love the idea of defining the reader/writer dict at the class level, rather than twice in the to_file and to_file_obj functions. Part of the reason we never caught the absence of a MOL writer is because of code cleanliness/duplication issues like this.

I'm not sure that I like the try/except logic for finding a writer. For the same level of indentation, we could do if file_format in _toolkit_file_write_formats, and not have to deal with an exception at all.

jaimergp · 2020-02-20T08:47:59Z

The try/except construct will save a key hashing and dict lookup, and most of the time the key will exist. Pythonistas usually prefer EAFP than LBYL. Not that it matters in practice if this is not a tight loop.

jthorton · 2020-02-20T10:05:40Z

Glad you like the idea I think it would definitely help keep the functions clean and make adding more in future very easy as you would only have to extend the dictionary.

I would also vote for the try/except block here as asking for permission in most use cases for this function will be a waste of time.

j-wags · 2020-02-20T17:49:23Z

Wow, I just read a little into LBYL and EAFP and this is a serious debate. Even Guido chimed in.

I don't think a 2x speedup is of serious concern in this specific situation. Between now and the heat death of the universe, using LBYL throughout the OFFTK ecosystem may cumulatively require several additional cpu-seconds.

On the other hand, using try/excepts liberally throughout OFFTK runs the risk of diverting developer attention from ToolkitWrapper try/excepts (which will become complex and nasty as we find more failure modes). Developer time is much more expensive than CPU seconds, and if misinterpreting a try/except causes an OFF developer to take 15 more minutes to debug a problem, that's a net loss for the universe.

Anyway, that's just my two cents. I see the points of view on the other side, and would be happy to take a vote among OFF devs and agree to use one convention or another throughout the ecosystem.

jthorton · 2020-02-21T12:36:37Z

Yeah, this is quite a big thing in python letting the flow beeing controlled with exceptions since they are designed to be cheap. My personal preference is to use exceptions but like you say this can be an issue for bugs if not used precisely, happy to do what the general consensus is, however. For now, shall I close this with my PR for XYZ file support, ill only change the writers for now as the readers are a little more complicated due to the extra functions called.

j-wags added bug effort:low priority:medium labels Feb 19, 2020

jthorton mentioned this issue Feb 21, 2020

[WIP] Multiframe xyz #527

Closed

4 tasks

jthorton pushed a commit that referenced this issue Feb 26, 2020

adding tests for xyz support and add Fix for #523

5d3c1cf

jthorton mentioned this issue Feb 26, 2020

Mol fix and xyz #529

Merged

4 tasks

jthorton closed this as completed in #529 Mar 3, 2020

trevorgokey pushed a commit that referenced this issue Apr 1, 2020

adding tests for xyz support and add Fix for #523

ec220ec

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Writing to MOL error #523

Writing to MOL error #523

jthorton commented Feb 19, 2020

j-wags commented Feb 19, 2020

jaimergp commented Feb 20, 2020

jthorton commented Feb 20, 2020

j-wags commented Feb 20, 2020

jthorton commented Feb 21, 2020

Writing to MOL error #523

Writing to MOL error #523

Comments

jthorton commented Feb 19, 2020

j-wags commented Feb 19, 2020

jaimergp commented Feb 20, 2020

jthorton commented Feb 20, 2020

j-wags commented Feb 20, 2020

jthorton commented Feb 21, 2020