Skip to content
New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

Weird atom typing by LigParGen? #33

Closed
iGulitch opened this issue Nov 19, 2024 · 10 comments
Closed

Weird atom typing by LigParGen? #33

iGulitch opened this issue Nov 19, 2024 · 10 comments

Comments

@iGulitch
Copy link

Hello!

I'm attempting to atomtype HMMM molecule.

LigParGen [locally installed] claims that those nitrogens bonded to the ring carbons have the atom type N. I have serious doubts about it, since, according to the OPLS/2020 Force Field atom type list , N is the atom type of amide nitrogens, and the prerequisite for amide is the double-bonded pair C=O linked to a nitrogen. In HMMM, there is no doubled bonded pair C=O and thus there should be no amides identified.

Incorrect identification of the aforementioned nitrogens leads in turn to the incorrect atom-typing of carbons bonded to nitrogens from the side of oxygens. These 6 carbons are claimed to have the type CT, i.e. alkane carbons. IMHO, this is correct. There are ~216 CT atoms in the OPLS/2020 Force Field supplementary material , and ~11 of them are related to amides. I guess those 6 carbons are recognised as the amide ones that might be wrong.

The discussion above leads to a more general question about the mechanism of atom type identification that is implemented in LigParGen. For instance, foyer does atom typing based on the SMARTS sequences. What does LigParGen use?

@jewettaij
Copy link

jewettaij commented Dec 4, 2024

#34 also seems like it might be an issue with atom typing. EDIT: THAT ISSUE MIGHT BE UNRELATED.

@iGulitch
Copy link
Author

iGulitch commented Dec 6, 2024

#34 also seems like it might be an issue with atom typing.

Looked into #34. Crap, if LPG fails for a trivial ethylene, then I'm not surprised about HMMM result. Yet, I'm curious about the atom typing mechanism, because somehow LPG still works for basically any molecule [ not significantly big ] that I've tried so far!

May I ask what the other molecules are, for which you tried LPG? I saw you mentioned benzoic acid. Anything else?

@jewettaij
Copy link

jewettaij commented Dec 9, 2024

Alas, no. I have only tested ethylene and benzoic acid. (The benzoic acid files created by ligpargen seemed okay, ...but I was only looking at the improper interactions, not the atom types. On that note, I don't know how to get LigParGen to reveal the original OPLS atom type that it's using. I would really like that. Instead, LigParGen returns atom types with names like "C801", which isn't very informative. When I posted, I was actually only using LigParGen to verify that the impropers generated by moltemplate were plausibly correct. But after this, it's difficult to tell what's going on.)

@jewettaij
Copy link

jewettaij commented Dec 9, 2024

"because somehow LPG still works for basically any molecule [ not significantly big ] that I've tried so far!"

I'm relieved to hear that!

@iGulitch
Copy link
Author

iGulitch commented Dec 9, 2024

On that note, I don't know how to get LigParGen to reveal the original OPLS atom type that it's using. I would really like that. Instead, LigParGen returns atom types with names like "C801", which isn't very informative.

AFAIU, "C801" isn't "atom type" [C, CT, CM, NT, CA, ... ] , but "atom class" [opls_801, ...] . See the supplementary 1 of the latest OPLS paper.

Nonetheless, I understand you 100%. If you take a look in the supplementary 2 of the latest OPLS paper, then you find there the following :

800                                        800-899 reserved for QM/MM atoms - do not use - they
899                                        will be read from the Z-matrix file, not from here

LPG seems to perform QuantChem in the background always and has atom classes opls_800-899 reserved for those atoms, because I see only these atom classes opls_800-899. I also checked some polymer molecules, and figured out that when 800-899 isn't enough, then LPG bollows atom classes from "opls_9500" and onwards. At the same time, AFAIU, atom typing shouldn't be dependent on whether QuantChem is performed or not, should it? I asked LPG developers about atom typing mechanism, but they are silent. Looks like, source code digging is the only way to figure it out.

@iGulitch
Copy link
Author

iGulitch commented Dec 9, 2024

"because somehow LPG still works for basically any molecule [ not significantly big ] that I've tried so far!"

I'm relieved to hear that!

Let me clarify, not for literally every molecule, but for the vast majority of them. However, apparently, a careful check now is mandatory for each and every molecules parametrised using LPG.

LPG seems to fail in two cases :

  1. Too big molecule. In this case, QuantChem simulations performed in the background impose limitations on the molecule size. see the dicsussion here
  2. Too complicated SMILES sequence. Since LPG uses RDKit to reconstruct a molecule from scratch, it depends on whether used RDKit commands succeed. Firstly, the way how LPG tries to reconstruct a molecule by default fails quite easy :
molecule = Chem.MolFromSmiles(smile)
molecule = Chem.AddHs(molecule)
done = AllChem.EmbedMolecule(molecule, randomSeed=0xf00d)

Luckily, thanks to the RDKit community on github, a possible solution was suggested . Secondly, when the default method fails, LPG uses another trick that requires pybel package of python, whereas LPG incorrectly imports pybel. Therefore, this other trick sometimes fails to.

Here is my experience with it. I hope it's useful for you.

@jewettaij
Copy link

jewettaij commented Dec 9, 2024

Here is my experience with it. I hope it's useful for you.

Yes it is. Thanks for explaining those atom types. And thanks for clarifying how LigParGen can fail. (It makes sense.)

However, apparently, a careful check now is mandatory for each and every molecules parametrised using LPG.

Rant: Actually, I would recommend being paranoid when using most open-source science software, unless the code was maintained by a professor, or staff-scientist, or a company. (That's especially true if the program depends on files, force-fields, or APIs which are changing often. This applies to my code as well. I myself wrote such a program and I supported it actively for a few years. But when I had a full-time 70+ hrs/week job in a completely unrelated field, I had to deprioritize it.)

Looks like, source code digging is the only way to figure it out.

LigParGen has some BOSS dependencies, so this might make the code tricky to read.

Hopefully Israel will see this thread and respond.

@iGulitch
Copy link
Author

@jewettaij , I wouldn't count on your last statement if I were you. Unfortunately, Israel doesn't reply often, but I'd rather say he does it randomly and occasionally :( I was already just about calling him directly to Upsala University in Sweden by phone :D

@iGulitch
Copy link
Author

LigParGen has some BOSS dependencies, so this might make the code tricky to read.
Any of them is responsible for the atom typing, do you know? I thought BOSS was only for the QM calculations to evaluate atom charges, whereas the atom typing itself was solely made by LigParGen. This is why I was trying to read through the source code of LigParGen.

@Isra3l
Copy link
Owner

Isra3l commented Dec 12, 2024

Hey Everyone,

I just replied to Issue #34, and the same answer applies here. Briefly, BOSS is responsible for atom type determination, so for issues related to atom types, please contact Bill Jorgensen directly. You can find instructions on how to check the original OPLS atom type assignment in my #34 reply.

Additionally, I would modify the statement to: "A careful check IS ALWAYS mandatory for each and every molecule parameterized using ANY SOFTWARE." This applies regardless of whether the code is maintained by a professor, a staff scientist, or a company.

Best,

Israel

@Isra3l Isra3l closed this as completed Dec 12, 2024
# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants