Skip to content
New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

Update aa_residues_substructures_explicit_bond_orders_with_caps_expli… #1728

Conversation

pbuslaev
Copy link
Contributor

This patch should solve #1727

@codecov
Copy link

codecov bot commented Sep 22, 2023

Codecov Report

Merging #1728 (7c37650) into main (88ce0b3) will decrease coverage by 11.74%.
The diff coverage is n/a.

Additional details and impacted files

Copy link
Member

@j-wags j-wags left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for starting on the fix for this!

Patching that file will only solve the problem temporarily, since we periodically regenerate it from the chemical components dictionary using this script. Also, I'm seeing CYS residues in our current dictionary with a negative sulfur. So either there's something wrong with a existing substructure in this file that's causing it not to match the deprotonated CYS in the input, or the mainchain substructure is fully missing.

It'd be great if we could solve the problem by substituting an existing substructure in the _patch_known_problems function, but barring that, just adding a new working substructure in the _add_common_substructures function would be fine.

Updated cif to substructure to generate CYM substructure
RDKitToolkitWrapper()._validate_custom_substructures() actually requires Dict[str, List[str]]
…cit_connectivity.json

Updated list of aa_residues_substructures obtained with new code
Fixed typo in the _validate_custom_substructures docstring
@pbuslaev
Copy link
Contributor Author

I think that the changes I added should lead to generation of the correct json file. I have also updated the json file and fixed a typo in rdkit_wrapper docstring

@j-wags
Copy link
Member

j-wags commented Sep 26, 2023

Wow - Thanks for diving into our crazy substructure code! The resulting patterns look good visually but I tried loading the PDB file from your original post and it failed with:

E           openff.toolkit.utils.exceptions.UnassignedChemistryInPDBError: Some bonds or atoms in the input could not be identified.
E           
E           Hint: The following residues were assigned names that do not match the residue name in the input, or could not be assigned residue names at all. This may indicate that atoms are missing from the input or some other error. The OpenFF Toolkit requires all atoms, including hydrogens, to be explicit in the input to avoid ambiguities in protonation state or bond order:
E               Input residue  :CYS#0002 contains atoms matching substructures {'No match', 'PEPTIDE_BOND'}
E           
E           Error: The following 5 atoms exist in the input but could not be assigned chemical information from the substructure library:
E               Atom     9 (HA) in residue  :CYS#0002
E               Atom    10 (CB) in residue  :CYS#0002
E               Atom    11 (HB2) in residue  :CYS#0002
E               Atom    12 (HB3) in residue  :CYS#0002
E               Atom    13 (SG) in residue  :CYS#0002

openff/toolkit/utils/rdkit_wrapper.py:818: UnassignedChemistryInPDBError

I tried shaking a few permutations of things like residue names, but to no avail. Are you able to load the original PDB successfully?

In either case, I have an example file and test we can hammer on, but I don't have write access to your fork to send it over. Would you be OK if I merged this into a branch so OpenEye CI stops complaining, and then gave you write access?

@pbuslaev
Copy link
Contributor Author

Wow - Thanks for diving into our crazy substructure code! The resulting patterns look good visually but I tried loading the PDB file from your original post and it failed with:

E           openff.toolkit.utils.exceptions.UnassignedChemistryInPDBError: Some bonds or atoms in the input could not be identified.
E           
E           Hint: The following residues were assigned names that do not match the residue name in the input, or could not be assigned residue names at all. This may indicate that atoms are missing from the input or some other error. The OpenFF Toolkit requires all atoms, including hydrogens, to be explicit in the input to avoid ambiguities in protonation state or bond order:
E               Input residue  :CYS#0002 contains atoms matching substructures {'No match', 'PEPTIDE_BOND'}
E           
E           Error: The following 5 atoms exist in the input but could not be assigned chemical information from the substructure library:
E               Atom     9 (HA) in residue  :CYS#0002
E               Atom    10 (CB) in residue  :CYS#0002
E               Atom    11 (HB2) in residue  :CYS#0002
E               Atom    12 (HB3) in residue  :CYS#0002
E               Atom    13 (SG) in residue  :CYS#0002

openff/toolkit/utils/rdkit_wrapper.py:818: UnassignedChemistryInPDBError

I tried shaking a few permutations of things like residue names, but to no avail. Are you able to load the original PDB successfully?

This is strange. With updated json substructure file everything works perfectly for me. I do see the exact same SMARTS pattern as I added manually in the generated substructure file, so I am a bit confused why you are getting the error.

In either case, I have an example file and test we can hammer on, but I don't have write access to your fork to send it over. Would you be OK if I merged this into a branch so OpenEye CI stops complaining, and then gave you write access?

Yes, this is fine with me.

@j-wags j-wags changed the base branch from main to patch-negatively-charged-cysteine September 26, 2023 17:33
@j-wags j-wags merged commit 8713f90 into openforcefield:patch-negatively-charged-cysteine Sep 26, 2023
j-wags added a commit that referenced this pull request Sep 28, 2023
#1729)

* Update aa_residues_substructures_explicit_bond_orders_with_caps_expli… (#1728)

* Update aa_residues_substructures_explicit_bond_orders_with_caps_explicit_connectivity.json

* Update _cif_to_substructure_dict.py

Updated cif to substructure to generate CYM substructure

* Update _cif_to_substructure_dict.py

RDKitToolkitWrapper()._validate_custom_substructures() actually requires Dict[str, List[str]]

* Update aa_residues_substructures_explicit_bond_orders_with_caps_explicit_connectivity.json

Updated list of aa_residues_substructures obtained with new code

* Update rdkit_wrapper.py

Fixed typo in the _validate_custom_substructures docstring

* add cym loading test and input

* update releasehistory

* fix cym test

* regenerate all substructure jsons

---------

Co-authored-by: pbuslaev <pbuslaev@gmail.com>
# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants