Skip to content
New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

[WIP] Molecule.from_pdb #1105

Merged
merged 29 commits into from
Oct 27, 2021
Merged

Conversation

j-wags
Copy link
Member

@j-wags j-wags commented Oct 7, 2021

  • Prototype implementation of reading a molecule from a PDB consisting of entirely "standard" components and assigning bond orders and formal charges from aa_variants.cif (and some manually-added caps)
  • Fix issue where our peptide bond SMARTS could mislabel PRO ring-closing carbon as CA (maybe go over metadata assignment substructures in some sort of order that puts peptide+disulfide bonds last, and make earlier matches take precedence)
  • Code cleanup
  • More extensive testing
  • Ensure that perceive_residues still works with newly-generated substructure files
  • Resolve whether the same file can be used for BOTH metadata assignment (perceive_residues) AND bond order/formal charge assignment (from_pdb)
    • It can't -- The atom-naming dict needs to remove imidazole+guanidinium bond orders and formal charges to catch resonance states. The chemistry-assigning dict needs to know that exact information.
  • Make stereo warnings quiet (maybe don't do from_rdkit at all and just make offmol directly in OFFTK API?)
    • Loading process now ingests coordinates and perceives stereo from that.
  • Remove direct rdkit dependence from molecule.py
  • Add extensive tests for PDB loading (take PDBs from PLBenchmarks, COVID moonshot?, other sources?)
  • Add logic to gracefully error for unrecognized residues in PDB (like TPO)
  • Add API points to support adding additional residues to the substructure dict
  • Handle loading PDBs with multiple components (even if they're a mix of polymer and small molecules) (do we really want to support this? It's easy to split PDBs)
  • Offer API point for residue substructure perception when loading OpenMM topologies
  • Fill residue_name, residue_number metadata
  • Fill atom_name metadata
  • Add tests for metadata loading
  • Test roundtrips to/from PDB (Guarantee that we keep atom name, atom order, atom number(maybe not this one, if original indexing doesn't start at 0?), atom name, residue name, residue number, chain name, element, coordinates)
  • Make sure perceive_residues assigns appropriate residue numbers at disulfide bridges (numbering should increment at each peptide bond, NOT jump disulfide bonds)
  • Schedule meeting with RCSB to talk about ways to improve aa-variants file

@lgtm-com
Copy link

lgtm-com bot commented Oct 7, 2021

This pull request introduces 4 alerts when merging b17ede8 into 118c652 - view on LGTM.com

new alerts:

  • 3 for Unused local variable
  • 1 for Unreachable code

@lgtm-com
Copy link

lgtm-com bot commented Oct 13, 2021

This pull request introduces 4 alerts when merging 5e5f42a into 321782a - view on LGTM.com

new alerts:

  • 3 for Unused local variable
  • 1 for Unreachable code

… JW's previous work, raise error for unspec bond order in from_rdkit
@lgtm-com
Copy link

lgtm-com bot commented Oct 18, 2021

This pull request introduces 4 alerts when merging 930d909 into 321782a - view on LGTM.com

new alerts:

  • 3 for Unused local variable
  • 1 for Unreachable code

@lgtm-com
Copy link

lgtm-com bot commented Oct 18, 2021

This pull request introduces 4 alerts when merging bbb1e5d into 321782a - view on LGTM.com

new alerts:

  • 3 for Unused local variable
  • 1 for Unreachable code

@lgtm-com
Copy link

lgtm-com bot commented Oct 19, 2021

This pull request introduces 2 alerts and fixes 1 when merging 7554a15 into 321782a - view on LGTM.com

new alerts:

  • 1 for Unused local variable
  • 1 for Unreachable code

fixed alerts:

  • 1 for Unused local variable

@lgtm-com
Copy link

lgtm-com bot commented Oct 19, 2021

This pull request introduces 2 alerts and fixes 1 when merging 17a8ef7 into 321782a - view on LGTM.com

new alerts:

  • 1 for Unused local variable
  • 1 for Unreachable code

fixed alerts:

  • 1 for Unused local variable

…s for TRP, splitting out monkey patches from manually-added substructures
@lgtm-com
Copy link

lgtm-com bot commented Oct 21, 2021

This pull request introduces 2 alerts and fixes 1 when merging 7436972 into 321782a - view on LGTM.com

new alerts:

  • 1 for Unused local variable
  • 1 for Unreachable code

fixed alerts:

  • 1 for Unused local variable

@lgtm-com
Copy link

lgtm-com bot commented Oct 26, 2021

This pull request introduces 2 alerts and fixes 1 when merging 354cc76 into 321782a - view on LGTM.com

new alerts:

  • 1 for Unused local variable
  • 1 for Unreachable code

fixed alerts:

  • 1 for Unused local variable

@lgtm-com
Copy link

lgtm-com bot commented Oct 27, 2021

This pull request introduces 1 alert and fixes 1 when merging 6f6de48 into 321782a - view on LGTM.com

new alerts:

  • 1 for Unused local variable

fixed alerts:

  • 1 for Unused local variable

@j-wags
Copy link
Member Author

j-wags commented Oct 27, 2021

Ok, I'd like to put this forward for review (happy to go over this live during our working session tomorrow @ijpulidos). There's still more to do, but this is a pretty cool prototype - It loads 11 out of 13 proteins from PLBenchmark. One failure is due to a phosphotyrosine, and I'm not sure what's up with the other. And actually, many of the pdbs that it loads should have failed, since they contained more than one molecule (like, the protein had gaps so it was many chemical molecules). So I've added a to-do to handle that more nicely, but I don't think it's blocking for the prototype.

@lgtm-com
Copy link

lgtm-com bot commented Oct 27, 2021

This pull request introduces 1 alert and fixes 1 when merging fdec5a9 into 321782a - view on LGTM.com

new alerts:

  • 1 for Unused local variable

fixed alerts:

  • 1 for Unused local variable

@mattwthompson
Copy link
Member

Could topology-biopolymer-refactor be merged back in here at some point? I need at least one feature (Topology.copy_initializer) that's not in here in order to load and use PDB files, but I don't want to break something you're currently working on.

@lgtm-com
Copy link

lgtm-com bot commented Oct 27, 2021

This pull request introduces 1 alert and fixes 1 when merging c713129 into 321782a - view on LGTM.com

new alerts:

  • 1 for Unused local variable

fixed alerts:

  • 1 for Unused local variable

@lgtm-com
Copy link

lgtm-com bot commented Oct 27, 2021

This pull request fixes 1 alert when merging 1e2ff46 into 321782a - view on LGTM.com

fixed alerts:

  • 1 for Unused local variable

@j-wags
Copy link
Member Author

j-wags commented Oct 27, 2021

(conclusion from working session with @ijpulidos) - I'll be merging this later today (assuming that tests pass). This gets over the finish line for an RDKit-only prototype. We have lots more to do, but I think this will handle a lot of common use cases and unblock people who want to start testing.

@j-wags j-wags merged commit a0f9d0d into topology-biopolymer-refactor Oct 27, 2021
@j-wags j-wags deleted the molecule-from-pdb-proto branch October 27, 2021 19:18
# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants