Usage for structure prediction tasks #726

padix-key · 2024-12-20T13:24:13Z

Biotite is integrated into the workflows of many structure prediction models. Hence, we could add an example script that serves as a loose collection of possible uses of Biotite in this context.

Input topics:

One-hot sequence encoding (using ProteinSequence.code)
Getting the assembly (structure.io.pdbx.get_assembly())
Filtering high-quality structures from AFDB as training samples
Secondary structure as feature (DsspApp, structure.annotate_sse())
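
As a rough illustration of the first input topic, one-hot encoding reduces to indexing an identity matrix with integer symbol codes. The sketch below uses plain NumPy so that it runs without Biotite. The 20-letter `ALPHABET` and the helpers `encode()` and `one_hot()` are assumptions made for this example; in Biotite the integer codes would come from `ProteinSequence.code`, whose alphabet is larger than the 20 letters used here.

```python
import numpy as np

# Assumption: a plain 20-letter amino acid alphabet, for illustration only.
# With Biotite, the integer codes would be taken from ProteinSequence.code.
ALPHABET = "ACDEFGHIKLMNPQRSTVWY"
SYMBOL_TO_CODE = {symbol: code for code, symbol in enumerate(ALPHABET)}


def encode(sequence):
    """Map each symbol of the sequence to its integer code."""
    return np.array([SYMBOL_TO_CODE[s] for s in sequence], dtype=np.int64)


def one_hot(codes, n_symbols):
    """Convert codes of shape (L,) into a one-hot matrix of shape (L, n_symbols)."""
    return np.eye(n_symbols, dtype=np.float32)[codes]


codes = encode("MKTAY")
features = one_hot(codes, len(ALPHABET))
```

Each row of `features` contains a single `1.0` at the column given by the corresponding code, so `features.argmax(axis=1)` recovers `codes`.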

Output topics:

Structure superimposition, including methods that are robust with respect to missing residues
Evaluation of a pose against the ground truth: rmsd(), lddt() (#699, "Add support for lDDT computation") and tm_score() (#705, "Implement structural superimposition and TM-score")
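
To make the output topics more concrete, here is a minimal NumPy sketch of a superimposition that tolerates missing residues followed by an RMSD calculation. It is not Biotite's implementation: the helper names `superimpose()` and `rmsd()` mirror the Biotite functions only loosely, the coordinates are synthetic, and matching residues by the intersection of residue IDs is an assumption that only holds when both structures share the same numbering.

```python
import numpy as np


def superimpose(fixed, mobile):
    """Kabsch algorithm: return (rotation, translation) so that
    mobile @ rotation.T + translation is fitted onto fixed."""
    fixed_center = fixed.mean(axis=0)
    mobile_center = mobile.mean(axis=0)
    covariance = (mobile - mobile_center).T @ (fixed - fixed_center)
    u, _, vt = np.linalg.svd(covariance)
    # Flip the last axis if needed, so the result is a proper rotation
    d = np.sign(np.linalg.det(vt.T @ u.T))
    rotation = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    translation = fixed_center - rotation @ mobile_center
    return rotation, translation


def rmsd(reference, subject):
    """Root-mean-square deviation between two (N, 3) coordinate arrays."""
    return float(np.sqrt(np.mean(np.sum((reference - subject) ** 2, axis=-1))))


# Synthetic CA coordinates of a reference with residues 1..8
rng = np.random.default_rng(0)
ref_res_ids = np.arange(1, 9)
ref_coord = rng.normal(size=(8, 3)) * 10

# Synthetic model: residues 3 and 4 are missing, the rest is rotated and shifted
angle = np.deg2rad(30)
rot_z = np.array([
    [np.cos(angle), -np.sin(angle), 0.0],
    [np.sin(angle), np.cos(angle), 0.0],
    [0.0, 0.0, 1.0],
])
keep = np.isin(ref_res_ids, [3, 4], invert=True)
model_res_ids = ref_res_ids[keep]
model_coord = ref_coord[keep] @ rot_z.T + np.array([5.0, -2.0, 1.0])

# Restrict both structures to residues present in both (assumes unique IDs)
_, ref_idx, model_idx = np.intersect1d(
    ref_res_ids, model_res_ids, return_indices=True
)
rotation, translation = superimpose(ref_coord[ref_idx], model_coord[model_idx])
fitted = model_coord[model_idx] @ rotation.T + translation
value = rmsd(ref_coord[ref_idx], fitted)
```

Because the synthetic model differs from the reference only by a rigid transformation, `value` is close to zero here; for real predictions, structures with diverging numbering would need a sequence alignment instead of the ID intersection.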

This list is probably not exhaustive, so if anyone has additional ideas, please add them to the issue!

Notably this script should not run any model itself. It is only about preparing features for a hypothetical model and evaluating the output structure poses (e.g. taken from AlphaFold DB).


cwognum · 2025-01-27T21:32:25Z

Similar idea, but a bit of a tangent: I think using Biotite for ligand posing tasks (or blind docking) would be useful.

We're currently using OpenStructure's compare-ligand-structures command to get the RMSD, LDDT-PLI and LDDT-LP for evaluating a bunch of protein-ligand co-folding methods. This works, but I would love to switch over to biotite for this.

OpenStructure supports a much broader use case than just scoring (and is thus a heavy dependency), has limited cross-platform support (and can thus be hard to install), and was not built primarily for an ML audience (e.g. the input is a PDB or CIF file, rather than some Pythonic representation of a system). On the other hand, we've seen AlphaFold3 and several of its replications reimplement such scores themselves, see e.g. LDDT in AlphaFold, Boltz-1 and OpenFold, but these solutions were implemented specifically for those models and are not as robust as OpenStructure. Having a centralized, more generic solution in Biotite would still be valuable.

After a quick search, I found the following:

rmsd() is already supported in Biotite.
lddt() is not officially included yet in the latest release, but I did notice that it was implemented in #699.
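
For readers unfamiliar with the score, a global lDDT can be sketched in a few lines of NumPy. This is not the implementation from #699 and not OpenStructure's; it is a simplified version that assumes one atom per residue (e.g. CA only), so the usual exclusion of intra-residue pairs reduces to excluding each atom paired with itself. The inclusion radius of 15 Å and the thresholds of 0.5, 1, 2 and 4 Å are the commonly used defaults, and treating a deviation equal to a threshold as preserved is an assumption of this sketch.

```python
import numpy as np


def lddt(reference, model, inclusion_radius=15.0,
         thresholds=(0.5, 1.0, 2.0, 4.0)):
    """Simplified global lDDT for (N, 3) arrays with one atom per residue."""
    ref_dist = np.linalg.norm(reference[:, None] - reference[None, :], axis=-1)
    model_dist = np.linalg.norm(model[:, None] - model[None, :], axis=-1)
    # Contacts: pairs of different atoms that are close in the reference
    contacts = (ref_dist < inclusion_radius) & ~np.eye(len(reference), dtype=bool)
    deviation = np.abs(ref_dist - model_dist)[contacts]
    if deviation.size == 0:
        return float("nan")
    # Fraction of preserved distances, averaged over all thresholds
    return float(np.mean([(deviation <= t).mean() for t in thresholds]))


# Four atoms on a line; the model displaces the last atom by 3 Å
reference = np.array(
    [[0.0, 0.0, 0.0], [3.8, 0.0, 0.0], [7.6, 0.0, 0.0], [11.4, 0.0, 0.0]]
)
model = reference.copy()
model[3, 0] += 3.0

score_identical = lddt(reference, reference)
score_perturbed = lddt(reference, model)
```

Since lDDT only compares internal distances, it needs no superimposition, which is one reason it is popular for evaluating predicted poses.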

As of now, the following functionality seems to be missing from Biotite:

Symmetry corrections by reordering atoms according to the molecular graph isomorphisms. Since Biotite already depends on networkx, we could use its GraphMatcher, similar to what is done in spyrmsd.
The identification of a binding site by filtering the residues of a receptor based on the minimum distance between all the residue's heavy atoms and all the heavy atoms of the ligand. To be fair: Seems possible already using distance() and some clever filtering.
Chain mapping. The purpose of this one is not yet entirely clear to me, but it seems this is done to align the reference and predicted binding site.
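
The binding site selection described in the list above can be sketched with plain NumPy. The function name `binding_site_residues()`, the 4 Å cutoff and the toy coordinates are assumptions for illustration; the inputs are expected to contain heavy atoms only, and with Biotite the pairwise distances could presumably be obtained via `distance()` as mentioned above.

```python
import numpy as np


def binding_site_residues(receptor_coord, receptor_res_ids, ligand_coord,
                          cutoff=4.0):
    """Return the IDs of residues that have at least one atom within
    `cutoff` of any ligand atom (all inputs are heavy atoms only)."""
    # Pairwise distances with shape (n_receptor_atoms, n_ligand_atoms)
    dist = np.linalg.norm(
        receptor_coord[:, None] - ligand_coord[None, :], axis=-1
    )
    close_atoms = dist.min(axis=1) <= cutoff
    return np.unique(receptor_res_ids[close_atoms])


# Toy receptor with three residues of two atoms each
receptor_res_ids = np.array([1, 1, 2, 2, 3, 3])
receptor_coord = np.array([
    [0.0, 0.0, 0.0], [1.0, 0.0, 0.0],    # residue 1
    [6.0, 0.0, 0.0], [9.0, 0.0, 0.0],    # residue 2, only one atom is close
    [20.0, 0.0, 0.0], [21.0, 0.0, 0.0],  # residue 3, far away
])
ligand_coord = np.array([[3.0, 0.0, 0.0], [3.0, 1.0, 0.0]])

site = binding_site_residues(receptor_coord, receptor_res_ids, ligand_coord)
```

The full distance matrix is fine for a single ligand; for large receptors a spatial index would avoid computing distances to atoms that are far away.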

Is this something you would be interested in supporting through Biotite? If so, any thoughts on how to implement this? I would be open to help!

cwognum · 2025-01-29T03:25:38Z

@padix-key I've raised the above proposal in various other groups and there's a need that Biotite could address.

I think I can get some folks together to work on this. If you and the other maintainers agree that these are features that you would like to have in Biotite, I would really appreciate your guidance on how to go about implementing this.

padix-key · 2025-01-29T21:40:30Z

Hi @cwognum, at VantAI we are currently polishing a package that does more or less exactly what you proposed: It performs atom matching between the reference and the predicted model (from small molecules to chains) and runs metrics on the matched AtomArrays - for both pure protein and protein-ligand models. We plan to make it openly available as an extension package soon, but it will still take a few weeks. It is designed with extensibility in mind, so I would appreciate collaboration then to add further evaluation metrics and improve the atom matching. I will keep you up to date!
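
To illustrate what atom matching for small molecules can look like, the following sketch computes a symmetry-corrected RMSD with the `GraphMatcher` from networkx, along the lines of the proposal earlier in this thread. It is explicitly not the package mentioned above, whose implementation is not shown in this issue. The function name, the element-only node matching and the toy carboxylate-like fragment are assumptions; no superimposition is performed, since ligand poses are usually compared in the receptor frame.

```python
import numpy as np
import networkx as nx
from networkx.algorithms.isomorphism import GraphMatcher


def symmetry_corrected_rmsd(ref_coord, pose_coord, elements, bonds):
    """Minimum RMSD over all atom reorderings that preserve the
    molecular graph (elements and bonds)."""
    graph = nx.Graph()
    for index, element in enumerate(elements):
        graph.add_node(index, element=element)
    graph.add_edges_from(bonds)
    # Match the graph against itself to enumerate its symmetries
    matcher = GraphMatcher(
        graph, graph, node_match=lambda a, b: a["element"] == b["element"]
    )
    best = np.inf
    for mapping in matcher.isomorphisms_iter():
        order = np.array([mapping[i] for i in range(len(elements))])
        diff = ref_coord - pose_coord[order]
        best = min(best, float(np.sqrt(np.mean(np.sum(diff**2, axis=-1)))))
    return best


# Toy fragment: a carbon bonded to two equivalent oxygens and one carbon
elements = ["C", "O", "O", "C"]
bonds = [(0, 1), (0, 2), (0, 3)]
ref_coord = np.array(
    [[0.0, 0.0, 0.0], [1.2, 0.5, 0.0], [-1.2, 0.5, 0.0], [0.0, -1.5, 0.0]]
)
# Same pose, but the two equivalent oxygens are listed in swapped order
pose_coord = ref_coord[[0, 2, 1, 3]]

naive = float(np.sqrt(np.mean(np.sum((ref_coord - pose_coord) ** 2, axis=-1))))
corrected = symmetry_corrected_rmsd(ref_coord, pose_coord, elements, bonds)
```

The naive RMSD penalizes the swapped oxygens although both poses are chemically identical, while the corrected value does not. Note that the number of isomorphisms can grow quickly for highly symmetric molecules.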

cwognum · 2025-01-30T15:21:06Z

@padix-key Cool stuff! Is there any way in which we can help accelerate the release of the package? Like I said, there's a group of folks who would love to see this happen and who are open to contributing. Could it be an idea to open-source it already and have some folks test it prior to the official release and launch?

Croydon-Brixton · 2025-02-01T08:58:11Z

Great to see this discussion (:
Just to weigh in here for further support: Biotite is used extensively at the IPD / Baker lab as well for bio-data wrangling and in ML workflows (dataset preparation & evaluation). So this would be very much of interest to the academic community too. Happy to help out where I can be helpful.

padix-key added the "example idea" label (An idea for a new example in the gallery) on Dec 20, 2024
