Code for "Self-supervised machine learning methods for protein design improve sampling, but not the identification of high-fitness variants"

This repo contains the code for reproducing the results of the publication "Self-supervised machine learning methods for protein design improve sampling, but not the identification of high-fitness variants" (link to preprint). If you instead want to use the implemented features in your own work, an overview is available in the Rosetta documentation, and a tutorial is available from the Meiler Rosetta workshop 2023 ("Tutorial 2: Machine Learning in Rosetta").

Running the different design protocols

Code for running the different design protocols can be found in the folder of each dataset, e.g. emi/avg03.sh. All scripts use the RosettaScripts XML files provided in the main folder, which are named after the design protocols shown in the paper.

Sequences and metrics of resulting designs

The unique sequences and calculated metrics of each design protocol are available in the dataset folders ("dataset/dataset_designs.csv"), e.g. emi/emi_designs.csv.
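
As a minimal sketch of how these files could be inspected outside the notebooks, the snippet below builds the path from the "dataset/dataset_designs.csv" naming convention described above and reads the file with Python's standard csv module. The function names (designs_csv_path, load_designs, summarize) are hypothetical helpers, not part of this repository, and no column names are assumed; the summary only reports whatever header the file contains.

```python
import csv
from pathlib import Path


def designs_csv_path(repo_root, dataset):
    """Build '<repo_root>/<dataset>/<dataset>_designs.csv' (hypothetical helper)."""
    return Path(repo_root) / dataset / f"{dataset}_designs.csv"


def load_designs(csv_path):
    """Read a designs CSV into a list of dicts, one dict per row."""
    with open(csv_path, newline="") as handle:
        return list(csv.DictReader(handle))


def summarize(rows):
    """Return the number of rows and the column names found in the header."""
    columns = list(rows[0].keys()) if rows else []
    return {"n_rows": len(rows), "columns": columns}
```

Called from the repository root, summarize(load_designs(designs_csv_path(".", "emi"))) would report the row count and the columns of emi/emi_designs.csv.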

Analysis of designs

The code for analyzing the resulting designs and reproducing the figures can be found in the design_analysis.ipynb notebook. To run the Jupyter notebooks, first create a Python environment from the environment.yaml file with either conda or mamba:

# create environment
conda env create -f environment.yaml
# activate environment
conda activate probs_design

Oracle model data and training

The code for training and evaluating the oracle models for each dataset can be found in the model_training.ipynb notebook. The datasets used for training can be found in each dataset folder, e.g. gb1/gb1_mutations_full_data.csv. The trained models are also provided, e.g. gb1/gb1_ridge.joblib.
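
A trained model file such as gb1/gb1_ridge.joblib can in principle be loaded with joblib.load, as in the sketch below. The helper name load_oracle is hypothetical, and the sketch assumes joblib and the packages from environment.yaml are installed, since unpickling generally needs library versions compatible with those used for training. The type of the loaded object and the input encoding it expects are not assumed here; model_training.ipynb is the reference for both.

```python
from pathlib import Path


def load_oracle(model_path):
    """Load a serialized model with joblib, failing early if the file is missing.

    Hypothetical helper: it makes no assumption about the model type.
    """
    path = Path(model_path)
    if not path.is_file():
        raise FileNotFoundError(f"No model file at {path}")
    # Imported lazily so the path check above works even without joblib installed.
    import joblib
    return joblib.load(str(path))
```

For example, load_oracle("gb1/gb1_ridge.joblib") run from the repository root would return the stored model object.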

Rosetta code

The Rosetta source code can be found at https://github.com/RosettaCommons/rosetta/. Docker containers for Rosetta (including the TensorFlow/LibTorch extras version) can be found at https://hub.docker.com/r/rosettacommons/rosetta.
