CS541 Final Group Project

by Alexsandra Antoski, Kai Davidson, and Peter Howell

Setup

We use conda to manage our project environment.

git clone https://github.com/kalxed/multimodal-protein-ae.git
cd multimodal-protein-ae

If you do not already have the conda environment, create it:

conda env create -f environment.yml

Then activate it (this step is needed whether or not you just created it):

conda activate dlprotproj

Finally, if the environment already existed, update it to match environment.yml, making sure dlprotproj is the active environment:

conda env update --file environment.yml --prune
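As a minimal sketch, the create-or-update choice described above can be scripted in Python. It assumes that conda env list --json prints a JSON object whose "envs" key lists environment paths, and that named environments end in envs/<name>. The helper name conda_setup_commands is hypothetical and is not part of the repository.

```python
import json


def conda_setup_commands(env_list_json: str, env_name: str = "dlprotproj") -> list[list[str]]:
    """Return the conda command(s) to run, given the stdout of `conda env list --json`.

    Hypothetical helper: creates the environment if it is missing,
    otherwise updates it in place and prunes removed packages.
    """
    envs = json.loads(env_list_json).get("envs", [])
    # Named environments live under .../envs/<name>; compare on the last path component.
    # Backslashes are normalized so Windows paths are handled too.
    names = {path.replace("\\", "/").rstrip("/").split("/")[-1] for path in envs}
    if env_name in names:
        # --name targets the environment explicitly, so it need not be the active one.
        return [["conda", "env", "update", "--name", env_name,
                 "--file", "environment.yml", "--prune"]]
    return [["conda", "env", "create", "-f", "environment.yml"]]
```

To use it, pass in the text captured by subprocess.run(["conda", "env", "list", "--json"], capture_output=True, text=True).stdout and run each returned command. Activation (conda activate dlprotproj) still has to be done in your own shell, because a child process cannot activate an environment for its parent.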

If you add new packages, regenerate environment.yml with the following command (the grep removes the machine-specific "prefix:" line from the export):

conda env export | grep -v "^prefix: " > environment.yml
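On systems without grep (for example a plain Windows command prompt), the same filtering can be done in Python. This is a sketch; strip_prefix_line is a hypothetical helper, and like the grep pattern it only drops lines that begin with "prefix: ".

```python
def strip_prefix_line(exported_yaml: str) -> str:
    """Remove the machine-specific `prefix:` line from `conda env export` output.

    Mirrors `grep -v "^prefix: "`: only lines that start with "prefix: " are dropped;
    indented occurrences are kept, and original line endings are preserved.
    """
    kept = [
        line
        for line in exported_yaml.splitlines(keepends=True)
        if not line.startswith("prefix: ")
    ]
    return "".join(kept)
```

Pass it the text of conda env export (for example captured with subprocess.run(["conda", "env", "export"], capture_output=True, text=True).stdout) and write the result to environment.yml.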

Folder Structure

Our project has the following structure:

multimodal-protein-ae
├── README.md
├── data
│   ├── graphs/*.pt
│   ├── pointclouds/*.pt
│   ├── raw-structures/*.[cif | pdb]
│   └── sequences/*.pt
├── models
│   ├── CAE.pt
│   ├── PAE.pt
│   └── VGAE.pt
├── paper
├── mpae
│   ├── __init__.py
│   ├── mpae.py
│   ├── nn
│   │   ├── __init__.py
│   │   ├── attention.py
│   │   ├── concrete_autoencoder.py
│   │   ├── esm.py
│   │   ├── pae.py
│   │   └── vgae.py
│   └── utils
│       ├── __init__.py
│       ├── data.py
│       └── utils.py
└── scripts
    ├── py
    │   ├── fuse_encode.py
    │   ├── get_protein_list.py
    │   ├── make_graphs.py
    │   ├── make_pointclouds.py
    │   ├── make_tokenized_seqs.py
    │   ├── pretrain_pae.py
    │   └── pretrain_vgae.py
    └── sh
        ├── construct-graphs.sh
        ├── download-data.sh
        ├── get-protein-ids.sh
        ├── get-protein-structures.sh
        ├── graph-pretrain.sh
        ├── graph-test.sh
        ├── make-fusion.sh
        ├── make-pointclouds.sh
        └── pointcloud-pretrain.sh
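The tree implies a one-to-one correspondence between each file in data/raw-structures and a .pt file in each of data/graphs, data/pointclouds, and data/sequences. The sketch below assumes, hypothetically, that the derived files reuse the structure's file stem (1abc.cif becomes 1abc.pt); check the scripts in scripts/py for the naming they actually use. derived_paths and missing_outputs are illustrative helpers, not repository code.

```python
from pathlib import Path

# Subdirectories of data/ that hold files derived from each raw structure.
DERIVED_DIRS = ("graphs", "pointclouds", "sequences")
STRUCTURE_SUFFIXES = {".cif", ".pdb"}


def derived_paths(structure: Path, data_root: Path = Path("data")) -> dict[str, Path]:
    """Map one raw structure file to its derived .pt files.

    Assumption (hypothetical): outputs are named <stem>.pt in each derived directory.
    """
    if structure.suffix not in STRUCTURE_SUFFIXES:
        raise ValueError(f"expected a .cif or .pdb file, got {structure.name}")
    return {kind: data_root / kind / f"{structure.stem}.pt" for kind in DERIVED_DIRS}


def missing_outputs(data_root: Path = Path("data")) -> list[Path]:
    """List derived files that have not been generated yet for any raw structure."""
    raw_dir = data_root / "raw-structures"
    missing: list[Path] = []
    # glob on a directory that does not exist simply yields nothing.
    for structure in sorted(raw_dir.glob("*")):
        if structure.suffix in STRUCTURE_SUFFIXES:
            missing.extend(
                p for p in derived_paths(structure, data_root).values() if not p.exists()
            )
    return missing
```

A check like missing_outputs can be run after the conversion jobs finish to spot structures whose graph, point cloud, or sequence file was not produced.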

mpae

The mpae directory contains the classes and functions essential for our model.

scripts

The scripts directory contains the scripts used to prepare the data and to train and evaluate the models.

scripts/py

Contains the Python scripts used to transform the data, train the models, and evaluate them.

scripts/sh

Contains the Bash scripts used to download the data and to submit jobs to the Slurm cluster.

data

This directory is where all the data was stored. The raw structure files downloaded from the PDB were all placed in data/raw-structures. A graph, a tokenized sequence, and a point cloud were created from each raw structure file and stored in the respective directories.
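To illustrate what a point cloud and a graph derived from a structure file can look like, the following stdlib-only sketch reads C-alpha coordinates from the fixed-column ATOM records of a PDB file and connects residues whose C-alpha atoms lie within a distance cutoff. This is a generic contact-graph construction under stated assumptions (C-alpha atoms only, an 8 Å cutoff, no handling of alternate locations or multiple models); it is not necessarily what make_pointclouds.py or make_graphs.py do, and .cif files would need an mmCIF parser such as Biopython's MMCIFParser instead.

```python
import math


def ca_coordinates(pdb_text: str) -> list[tuple[float, float, float]]:
    """Extract C-alpha (x, y, z) coordinates from PDB ATOM records.

    Uses the fixed-column PDB layout: atom name in columns 13-16 and
    x, y, z in columns 31-38, 39-46, 47-54 (1-based, inclusive).
    HETATM records are skipped because they do not start with "ATOM".
    """
    coords = []
    for line in pdb_text.splitlines():
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            coords.append((float(line[30:38]), float(line[38:46]), float(line[46:54])))
    return coords


def contact_edges(coords: list[tuple[float, float, float]],
                  cutoff: float = 8.0) -> list[tuple[int, int]]:
    """Undirected edges (i, j), i < j, between residues within `cutoff` angstroms.

    The 8.0 default is an illustrative assumption, not a value taken from this project.
    """
    edges = []
    for i in range(len(coords)):
        for j in range(i + 1, len(coords)):
            if math.dist(coords[i], coords[j]) <= cutoff:
                edges.append((i, j))
    return edges
```

Here the list of coordinates plays the role of a point cloud and the edge list the role of a graph. The pairwise loop is O(n²), which is fine for illustration; a real pipeline would typically use a vectorized or spatial-index search.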

models

Contains the pretrained models (CAE.pt, PAE.pt, and VGAE.pt).
