Skip to content

Multitask learning: protein secondary structure prediction, b-values prediction and solvent-accessibility prediction

Notifications You must be signed in to change notification settings

peace195/multitask-learning-protein-prediction

Repository files navigation

Descriptions

Multitask learning (secondary structure prediction, b-values prediction, solvent-accessibility prediction) can improve the prediction accuracy of protein secondary structure.

  • We have to face with the class imbalance problem
  • "foldername_cv": 5 fold cross validation
  • Distribution of outputs:

alt text

Data

The copyright belongs to http://rostlab.org/. It can not be public.

Data representation

Using Protvec (3-gram) and follow the vector addition rule. For example:

TNCDE = UTN + TNC + NCD + CDE + DEU

Multitask learning model

alt text

Results

3 states protein secondary structure)

Multi-task learning (3 tasks, 3 states):

  • Secondary Structure accuracy (3 states): 69.0%

  • Solvent Accessibility accuracy (3 states): 54.6%

  • B-values accuracy (3 states): 59.1%

8 states protein secondary structure

Multi-task learning (3 tasks, 8 states):

  • Secondary Structure accuracy (8 states): 0.476

  • Solvent Accessibility accuracy (3 states): 0.548

  • B-values accuracy (3 states): 0.598

  • Secondary structure

alt text

  • Solvent accessibility

alt text

  • b-values

alt text

Prerequisites

  • python 2.7
  • tensorflow 1.4.0
  • ProtVec

How to run

Go into each subfolder and run the code following:

  • python lstm.py

Author

Binh Do

License

This project is licensed under the MIT License