This work has been officially published, but we will keep updating this repository to keep up with the latest research. If you have any suggestions, feel free to open an issue. Contributions are also very welcome.
This article has been selected for the cover image of the corresponding issue of Methods.
The related tutorial and review manuscript can be found here: Deep learning in bioinformatics: introduction, application, and perspective in the big data era [PDF]
If you find the tutorial and this repository useful, please cite our manuscript with the following information:
@article{LI2019,
title = "Deep learning in bioinformatics: Introduction, application, and perspective in the big data era",
journal = "Methods",
year = "2019",
issn = "1046-2023",
doi = "https://doi.org/10.1016/j.ymeth.2019.04.008",
url = "http://www.sciencedirect.com/science/article/pii/S1046202318303256",
author = "Yu Li and Chao Huang and Lizhong Ding and Zhongxiao Li and Yijie Pan and Xin Gao"
}
Deep learning, which is especially powerful at handling big data, has achieved great success in various fields, including bioinformatics. As biology enters the big data era, we expect deep learning to become increasingly important in the field and to be incorporated into the vast majority of analysis pipelines.
To facilitate this process, this repository provides eight examples covering five research directions, four data types, and a number of deep learning models that people will encounter in bioinformatics. The five research directions are: sequence analysis; structure prediction and reconstruction; biomolecular property and function prediction; biomedical image processing and diagnosis; and biomolecule interaction prediction and systems biology. The four data types are: structured data, 1D sequence data, 2D image or profiling data, and graph data. The covered deep learning models are: deep fully connected neural networks, convolutional neural networks (CNN), recurrent neural networks (RNN), graph convolutional neural networks, ResNet, generative adversarial networks (GAN), and variational autoencoders (VAE).
Here is an overview of the eight examples:
This example shows how to use a neural network to identify enzymes.
- Model: deep fully connected neural network
- Data type: structured data
- Research direction: biomolecular property and function prediction
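As a rough illustration of what a deep fully connected network computes, here is a minimal, dependency-free Python sketch of the forward pass: ReLU hidden layers followed by a softmax over the output classes. The three input features, the layer sizes, the two classes (non-enzyme vs. enzyme) and all weights are hypothetical; a real model would learn its weights with a deep learning framework from real protein features.

```python
import math


def dense(x, weights, bias):
    """Fully connected layer: out[j] = sum_i x[i] * weights[i][j] + bias[j]."""
    return [sum(xi * row[j] for xi, row in zip(x, weights)) + b
            for j, b in enumerate(bias)]


def relu(v):
    return [max(0.0, a) for a in v]


def softmax(v):
    m = max(v)  # subtract the max for numerical stability
    exps = [math.exp(a - m) for a in v]
    total = sum(exps)
    return [e / total for e in exps]


def predict(features, layers):
    """ReLU on every hidden layer, softmax on the output layer."""
    h = features
    for weights, bias in layers[:-1]:
        h = relu(dense(h, weights, bias))
    weights, bias = layers[-1]
    return softmax(dense(h, weights, bias))


# Hypothetical toy network: 3 features -> 2 hidden units -> 2 classes
# (index 0 = non-enzyme, index 1 = enzyme). All weights are made up.
layers = [
    ([[0.5, -0.2], [0.1, 0.4], [-0.3, 0.8]], [0.0, 0.1]),
    ([[1.0, -1.0], [-0.5, 0.5]], [0.0, 0.0]),
]
probs = predict([1.0, 0.0, 2.0], layers)
print(probs)
```

The output is a probability per class; in practice the network is deeper and trained with a cross-entropy loss.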
This example shows how to combine a CNN and an RNN to predict the function of non-coding DNA sequences.
- Model: CNN, RNN
- Data type: 1D sequence data
- Research direction: sequence analysis
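The CNN + RNN combination can be sketched in plain Python as: one-hot encode the DNA sequence, scan it with a convolution filter, apply ReLU and max pooling, then run a recurrent layer over the pooled features. This is a simplified sketch, not the repository's model: it assumes a single hand-set filter for the hypothetical motif "GATA" and a scalar Elman RNN in place of the multi-unit (often LSTM) layers used in practice.

```python
import math

BASES = "ACGT"


def one_hot(seq):
    """Encode DNA as 4-dim vectors in A, C, G, T order; other symbols (e.g. N) become all zeros."""
    return [[1.0 if b == base else 0.0 for base in BASES] for b in seq.upper()]


def conv1d(x, kernel, bias=0.0):
    """'Valid' 1D convolution of a single filter (width k x 4 channels) along the sequence."""
    k = len(kernel)
    return [sum(x[i + p][c] * kernel[p][c] for p in range(k) for c in range(4)) + bias
            for i in range(len(x) - k + 1)]


def max_pool(v, size):
    """Non-overlapping max pooling; a shorter trailing window is kept."""
    return [max(v[i:i + size]) for i in range(0, len(v), size)]


def simple_rnn(inputs, w_in, w_rec, b):
    """Scalar Elman RNN, h_t = tanh(w_in * x_t + w_rec * h_{t-1} + b); returns the last state."""
    h = 0.0
    for x_t in inputs:
        h = math.tanh(w_in * x_t + w_rec * h + b)
    return h


# Hypothetical filter for the motif "GATA": +1 for the matching base, -1 otherwise.
gata = [[1.0 if b == base else -1.0 for base in BASES] for b in "GATA"]

x = one_hot("TTGATAACGGATAC")
features = [max(0.0, s) for s in conv1d(x, gata)]  # ReLU
pooled = max_pool(features, 3)
h_final = simple_rnn(pooled, w_in=0.5, w_rec=0.8, b=0.0)
print(pooled, h_final)
```

The convolution detects local motifs wherever they occur, while the recurrent layer summarizes how those detections are arranged along the sequence.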
This example shows how to use deep learning to predict the expression of target genes from landmark gene expression data.
- Model: deep fully connected neural network
- Data type: structured data
- Research direction: biomolecule interaction prediction and systems biology
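Predicting target genes from landmark genes is a multi-output regression problem. A minimal sketch, assuming expression values are already normalized: one tanh hidden layer, a linear output with one unit per target gene, and the mean absolute error of each target gene for evaluation. The layer sizes, weights and data are toy values for illustration only.

```python
import math


def dense(x, weights, bias):
    """Fully connected layer: out[j] = sum_i x[i] * weights[i][j] + bias[j]."""
    return [sum(xi * row[j] for xi, row in zip(x, weights)) + b
            for j, b in enumerate(bias)]


def predict_targets(landmarks, w1, b1, w2, b2):
    """Landmark expression -> tanh hidden layer -> linear output (one value per target gene)."""
    hidden = [math.tanh(a) for a in dense(landmarks, w1, b1)]
    return dense(hidden, w2, b2)  # no output activation: expression is real-valued


def mae_per_gene(predictions, truths):
    """Mean absolute error of each target gene, averaged over samples."""
    n_genes = len(truths[0])
    return [sum(abs(p[g] - t[g]) for p, t in zip(predictions, truths)) / len(truths)
            for g in range(n_genes)]


# Toy sizes: 2 landmark genes -> 2 hidden units -> 3 target genes (weights are made up).
w1, b1 = [[0.2, -0.1], [0.4, 0.3]], [0.0, 0.0]
w2, b2 = [[1.0, 0.5, -0.5], [0.2, 0.1, 0.3]], [0.0, 0.0, 0.0]

samples = [[1.0, 2.0], [0.5, -1.0]]             # landmark expression, one row per sample
observed = [[0.9, 0.5, 0.1], [0.0, 0.2, -0.3]]  # target expression (made up)
predicted = [predict_targets(s, w1, b1, w2, b2) for s in samples]
print(mae_per_gene(predicted, observed))
```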
This example shows how to perform diagnosis on X-ray images with ResNet.
- Model: ResNet
- Data type: 2D image or profiling data
- Research direction: biomedical image processing and diagnosis
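The key idea of ResNet is the identity shortcut: a block learns a residual F(x) and outputs ReLU(F(x) + x). In the sketch below, small dense matrices stand in for the convolution and batch normalization layers of a real ResNet, and the weights are hypothetical.

```python
def relu(v):
    return [max(0.0, a) for a in v]


def linear(x, weights):
    """Bias-free linear map: out[j] = sum_i x[i] * weights[i][j]."""
    return [sum(xi * row[j] for xi, row in zip(x, weights))
            for j in range(len(weights[0]))]


def residual_block(x, w1, w2):
    """y = ReLU(F(x) + x) with residual branch F(x) = W2(ReLU(W1 x))."""
    f = linear(relu(linear(x, w1)), w2)
    return relu([fi + xi for fi, xi in zip(f, x)])  # identity shortcut adds x back


identity = [[1.0, 0.0], [0.0, 1.0]]
zeros = [[0.0, 0.0], [0.0, 0.0]]

print(residual_block([1.0, -1.0], identity, identity))
print(residual_block([1.0, 2.0], identity, zeros))
```

With the residual branch set to zero, the block passes non-negative inputs through unchanged, which is one intuition for why very deep residual networks remain trainable.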
This example shows how to use a graph neural network to perform graph embedding and predict protein-protein interactions in a PPI network.
- Model: graph convolutional neural network
- Data type: graph data
- Research direction: biomolecule interaction prediction and systems biology
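A graph convolutional layer updates each node's embedding from its neighbors using the normalized adjacency matrix, H' = ReLU(D^{-1/2}(A + I)D^{-1/2} H W), following Kipf and Welling. The sketch below stacks two such layers to embed proteins in a toy PPI network and scores a candidate interaction with an inner-product decoder, sigmoid(z_i . z_j), as in graph autoencoders. The graph, the weights and the choice of decoder are assumptions for illustration, not necessarily what the repository's example uses.

```python
import math


def normalize_adjacency(adj):
    """Return D^{-1/2} (A + I) D^{-1/2} for a dense, symmetric 0/1 adjacency matrix."""
    n = len(adj)
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]  # degrees including the added self-loops
    return [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def gcn_layer(a_norm, h, w, relu=True):
    """One graph convolution: H' = ReLU(A_norm H W) (no activation if relu=False)."""
    out = matmul(matmul(a_norm, h), w)
    return [[max(0.0, v) for v in row] for row in out] if relu else out


def edge_probability(z, i, j):
    """Inner-product decoder: P(edge i-j) = sigmoid(z_i . z_j)."""
    score = sum(a * b for a, b in zip(z[i], z[j]))
    return 1.0 / (1.0 + math.exp(-score))


# Toy PPI network with 4 proteins and interactions 0-1, 1-2, 2-3.
adj = [[0, 1, 0, 0],
       [1, 0, 1, 0],
       [0, 1, 0, 1],
       [0, 0, 1, 0]]
features = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]  # identity: no node features
w1 = [[0.5, 0.1], [0.2, 0.4], [0.3, 0.3], [0.1, 0.6]]  # hypothetical weights
w2 = [[1.0, 0.0], [0.0, 1.0]]

a_norm = normalize_adjacency(adj)
z = gcn_layer(a_norm, gcn_layer(a_norm, features, w1), w2, relu=False)  # 2-dim embeddings
print(edge_probability(z, 0, 3))  # score for a candidate, unobserved interaction
```

In training, the weights would be fitted so that observed interactions get high probabilities and sampled non-interacting pairs get low ones.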
This example shows how to perform biological image super-resolution with a GAN.
- Model: GAN
- Data type: 2D image or profiling data
- Research direction: biomedical image processing and diagnosis
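A super-resolution GAN trains a generator, which upsamples low-resolution images, against a discriminator, which tries to tell generated images from real high-resolution ones. The sketch below computes the two loss terms in the style of SRGAN-like models; the pixel-wise MSE content loss and the adversarial weight of 1e-3 are assumptions, not necessarily the repository's settings. Images are represented as flattened pixel lists.

```python
import math


def discriminator_loss(d_real, d_fake, eps=1e-12):
    """Binary cross-entropy: real high-res images labeled 1, generated ones labeled 0.

    d_real / d_fake are the discriminator's probabilities for paired batches of equal size.
    """
    return -sum(math.log(r + eps) + math.log(1.0 - f + eps)
                for r, f in zip(d_real, d_fake)) / len(d_real)


def generator_loss(d_fake, sr, hr, adv_weight=1e-3, eps=1e-12):
    """Pixel-wise MSE content loss plus a weighted non-saturating adversarial loss."""
    content = sum((s - h) ** 2 for s, h in zip(sr, hr)) / len(hr)
    adversarial = -sum(math.log(f + eps) for f in d_fake) / len(d_fake)
    return content + adv_weight * adversarial


# A discriminator that cannot tell real from fake (p = 0.5) gives a loss of 2 * ln 2.
print(discriminator_loss([0.5, 0.5], [0.5, 0.5]))
# A perfect reconstruction that fully fools the discriminator gives a generator loss near 0.
print(generator_loss([1.0], sr=[0.2, 0.4], hr=[0.2, 0.4]))
```

The two losses are minimized alternately: the discriminator on real and generated batches, then the generator on its own outputs.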
This example shows how to use a VAE to reduce the dimensionality of gene expression profiles.
- Model: VAE
- Data type: 2D image or profiling data
- Research direction: biomolecule interaction prediction and systems biology
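A VAE encodes each expression profile as a Gaussian in a low-dimensional latent space, and the latent mean serves as the reduced representation. The sketch below shows the reparameterization trick and the closed-form KL term against a standard normal prior; the encoder and decoder networks are omitted, the latent values are hypothetical, and the squared-error reconstruction term is an assumed (common) choice for continuous expression values.

```python
import math
import random


def reparameterize(mu, log_var, rng=random):
    """Sample z = mu + sigma * eps with eps ~ N(0, 1), so gradients can reach mu and log_var."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, log_var)]


def kl_divergence(mu, log_var):
    """KL(N(mu, diag(sigma^2)) || N(0, I)) = -0.5 * sum(1 + log_var - mu^2 - exp(log_var))."""
    return -0.5 * sum(1.0 + lv - m * m - math.exp(lv) for m, lv in zip(mu, log_var))


def vae_loss(x, x_recon, mu, log_var):
    """Negative ELBO sketch: squared-error reconstruction + KL regularizer."""
    recon = sum((a - b) ** 2 for a, b in zip(x, x_recon))
    return recon + kl_divergence(mu, log_var)


# Hypothetical 2-dim latent code produced by an encoder for one expression profile.
mu, log_var = [0.3, -1.2], [-0.5, 0.1]
z = reparameterize(mu, log_var, rng=random.Random(0))
print(z, kl_divergence(mu, log_var))
```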
This example shows how to predict RNA-protein binding sites with a CNN.
- Model: CNN
- Data type: 1D sequence data
- Research direction: sequence analysis
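A CNN for binding-site prediction can be pictured as a set of learned motif detectors. The sketch below shows a single detector in plain Python: one-hot encode the RNA sequence, scan it with a convolution filter, apply ReLU and global max pooling, and map the result to a binding probability with a logistic output, roughly in the spirit of DeepBind-style models. The hand-set filter for the motif UGCAUG and all weights are hypothetical; a trained model learns many filters from data.

```python
import math

RNA_BASES = "ACGU"


def one_hot_rna(seq):
    """Encode RNA as 4-dim vectors in A, C, G, U order (T is treated as U)."""
    seq = seq.upper().replace("T", "U")
    return [[1.0 if b == base else 0.0 for base in RNA_BASES] for b in seq]


def scan(x, kernel, bias=0.0):
    """Slide one convolution filter (width k x 4) along the sequence; one score per offset."""
    k = len(kernel)
    return [sum(x[i + p][c] * kernel[p][c] for p in range(k) for c in range(4)) + bias
            for i in range(len(x) - k + 1)]


def binding_probability(seq, kernel, bias, w_out, b_out):
    """Convolution -> ReLU -> global max pooling -> logistic output (bound vs. unbound)."""
    scores = [max(0.0, s) for s in scan(one_hot_rna(seq), kernel, bias)]
    pooled = max(scores)  # strongest match anywhere in the sequence
    return 1.0 / (1.0 + math.exp(-(w_out * pooled + b_out)))


# Hypothetical filter for the motif UGCAUG: +1 for the matching base, -1 otherwise.
# With bias -4, only an exact 6-mer match gives a positive activation (one mismatch scores 0).
motif = [[1.0 if b == base else -1.0 for base in RNA_BASES] for b in "UGCAUG"]

p_site = binding_probability("AAUGCAUGAA", motif, bias=-4.0, w_out=2.0, b_out=-1.0)
p_none = binding_probability("AAAAAAAAAA", motif, bias=-4.0, w_out=2.0, b_out=-1.0)
print(p_site, p_none)
```

Global max pooling makes the prediction depend on whether the motif occurs, not on where it occurs in the input window.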