This is an AI for Biology reading list maintained by the MBZUAI AI4Bio Group.
Note: For applications of diffusion methods in protein science, see the Diffusion reading list.
-
[2022.11.17 Pre] Highly accurate protein structure prediction with AlphaFold. Nature. 2021. Jumper, J., Evans, R., Pritzel, A., Green, T., Figurnov, M., Ronneberger, O., ... & Hassabis, D. [Paper] [Slides]
-
Accurate prediction of protein structures and interactions using a three-track neural network. Science. 2021. Baek, M., DiMaio, F., Anishchenko, I., Dauparas, J., Ovchinnikov, S., Lee, G. R., ... & Baker, D. [Paper]
-
ColabFold: making protein folding accessible to all. Nature Methods. 2022. Mirdita, M., Schütze, K., Moriwaki, Y., Heo, L., Ovchinnikov, S., & Steinegger, M. [Paper]
-
[2022.12.01 Pre] Language models of protein sequences at the scale of evolution enable accurate structure prediction. BioRxiv. 2022. Lin, Z., Akin, H., Rao, R., Hie, B., Zhu, Z., Lu, W., ... & Rives, A. [Paper] [Slides]
-
[2022.12.08 Pre] High-resolution de novo structure prediction from primary sequence. BioRxiv. 2022. Wu, R., Ding, F., Wang, R., Shen, R., Zhang, X., Luo, S., ... & Peng, J. [Paper] [Slides]
-
[2022.12.08 Pre] Helixfold-single: Msa-free protein structure prediction by using protein language model as an alternative. ArXiv. 2022. Fang, X., Wang, F., Liu, L., He, J., Lin, D., Xiang, Y., ... & Song, L. [Paper] [Slides]
-
[2023.06.29 Pre] Protein structure prediction with in-cell photo-crosslinking mass spectrometry and deep learning. Nature Biotechnology. 2023. [Paper] [Slides]
-
Biological structure and function emerge from scaling unsupervised learning to 250 million protein sequences. PNAS. 2021. Rives, A., Meier, J., Sercu, T., Goyal, S., Lin, Z., Liu, J., ... & Fergus, R. [Paper]
-
Leveraging implicit knowledge in neural networks for functional dissection and engineering of proteins. Nature Machine Intelligence. 2019. Upmeier zu Belzen, J., Bürgel, T., Holderbach, S., Bubeck, F., Adam, L., Gandor, C., ... & Eils, R. [Paper]
-
Protein function prediction is improved by creating synthetic feature samples with generative adversarial networks. Nature Machine Intelligence. 2020. Wan, C., & Jones, D. T. [Paper]
-
Protein function prediction for newly sequenced organisms. Nature Machine Intelligence. 2021. Torres, M., Yang, H., Romero, A. E., & Paccanaro, A. [Paper]
-
[2023.07.13 Pre] Enzyme function prediction using contrastive learning. Science. 2023. Yu, T., Cui, H., Li, J. C., Luo, Y., Jiang, G., & Zhao, H. [Paper] [Slides]
-
Expanding functional protein sequence spaces using generative adversarial networks. Nature Machine Intelligence. 2021. Repecka, D., Jauniskis, V., Karpus, L., Rembeza, E., Rokaitis, I., Zrimec, J., ... & Zelezniak, A. [Paper]
-
Transformer-based protein generation with regularized latent space optimization. Nature Machine Intelligence. 2022. Castro, E., Godavarthi, A., Rubinfien, J., Givechian, K., Bhaskar, D., & Krishnaswamy, S. [Paper]
-
[2023.01.12 Pre] A high-level programming language for generative protein design. bioRxiv. 2022-12. Hie, B., Candido, S., Lin, Z., Kabeli, O., Rao, R., Smetanin, N., ... & Rives, A. [Paper] [Slides]
-
[2023.03.16 Pre] A universal deep-learning model for zinc finger design enables transcription factor reprogramming. Nature Biotechnology. 2023. Ichikawa, D. M., Abdin, O., Alerasool, N., Kogenaru, M., Mueller, A. L., Wen, H., ... & Noyes, M. B. [Paper] [Slides]
-
[2023.07.20 Pre] Large language models generate functional protein sequences across diverse families. Nature Biotechnology. 2023. Madani, A., Krause, B., Greene, E. R., Subramanian, S., Mohr, B. P., Holton, J. M., ... & Naik, N. [Paper] [Slides]
-
[2023.08.03 Pre] Top-down design of protein architectures with reinforcement learning. Science. 2023. Lutz, I. D., Wang, S., Norn, C., Courbet, A., Borst, A. J., Zhao, Y. T., ... & Baker, D. [Paper] [Slides]
-
Predicting drug–protein interaction using quasi-visual question answering system. Nature Machine Intelligence. 2020. Zheng, S., Li, Y., Chen, S., Xu, J., & Yang, Y. [Paper]
-
A topology-based network tree for the prediction of protein–protein binding affinity changes following mutation. Nature Machine Intelligence. 2020. Wang, M., Cang, Z., & Wei, G. W. [Paper]
-
Computed structures of core eukaryotic protein complexes. Science. 2021. Humphreys, I. R., Pei, J., Baek, M., Krishnakumar, A., Anishchenko, I., Ovchinnikov, S., ... & Baker, D. [Paper]
-
Harnessing protein folding neural networks for peptide–protein docking. Nature Communications. 2022. Tsaban, T., Varga, J. K., Avraham, O., Ben-Aharon, Z., Khramushin, A., & Schueler-Furman, O. [Paper]
-
Protein complex prediction with AlphaFold-Multimer. BioRxiv. 2022. Evans, R., O’Neill, M., Pritzel, A., Antropova, N., Senior, A., Green, T., ... & Hassabis, D. [Paper]
-
Improved prediction of protein-protein interactions using AlphaFold2. Nature Communications. 2022. Bryant, P., Pozzati, G., & Elofsson, A. [Paper]
-
AF2Complex predicts direct physical interactions in multimeric proteins with deep learning. Nature Communications. 2022. Gao, M., Nakajima An, D., Parks, J. M., & Skolnick, J. [Paper]
-
Uni-Fold Symmetry: harnessing symmetry in folding large protein complexes. bioRxiv. 2022. Li, Z., Yang, S., Liu, X., Chen, W., Wen, H., Shen, F., ... & Zhang, L. [Paper]
-
Predicting the structure of large protein complexes using AlphaFold and Monte Carlo tree search. Nature Communications. 2022. Bryant, P., Pozzati, G., Zhu, W., Shenoy, A., Kundrotas, P., & Elofsson, A. [Paper]
-
Improve the Protein Complex Prediction with Protein Language Models. bioRxiv. 2022. Chen, B., Xie, Z., Xu, J., Qiu, J., Ye, Z., & Tang, J. [Paper]
-
Clustering single-cell RNA-seq data with a model-based deep learning approach. Nature Machine Intelligence. 2019. Tian, T., Wan, J., Song, Q., & Wei, Z. [Paper]
-
An interpretable deep-learning architecture of capsule networks for identifying cell-type gene expression programs from single-cell RNA-sequencing data. Nature Machine Intelligence. 2020. Wang, L., Nie, R., Yu, Z., Xin, R., Zheng, C., Zhang, Z., ... & Cai, J. [Paper]
-
Iterative transfer learning with neural network for clustering and cell type classification in single-cell RNA-seq analysis. Nature Machine Intelligence. 2020. Hu, J., Li, X., Hu, G., Lyu, Y., Susztak, K., & Li, M. [Paper]
-
Simultaneous deep generative modelling and clustering of single-cell genomic data. Nature Machine Intelligence. 2021. Liu, Q., Chen, S., Jiang, R., & Wong, W. H. [Paper]
-
scBERT as a large-scale pretrained deep language model for cell type annotation of single-cell RNA-seq data. Nature Machine Intelligence. 2022. Yang, F., Wang, W., Wang, F., Fang, Y., Tang, D., Huang, J., ... & Yao, J. [Paper] [Slides]
-
A multi-use deep learning method for CITE-seq and single-cell RNA-seq data integration with cell surface protein prediction and imputation. Nature Machine Intelligence. 2022. Lakkis, J., Schroeder, A., Su, K., Lee, M. Y., Bashore, A. C., Reilly, M. P., & Li, M. [Paper]
-
Contrastive learning enables rapid mapping to multimodal single-cell atlas of multimillion scale. Nature Machine Intelligence. 2022. Yang, M., Yang, Y., Xie, C., Ni, M., Liu, J., Yang, H., ... & Wang, J. [Paper]
-
Interpreting the B-cell receptor repertoire with single-cell gene expression using Benisse. Nature Machine Intelligence. 2022. Zhang, Z., Chang, W. Y., Wang, K., Yang, Y., Wang, X., Yao, C., ... & Wang, T. [Paper]
-
Simultaneous dimensionality reduction and integration for single-cell ATAC-seq data using deep learning. Nature Machine Intelligence. 2022. Kopp, W., Akalin, A., & Ohler, U. [Paper]
-
Cell type annotation of single-cell chromatin accessibility data via supervised Bayesian embedding. Nature Machine Intelligence. 2022. Chen, X., Chen, S., Song, S., Gao, Z., Hou, L., Zhang, X., ... & Jiang, R. [Paper]
-
Deep learning of cross-species single-cell landscapes identifies conserved regulatory programs underlying cell types. Nature Genetics. 2022. Li, J., Wang, J., Zhang, P., Wang, R., Mei, Y., Sun, Z., Fei, L., et al. [Paper]
-
[2022.12.15 Pre] GEARS: Predicting transcriptional outcomes of novel multi-gene perturbations. BioRxiv. 2022. Roohani, Y., Huang, K., & Leskovec, J. [Paper] [Slides]
-
[2023.01.26 Pre] Effective gene expression prediction from sequence by integrating long-range interactions. Nature Methods. 2021. Avsec, Ž., Agarwal, V., Visentin, D., Ledsam, J. R., Grabska-Barwinska, A., Taylor, K. R., ... & Kelley, D. R. [Paper] [Slides]
-
Compositional perturbation autoencoder for single-cell response modeling. BioRxiv. 2021. Lotfollahi, M., Susmelj, A. K., Donno, C. D., Ji, Y., Ibarra, I. L., Wolf, F. A., Yakubova, N., Theis, F. J., & Lopez-Paz, D. [Paper]
-
Predicting Cellular Responses to Novel Drug Perturbations at a Single-Cell Resolution. ArXiv. 2022. Hetzel, L., Böhm, S., Kilbertus, N., Günnemann, S., Lotfollahi, M., & Theis, F. [Paper]
-
MultiCPA: Multimodal Compositional Perturbation Autoencoder. BioRxiv. 2022. Inecik, K., Uhlmann, A., Lotfollahi, M., & Theis, F. [Paper]
-
Machine learning for perturbational single-cell omics. Cell Systems. 2021. Ji, Y., Lotfollahi, M., Wolf, F. A., & Theis, F. J. [Paper]
-
Learning Single-Cell Perturbation Responses using Neural Optimal Transport. BioRxiv. 2021. Bunne, C., Stark, S. G., Gut, G., Castillo, J. S. del, Lehmann, K.-V., Pelkmans, L., Krause, A., & Rätsch, G. [Paper]
-
Transfer learning enables predictions in network biology. Nature. 2023. Theodoris, C. V., Xiao, L., Chopra, A., Chaffin, M. D., Al Sayed, Z. R., Hill, M. C., ... & Ellinor, P. T. [Paper] [Slides]
-
scPerturb: Harmonized Single-Cell Perturbation Data. bioRxiv. 2023. Peidli, S., Green, T. D., Shen, C., Gross, T., Min, J., Garda, S., Yuan, B., Schumacher, L. J., Taylor-King, J. P., Marks, D. S., Luna, A., Blüthgen, N., & Sander, C. [Paper]
-
SERGIO: A Single-Cell Expression Simulator Guided by Gene Regulatory Networks. Cell Systems. 2020. Dibaeinia, P., & Sinha, S. [Paper]
-
[2023.08.24 Pre] The Nucleotide Transformer: Building and Evaluating Robust Foundation Models for Human Genomics. bioRxiv. 2023. Hugo Dalla-Torre, Liam Gonzalez, Javier Mendoza Revilla, Nicolas Lopez Carranza, Adam Henryk Grzywaczewski, Francesco Oteri, Christian Dallago, Evan Trop, Hassan Sirelkhatim, Guillaume Richard, Marcin Skwark, Karim Beguir, Marie Lopez, Thomas Pierrot. [Paper] [Slides]
-
[2023.08.31 Pre] HyenaDNA: Long-Range Genomic Sequence Modeling at Single Nucleotide Resolution. ArXiv. 2023. Eric Nguyen, Michael Poli, Marjan Faizi, Armin Thomas, Callum Birch-Sykes, Michael Wornow, Aman Patel, Clayton Rabideau, Stefano Massaroli, Yoshua Bengio, Stefano Ermon, Stephen A. Baccus, Chris Ré. [Paper] [Slides]
-
Predicting disease-associated mutation of metal-binding sites in proteins using a deep learning approach. Nature Machine Intelligence. Koohi-Moghadam, M., Wang, H., Wang, Y., Yang, X., Li, H., Wang, J., & Sun, H. (2019).
-
Evaluation of deep learning in non-coding RNA classification. Nature Machine Intelligence. Amin, N., McGrath, A., & Chen, Y. P. P. (2019).
-
Feedback GAN for DNA optimizes protein functions. Nature Machine Intelligence. Gupta, A., & Zou, J. (2019).
-
Gene set inference from single-cell sequencing data using a hybrid of matrix factorization and variational autoencoders. Nature Machine Intelligence. Lukassen, S., Ten, F. W., Adam, L., Eils, R., & Conrad, C. (2020).
-
Elucidation of DNA methylation on N6-adenine with deep learning. Nature Machine Intelligence. Tan, F., Tian, T., Hou, X., Yu, X., Gu, L., Mafra, F., ... & Hakonarson, H. (2020).
-
Deep learning decodes the principles of differential gene expression. Nature Machine Intelligence. Tasaki, S., Gaiteri, C., Mostafavi, S., & Wang, Y. (2020).
-
Gaussian embedding for large-scale gene set analysis. Nature Machine Intelligence. Wang, S., Flynn, E. R., & Altman, R. B. (2020).
-
Deep learning incorporating biologically inspired neural dynamics and in-memory computing. Nature Machine Intelligence. Woźniak, S., Pantazi, A., Bohnstingl, T., & Eleftheriou, E. (2020).
-
A novel machine learning framework for automated biomedical relation extraction from large-scale literature repositories. Nature Machine Intelligence. Hong, L., Lin, J., Li, S., Wan, F., Yang, H., Jiang, T., ... & Zeng, J. (2020).
-
A deep learning method for recovering missing signals in transcriptome-wide RNA structure profiles from probing experiments. Nature Machine Intelligence. Gong, J., Xu, K., Ma, Z., Lu, Z. J., & Zhang, Q. C. (2021).
-
Integration of multiomics data with graph convolutional networks to identify new cancer genes and their associated molecular mechanisms. Nature Machine Intelligence. Schulte-Sasse, R., Budach, S., Hnisz, D., & Marsico, A. (2021).
-
An automated framework for efficiently designing deep convolutional neural networks in genomics. Nature Machine Intelligence. Zhang, Z., Park, C. Y., Theesfeld, C. L., & Troyanskaya, O. G. (2021).
-
Improving representations of genomic sequence motifs in convolutional networks with exponential activations. Nature Machine Intelligence. Koo, P. K., & Ploenzke, M. (2021).
-
Deep neural networks identify sequence context features predictive of transcription factor binding. Nature Machine Intelligence. Zheng, A., Lamkin, M., Zhao, H., Wu, C., Su, H., & Gymrek, M. (2021).
-
Deep neural networks with controlled variable selection for the identification of putative causal genetic variants. Nature Machine Intelligence. Kassani, P. H., Lu, F., Le Guen, Y., Belloy, M. E., & He, Z. (2022).
-
Molecular convolutional neural networks with DNA regulatory circuits. Nature Machine Intelligence. Xiong, X., Zhu, T., Zhu, Y., Cao, M., Xiao, J., Li, L., ... & Pei, H. (2022).
-
Interpreting neural networks for biological sequences by learning stochastic masks. Nature Machine Intelligence. Linder, J., La Fleur, A., Chen, Z., Ljubetič, A., Baker, D., Kannan, S., & Seelig, G. (2022).
-
Hierarchical deep reinforcement learning reveals a modular mechanism of cell movement. Nature Machine Intelligence. Wang, Z., Xu, Y., Wang, D., Yang, J., & Bao, Z. (2022).
-
Structure-guided isoform identification for the human transcriptome. eLife. Sommer, M. J., Cha, S., Varabyou, A., Rincon, N., Park, S., Minkin, I., Pertea, M., Steinegger, M., & Salzberg, S. L. (2022).
-
DNABERT-2: Efficient Foundation Model and Benchmark for Multi-Species Genome. ArXiv. Zhou, Z., Ji, Y., Li, W., Dutta, P., Davuluri, R., & Liu, H. (2023). [Slides]