Batch correction and downstream analysis of prokaryotic scRNA-seq datasets published by Kuchina et al. (2021) and Blattman et al. (2020). A report of the results is accessible at:
Preprocessing and replication of the Kuchina et al. (2021) results on the E. coli data, batch correction using MNN Correct and scVI, clustering visualizations, differential gene expression analysis, manual exploration of clusters and marker genes, and Wishbone trajectory inference on the imputed count matrix.
Preprocessing and replication of the Kuchina et al. (2021) results on the B. subtilis data, batch correction using ComBat, Harmony, and MNN Correct, and visualizations.
Blattman_Ecoli_Data_Preprocessing.ipynb: Preprocessing of the Blattman et al. (2020) E. coli dataset, principal component analysis, and 2D embeddings to confirm the original results and validate the presence of exponential and stationary growth stages.
Trajectory inference using Wishbone and differentially expressed gene (DEG) analysis for the Blattman E. coli dataset, following preprocessing and clustering.
Scanorama_Batch_Correction.ipynb: Batch integration of the Blattman and Kuchina datasets using Scanorama, including dataset preparation, integration, DEG analysis, trajectory inference, clustering, and 2D embeddings.
Run the notebooks using Jupyter Notebook or Google Colab (preferred). Each notebook is independent, so they can be run in any order. For scVI, a GPU is recommended. All datasets must be uncompressed and kept at their original paths so that the notebooks can locate them.
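Before running a notebook, it can help to confirm that the datasets are unpacked and in place. The following is a minimal sketch, not part of the repository: the function name `check_datasets` and the file names in `EXPECTED_FILES` are hypothetical placeholders and should be replaced with the paths each notebook actually reads.

```python
from pathlib import Path

# Hypothetical file names for illustration only; replace them with the
# paths that the notebook you are about to run actually reads.
EXPECTED_FILES = ["data/kuchina_ecoli_counts.csv", "data/blattman_ecoli_counts.csv"]
COMPRESSED_SUFFIXES = (".gz", ".zip", ".bz2", ".tar")


def check_datasets(root, expected):
    """Return (missing, still_compressed) lists of relative paths under root."""
    root = Path(root)
    missing, still_compressed = [], []
    for rel in expected:
        path = root / rel
        if path.exists():
            continue
        # A sibling such as counts.csv.gz suggests the archive was not unpacked.
        archives = [path.with_name(path.name + suffix) for suffix in COMPRESSED_SUFFIXES]
        if any(archive.exists() for archive in archives):
            still_compressed.append(rel)
        else:
            missing.append(rel)
    return missing, still_compressed


if __name__ == "__main__":
    missing, still_compressed = check_datasets(".", EXPECTED_FILES)
    for rel in still_compressed:
        print(f"needs uncompressing: {rel}")
    for rel in missing:
        print(f"not found: {rel}")
```

Run from the repository root, this prints nothing when every listed file is present and uncompressed.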
Package Version
---------------- ---------------------
anndata 0.8.0
debugpy 1.0.0
h5py 3.1.0
igraph 0.9.10
imageio 2.4.1
matplotlib 3.5.1
mnnpy 0.1.9.5
MulticoreTSNE 0.1
numba 0.51.2
numpy 1.21.6
oauthlib 3.2.0
palantir 1.0.0
pandas 1.3.5
PhenoGraph 1.5.7
scanorama 1.7.2
scanpy 1.8.2
scipy 1.7.3
scprep 1.1.0
scvi-colab 0.10.0
scvi-tools 0.16.1
seaborn 0.11.2
setuptools 59.5.0
sklearn 0.0
sklearn-pandas 1.8.0
tensorflow 2.8.0
umap-learn 0.5.3
wishbone-dev 0.5.2
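To compare an environment against the versions listed above, a small check along these lines may be useful. It is a sketch under assumptions rather than part of the repository: it uses the standard-library `importlib.metadata` module (Python 3.8 or newer), checks only a subset of the table, and the helper names `installed_version` and `find_mismatches` are made up for this example.

```python
from importlib import metadata  # standard library from Python 3.8 onward

# Subset of the pinned versions from the table above; extend as needed.
PINNED = {
    "anndata": "0.8.0",
    "scanpy": "1.8.2",
    "scanorama": "1.7.2",
    "scvi-tools": "0.16.1",
    "mnnpy": "0.1.9.5",
}


def installed_version(package):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def find_mismatches(pinned, lookup=installed_version):
    """Map each package whose installed version differs from its pin to (pinned, found)."""
    mismatches = {}
    for package, wanted in pinned.items():
        found = lookup(package)
        if found != wanted:
            mismatches[package] = (wanted, found)
    return mismatches


if __name__ == "__main__":
    for package, (wanted, found) in find_mismatches(PINNED).items():
        print(f"{package}: pinned {wanted}, found {found or 'not installed'}")
```

The `lookup` parameter exists only so the comparison can be exercised without touching the real environment. A version mismatch does not necessarily mean a notebook will fail; the table records the versions that were in use, not a tested compatibility range.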
Blattman SB, Jiang W, Oikonomou P, Tavazoie S. Prokaryotic single-cell RNA sequencing by in situ combinatorial indexing. Nat Microbiol. 2020 Oct;5(10):1192-1201. Data: Gene Expression Omnibus (GEO), accession number GSE141018.
Kuchina A, Brettner LM, Paleologu L, Roco CM, Rosenberg AB, Carignano A, Kibler R, Hirano M, DePaolo RW, Seelig G. Microbial single-cell RNA sequencing by split-pool barcoding. Science. 2021 Feb 19;371(6531). Data: Gene Expression Omnibus (GEO), accession number GSE151940.