This is the Jiang lab scATAC-seq processing and cell clustering pipeline.
Urrutia, Eugene, et al. "Destin: toolkit for single-cell analysis of chromatin accessibility." Bioinformatics, btz141, 2019.
If you have any questions or problems when using destin, please feel free to open a new issue here. You can also email the maintainers of the corresponding packages -- the contact information is below.
The bioinformatics pipeline requires cloning the git repository from GitHub, where yourPathToDestinRepo is the path to your local clone.
Running the vignettes also requires cloning the git repository.
git clone https://github.com/urrutiag/destin.git yourPathToDestinRepo
Install dependencies:
installed <- rownames(installed.packages())
pkgs <- c("cluster", "data.table", "ggplot2",
          "gridExtra", "irlba", "Matrix",
          "parallel", "Rtsne")
pkgs <- setdiff(pkgs, installed)
if (length(pkgs)) {
  install.packages(pkgs, dependencies = c("Depends", "Imports"))
}
biocPkgs <- c("ChIPpeakAnno", "GenomicAlignments", "rtracklayer")
biocPkgs <- setdiff(biocPkgs, installed)
if (length(biocPkgs)) {
  if (!requireNamespace("BiocManager", quietly = TRUE)) {
    install.packages("BiocManager")
  }
  BiocManager::install(biocPkgs)
}
# ClusterR is an optional package:
if (!"ClusterR" %in% rownames(installed.packages())) {
  install.packages("ClusterR", dependencies = c("Depends", "Imports"))
}
Running the R package requires either installing it locally from the cloned git repository above:
install.packages("yourPathToDestinRepo/package", repos = NULL, type = "source")
library(destin)
or installing it directly from GitHub (note that this does not provide the bioinformatics pipeline or the vignettes):
install.packages("devtools")
devtools::install_github("urrutiag/destin/package")
library(destin)
- Software: SRAtoolkit, cutadapt, BOWTIE2, samtools, picard, MACS2, bedtools, awk, R, python
- R packages: ChIPpeakAnno, cluster, data.table, GenomicAlignments, ggplot2, gridExtra, irlba, Matrix, parallel, rtracklayer, Rtsne
- Optional R packages: ClusterR
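Before running the pipeline, it can help to confirm that the command-line tools listed above are on your PATH. The following is a minimal sketch, not part of destin: the function name `missing_tools` is hypothetical, and the executable names passed to it (for example `fastq-dump` for SRAtoolkit, or a `picard` wrapper script) are assumptions that depend on how each tool was installed.

```shell
# Print each command-line tool that is not found on PATH, one per line.
# Returns non-zero if at least one tool is missing.
missing_tools() {
  n_missing=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "$tool"
      n_missing=1
    fi
  done
  return "$n_missing"
}

# Executable names below are assumptions; adjust them to your installation.
missing_tools fastq-dump cutadapt bowtie2 samtools picard macs2 bedtools awk Rscript python \
  || echo "Install the tools listed above before running the pipeline."
```

If picard is only available as a jar file, drop it from the list and check for `java` instead.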
Input: fastq files for the entire experiment, or individual fastq files by cell; paired-end reads come as a set of 2 files each
Output: bam files by cell, peaks file
- download fastq
- separate fastq by cell (if combinatorial indexed)
- cut adapters
- align
- sam to bam
- sort
- add read group and index
- mark duplicates
- remove mitochondrial, unmapped, and chrY reads
- adjust for Tn5 insertion
- filter to alignment quality >= 30
- index
- call peaks (p < 0.01)
- filter blacklist
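The read-filtering and Tn5-adjustment steps above can be sketched with awk alone. This is a minimal sketch under stated assumptions, not destin's actual script: it assumes reads have been converted to BED6 (for example with `bedtools bamtobed`, which writes the mapping quality to column 5), that chromosomes use UCSC-style names (`chrM`, `chrY`), and that the conventional ATAC-seq offsets of +4 bp on the forward strand and -5 bp on the reverse strand apply. The function name `tn5_filter_shift` is hypothetical.

```shell
# Filter BED6 records and move each read's 5' end to the Tn5 insertion site.
# Input columns (tab-separated): chrom, start, end, name, MAPQ, strand.
tn5_filter_shift() {
  awk 'BEGIN { FS = OFS = "\t" }
       $1 == "chrM" || $1 == "chrY" { next }  # drop mitochondrial and chrY reads
       $5 < 30 { next }                       # keep alignment quality >= 30
       $6 == "+" { $2 += 4 }                  # forward strand: shift start by +4 bp
       $6 == "-" { $3 -= 5 }                  # reverse strand: shift end by -5 bp
       { print }'
}
```

A possible invocation, with hypothetical file names, is `bedtools bamtobed -i cell.bam | tn5_filter_shift > cell.shifted.bed`. Unmapped reads have no BED representation, so they would need to be removed earlier, for example when the BAM file is written.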
Input: bam files by cell, peaks file
Output: cluster membership, differential accessibility
- create ranged summarized experiment from bam files and peaks file
- append experimental information if available
- annotate regions
- quality control on cells and regions
- determine number of clusters
- cluster cells with destin, which optimizes hyperparameters via multinomial likelihood
- calculate differential accessibility
Determine whether GWAS results are associated with increased chromatin accessibility in a particular cell type cluster. We use two methods originally developed for scRNA-seq expression data: EWCE and MAGMA.
- Bioinformatics and Clustering: Buenrostro mouse cells, Fluidigm microfluidic technology (html, markdown)
- Clustering: Preissl P56 forebrain mouse cells, combinatorial barcode technology (html, markdown)
- GWAS cell-type specific association: Preissl P56 forebrain mouse cells (html, markdown)
- Read 10x Genomics scATAC-seq PBMC data and cluster (html, markdown)
de Leeuw, C. A., et al. (2015). MAGMA: generalized gene-set analysis of GWAS data. PLoS Comput. Biol., 11(4), e1004219.
Skene, N. G., et al. (2016). Identification of vulnerable cell types in major brain disorders using single cell transcriptomes and expression weighted cell type enrichment. Front. Neurosci., 10, 16.
- Gene Urrutia (gene dot urrutia at gmail dot com)
  Hill-Rom Innovation, Cary, NC
- Yuchao Jiang (yuchaoj at email dot unc dot edu)
  Department of Biostatistics & Department of Genetics, UNC-Chapel Hill