You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Hi @timoast I created a new issue with the simplified code I ran up until the error I get with `LinkPeaks()'. I followed Joint RNA and ATAC analysis: 10x multiomic tutorial to process output from the 10X Genomics Cellranger ARC multiomics pipeline. I'm unable to link peaks to genes in order to create coverage plots. Everything works up until that point. Any insight on how to fix this would be appreciated, thanks.
STEP 1 read in count matrices & atac fragments path for sample 1MI5
# create a Seurat object containing the RNA adata
gc_liftoff_1MI5_SO <- CreateSeuratObject(
counts = gc_liftoff_1MI5_matrix$`Gene Expression`,
assay = "RNA"
)
# create ATAC assay and add it to the object
gc_liftoff_1MI5_SO[["ATAC"]] <- CreateChromatinAssay(
counts = gc_liftoff_1MI5_matrix$Peaks,
sep = c(":", "-"),
fragments = gc_liftoff_1MI5_fragpath
)
STEP 3 add macfas6 gencodev47 liftoff gtf to seurat object so TSS.enrichment score can be calculated
Mapped multiomics reads (fastq) with 10x cellranger arc using the following fasta & gtf
input_fasta: ["mac_ref/GCA_011100615.1_Macaca_fascicularis_6.0_genomic.fna"]
input_gtf: ["mac_ref/macFas6-hg38-gencode.v47.basic.sorted.gtf"]. Gtf made with liftoff software. More details here.
IMPORTANT: chromosomes in liftoff gtf aren't listed as 'chr1', 'chr2', 'chr3' in UCSC style or '1', '2', '3' in NCBI style. They are listed as genbank seq accession numbers 'CM021939.1' for chrom 1, 'CM021940.1' for chrom 2, 'CM021941.1' for chrom 3, etc (has chromosomes 1-20, chromosome X, then the rest are 'Un' chromosomes)
# add liftoff annotation to object
# Step 3a: Load the GTF file
gencode_gtf_path <- '~/macaque_multiomics/mac_ref/macFas6-hg38-gencode.v47.basic.sorted.gtf'
gencode_gtf <- rtracklayer::import(gencode_gtf_path)
# Step 3b: Map gene_type to gene_biotype
# Check if gene_type exists, and then map it to gene_biotype
if ("gene_type" %in% colnames(mcols(gencode_gtf))) {
mcols(gencode_gtf)$gene_biotype <- NA
mcols(gencode_gtf)$gene_biotype[mcols(gencode_gtf)$gene_type == "protein_coding"] <- "protein_coding"
} else {
stop("The GTF file does not contain a gene_type field.")
}
# did this because TSSenrichment function requires gene_biotype info. In liftoff gtf attribute listed as gene_type
# i.e gene_type "protein_coding", gene_type "lncRNA", etc
# Step 3c: Filter out entries with missing or invalid data (filtered out all NA's that were in the liftoff gtf GRanges object before calculating TSS score)
gencode_gtf <- gencode_gtf[
!is.na(mcols(gencode_gtf)$gene_biotype) &
!is.na(seqnames(gencode_gtf)) &
!is.na(start(gencode_gtf)) &
!is.na(end(gencode_gtf)) &
!is.na(strand(gencode_gtf))
]
# Step 3d: Make default assay ATAC
DefaultAssay(gc_liftoff_1MI5_SO) <- "ATAC"
# Step 3e: Assign the gencode_gtf to the Seurat object
Annotation(gc_liftoff_1MI5_SO) <- gencode_gtf
# Step 3f: Run TSSEnrichment() function
gc_liftoff_1MI5_SO <- TSSEnrichment(gc_liftoff_1MI5_SO)
STEP 11 compute GC content for each peak using fasta instead of Bsgenome object
DefaultAssay(gc_liftoff_1MI5_SO) <- "ATAC"
macfas6_fasta_path <- "~/macaque_multiomics/mac_ref/GCA_011100615.1_Macaca_fascicularis_6.0_genomic.fna"
macfas6_fasta_file <- FaFile(macfas6_fasta_path) # Create FaFile object
# compute the GC content for each peak (works, no warnings or errors)
gc_liftoff_1MI5_SO <- RegionStats(gc_liftoff_1MI5_SO, genome = macfas6_fasta_file)
STEP 12 Link peaks to genes (where I'm getting error)
# link peaks to genes
gc_liftoff_1MI5_SO <- LinkPeaks(
object = gc_liftoff_1MI5_SO,
peak.assay = "ATAC",
expression.assay = "SCT",
genes.use = c("LYZ", "MS4A1")
)
Testing 2 genes and 206443 peaks
| | 0 % ~calculating Warning: Requested more features than present in supplied data.
Returning 0 featuresError in density.default(x = query.feature[[featmatch]], kernel = "gaussian", :
argument 'x' must be numeric
Hi @timoast I created a new issue with the simplified code I ran up until the error I get with `LinkPeaks()'. I followed Joint RNA and ATAC analysis: 10x multiomic tutorial to process output from the 10X Genomics Cellranger ARC multiomics pipeline. I'm unable to link peaks to genes in order to create coverage plots. Everything works up until that point. Any insight on how to fix this would be appreciated, thanks.
STEP 1 read in count matrices & atac fragments path for sample 1MI5
STEP 2 create Seurat object
STEP 3 add macfas6 gencodev47 liftoff gtf to seurat object so TSS.enrichment score can be calculated
Mapped multiomics reads (fastq) with 10x cellranger arc using the following fasta & gtf
input_fasta: ["mac_ref/GCA_011100615.1_Macaca_fascicularis_6.0_genomic.fna"]
input_gtf: ["mac_ref/macFas6-hg38-gencode.v47.basic.sorted.gtf"]. Gtf made with liftoff software. More details here.
IMPORTANT: chromosomes in liftoff gtf aren't listed as 'chr1', 'chr2', 'chr3' in UCSC style or '1', '2', '3' in NCBI style. They are listed as genbank seq accession numbers 'CM021939.1' for chrom 1, 'CM021940.1' for chrom 2, 'CM021941.1' for chrom 3, etc (has chromosomes 1-20, chromosome X, then the rest are 'Un' chromosomes)
STEP 4 calculate nucleosome signal
STEP 5 Make QC violin plots
STEP 6 make density scatter plot
STEP 7 subset/filter based on tutorial cutoffs
STEP 8 normalize the gene expression data using SCTransform, and reduce the dimensionality using PCA
STEP 9 process the DNA accessibility assay the same way we would process a scATAC-seq dataset, by performing latent semantic indexing (LSI)
STEP 10 make snRNAseq, snATACseq, & snRNAseq + ATACseq UMAPs
STEP 11 compute GC content for each peak using fasta instead of Bsgenome object
STEP 12 Link peaks to genes (where I'm getting error)
R session info
The text was updated successfully, but these errors were encountered: