Fetch feature lengths info using biomaRt #2
Comments
Hi there, thanks for your question.
library("countToFPKM")
library("biomaRt")
library("dplyr")
## Import feature counts matrix
file.readcounts <- system.file("extdata", "RNA-seq.read.counts.csv", package="countToFPKM")
counts <- as.matrix(read.csv(file.readcounts))
## Build a biomart query
# In the example below, I use the human gene annotation from Ensembl release 82 located on "sep2015.archive.ensembl.org"
# More about the ensembl_build can be found on "http://www.ensembl.org/info/website/archives/index.html"
ens_build = "sep2015"
dataset="hsapiens_gene_ensembl"
mart = biomaRt::useMart("ENSEMBL_MART_ENSEMBL", dataset = dataset,
host = paste0(ens_build, ".archive.ensembl.org"), path = "/biomart/martservice", archive = FALSE)
gene.annotations <- biomaRt::getBM(mart = mart, attributes=c("ensembl_gene_id", "external_gene_name", "start_position", "end_position"))
gene.annotations <- dplyr::transmute(gene.annotations, external_gene_name, ensembl_gene_id, length = end_position - start_position)
# Filter and re-order gene.annotations to match the order in feature counts matrix
gene.annotations <- gene.annotations %>% dplyr::filter(ensembl_gene_id %in% rownames(counts))
gene.annotations <- gene.annotations[order(match(gene.annotations$ensembl_gene_id, rownames(counts))),]
# Assign feature lengths to a numeric vector.
featureLength <- gene.annotations$length
A good tutorial about using biomaRt can be found on Dave Tang's blog. Hope it helps!
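One thing the filter step above implies is that genes missing from the annotation are dropped, so `featureLength` can end up shorter than the counts matrix. A minimal base-R sketch of keeping the two aligned follows; the toy objects `counts`, `annot`, and the gene IDs `g1` to `g5` are hypothetical stand-ins, not data from the package.

```r
# Toy counts matrix with gene IDs as row names (hypothetical values)
counts <- matrix(1:8, nrow = 4,
                 dimnames = list(c("g1", "g2", "g3", "g4"), c("s1", "s2")))

# Toy annotation table in a different order, with one gene absent from counts
annot <- data.frame(id = c("g3", "g1", "g5"),
                    length = c(300, 100, 500),
                    stringsAsFactors = FALSE)

# Keep only genes present in both, in the row order of the counts matrix
keep <- rownames(counts)[rownames(counts) %in% annot$id]
counts_matched <- counts[keep, , drop = FALSE]

# Look up each kept gene's length so that it lines up with counts_matched
featureLength <- annot$length[match(keep, annot$id)]

# Sanity check: one length per remaining row
stopifnot(length(featureLength) == nrow(counts_matched))
```

Subsetting the counts matrix with the same `keep` vector avoids a length mismatch when the two objects are later passed on together.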
Hi, thank you for this step-by-step process! My featureCounts output has UCSC/RefSeq gene IDs. Is there a way to adapt the code for that? Thank you!
Hi excel9,
You can use the UCSC Table Browser to download a list of known genes with their annotations (gene_id, gene_symbol, start, end) as a tab-delimited file. Then you can try:
yourfeaturecounts <- as.matrix(read.csv("yourfeatures.counts.csv"))
ucsc.knowngenes <- read.table(file="ucsc.known.genes.txt", header=T, sep="\t")
ucsc.knowngenes <- dplyr::transmute(ucsc.knowngenes, gene_id, gene_symbol, length = end - start)
# Filter and re-order to match the rows of the counts matrix read above
ucsc.knowngenes <- ucsc.knowngenes %>% dplyr::filter(gene_id %in% rownames(yourfeaturecounts))
ucsc.knowngenes <- ucsc.knowngenes[order(match(ucsc.knowngenes$gene_id, rownames(yourfeaturecounts))),]
featureLength <- ucsc.knowngenes$length
Hi Ahmed, whenever I run this, it reports that 0 observations of 5 variables were created. Thank you!
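An empty result like this usually means that none of the row names of the counts matrix matched the annotation IDs. The sketch below shows one way to diagnose that in base R, assuming (hypothetically) that the mismatch comes from version suffixes on the gene IDs; the toy `counts` and `annot_ids` objects are made up for illustration. Another possible cause, not shown here, is that the gene IDs were read in as an ordinary column, in which case `read.csv(..., row.names = 1)` may help.

```r
# Toy counts matrix whose row names carry version suffixes (hypothetical values)
counts <- matrix(1:6, nrow = 3,
                 dimnames = list(c("ENSG00000000003.14",
                                   "ENSG00000000005.5",
                                   "ENSG00000000419.12"),
                                 c("s1", "s2")))

# Toy annotation IDs without version suffixes
annot_ids <- c("ENSG00000000003", "ENSG00000000005", "ENSG00000000419")

# Count how many row names are found in the annotation as they are
n_direct <- sum(rownames(counts) %in% annot_ids)

# Strip a trailing ".<digits>" version suffix and count again
clean_ids <- sub("\\.[0-9]+$", "", rownames(counts))
n_clean <- sum(clean_ids %in% annot_ids)

# Inspect a few IDs from each side to see how they differ
head(rownames(counts))
head(annot_ids)
```

If `n_direct` is zero while `n_clean` is not, assigning `rownames(counts) <- clean_ids` before the filter step is one option.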
Dear shitiezhu,
Dear AAlhendi1707, none of the fragments in the library include introns, so I think shitiezhu is correct. Are we wrong?
Dear Nairobi, please check this link
I think @shitiezhu is correct for count matrices built from reads aligned only to exons (e.g. kallisto doesn't pseudoalign to introns): https://www.biostars.org/p/83901/
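To make the distinction in this discussion concrete: `end_position - start_position` gives the genomic span of a gene, introns included, whereas an exon-only length is the total number of bases covered by its exons. A minimal base-R sketch of the latter follows; `exon_union_length` is a hypothetical helper written for illustration (it is not part of countToFPKM or biomaRt), and the coordinates are invented and assumed to be 1-based and inclusive.

```r
# Hypothetical helper: number of bases covered by a set of exon intervals,
# counting overlapping regions only once (1-based inclusive coordinates).
exon_union_length <- function(starts, ends) {
  stopifnot(length(starts) == length(ends), all(ends >= starts))
  if (length(starts) == 0) return(0)
  ord <- order(starts)
  starts <- starts[ord]
  ends <- ends[ord]
  total <- 0
  cur_start <- starts[1]
  cur_end <- ends[1]
  for (i in seq_along(starts)[-1]) {
    if (starts[i] <= cur_end + 1) {
      # Overlapping or directly adjacent exon: extend the current block
      cur_end <- max(cur_end, ends[i])
    } else {
      # Gap (intron): close the current block and start a new one
      total <- total + (cur_end - cur_start + 1)
      cur_start <- starts[i]
      cur_end <- ends[i]
    }
  }
  total + (cur_end - cur_start + 1)
}

# Toy gene with three exons, the first two overlapping
exon_start <- c(100, 150, 400)
exon_end   <- c(200, 250, 500)

gene_span  <- max(exon_end) - min(exon_start) + 1      # span including the intron
exonic_len <- exon_union_length(exon_start, exon_end)  # exon bases only
```

For this toy gene the span is 401 bases while the exonic length is 252, which is the size of difference the two positions in the thread are arguing about.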
I'd like to use your package to calculate FPKM values from my HTSeq counts but am having trouble getting started.
I would like to prepare the featureLength vector using biomaRt but don't see documentation for this anywhere in the biomaRt manual. Is this metric called something else? Can you link me to a description of how to prepare it?