Novel gene with overlapping coordinates by using reference annotation #164

Closed
zpliu1126 opened this issue Mar 12, 2024 · 5 comments
Labels
enhancement New feature or request

Comments

@zpliu1126

Hi~ Andrey,

After updating the GFF annotation with IsoQuant, I found many novel genes with overlapping coordinates when I used StringTie to calculate gene expression downstream. I guess these may be different transcripts of the same gene that were not merged into one gene.

# Example of novel genes
cat A2.extended_annotation.gtf | grep 108761523 | awk '$3=="gene"{print $0}'

########### result ##############
Chr12	IsoQuant	gene	108761523	108762488	.	+	.	gene_id "novel_gene_Chr12_177202"; transcripts "1"; 
Chr12	IsoQuant	gene	108761523	108765123	.	+	.	gene_id "novel_gene_Chr12_177104"; transcripts "1"; 
Chr12	IsoQuant	gene	108761523	108765123	.	+	.	gene_id "novel_gene_Chr12_177106"; transcripts "1"; 
Chr12	IsoQuant	gene	108761523	108765123	.	+	.	gene_id "novel_gene_Chr12_177152"; transcripts "1"; 
Chr12	IsoQuant	gene	108761523	108765123	.	+	.	gene_id "novel_gene_Chr12_177154"; transcripts "1";

Best
zpliu
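
A minimal Python sketch for listing such cases across a whole annotation, rather than grepping one coordinate at a time. It assumes standard tab-separated GTF with a `gene_id` attribute on `gene` records; the function names `parse_genes` and `overlapping_gene_groups` are hypothetical helpers written for illustration, and the sample records are the five gene lines shown above.

```python
from collections import defaultdict

# The five gene records from the grep output above: (gene_id, start, end).
RECORDS = [
    ("novel_gene_Chr12_177202", 108761523, 108762488),
    ("novel_gene_Chr12_177104", 108761523, 108765123),
    ("novel_gene_Chr12_177106", 108761523, 108765123),
    ("novel_gene_Chr12_177152", 108761523, 108765123),
    ("novel_gene_Chr12_177154", 108761523, 108765123),
]
# Rebuild them as tab-separated GTF lines so the parser sees real input.
GTF_LINES = [
    f'Chr12\tIsoQuant\tgene\t{s}\t{e}\t.\t+\t.\tgene_id "{g}"; transcripts "1";'
    for g, s, e in RECORDS
]


def parse_genes(lines):
    """Yield (chrom, strand, start, end, gene_id) for every 'gene' record."""
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "gene":
            continue
        gene_id = fields[8].split('gene_id "')[1].split('"')[0]
        yield fields[0], fields[6], int(fields[3]), int(fields[4]), gene_id


def overlapping_gene_groups(lines):
    """Return lists of gene IDs whose intervals overlap on the same strand."""
    by_locus = defaultdict(list)
    for chrom, strand, start, end, gene_id in parse_genes(lines):
        by_locus[(chrom, strand)].append((start, end, gene_id))

    groups = []
    for genes in by_locus.values():
        genes.sort()  # sweep left to right by start coordinate
        current, current_end = [], -1
        for start, end, gene_id in genes:
            if current and start > current_end:
                # Gap found: close the running group, keep it if it has >1 gene.
                if len(current) > 1:
                    groups.append(current)
                current = []
            current.append(gene_id)
            current_end = end if len(current) == 1 else max(current_end, end)
        if len(current) > 1:
            groups.append(current)
    return groups


if __name__ == "__main__":
    for group in overlapping_gene_groups(GTF_LINES):
        print(len(group), "overlapping genes:", ", ".join(group))
```

To run it on a full annotation, pass an open file handle (for example `open("A2.extended_annotation.gtf")`) instead of `GTF_LINES`.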

@andrewprzh
Collaborator

Dear @zpliu1126

The current gene detection algorithm is based purely on splice sites shared between transcripts. Using exonic overlaps as well probably makes sense.
Could you send all GTF lines for these genes?

Best
Andrey

@andrewprzh andrewprzh added the enhancement New feature or request label Mar 12, 2024
@zpliu1126
Author

Hi~ Andrey,
Here are a few examples.

Chr12	IsoQuant	gene	108761523	108762488	.	+	.	gene_id "novel_gene_Chr12_177202"; transcripts "1"; 
Chr12	IsoQuant	transcript	108761523	108762488	.	+	.	gene_id "novel_gene_Chr12_177202"; transcript_id "transcript177201.Chr12.nnic"; exons "2";
Chr12	IsoQuant	exon	108761523	108761897	.	+	.	gene_id "novel_gene_Chr12_177202"; transcript_id "transcript177201.Chr12.nnic"; exon "1"; exon_id "Chr12.19656";
Chr12	IsoQuant	exon	108762093	108762488	.	+	.	gene_id "novel_gene_Chr12_177202"; transcript_id "transcript177201.Chr12.nnic"; exon "2"; exon_id "Chr12.19657";
Chr12	IsoQuant	gene	108761523	108765123	.	+	.	gene_id "novel_gene_Chr12_177104"; transcripts "1"; 
Chr12	IsoQuant	transcript	108761523	108765123	.	+	.	gene_id "novel_gene_Chr12_177104"; transcript_id "transcript177103.Chr12.nnic"; exons "4";
Chr12	IsoQuant	exon	108761523	108762836	.	+	.	gene_id "novel_gene_Chr12_177104"; transcript_id "transcript177103.Chr12.nnic"; exon "1"; exon_id "Chr12.19658";
Chr12	IsoQuant	exon	108762965	108763841	.	+	.	gene_id "novel_gene_Chr12_177104"; transcript_id "transcript177103.Chr12.nnic"; exon "2"; exon_id "Chr12.19659";
Chr12	IsoQuant	exon	108764126	108764230	.	+	.	gene_id "novel_gene_Chr12_177104"; transcript_id "transcript177103.Chr12.nnic"; exon "3"; exon_id "Chr12.19660";
Chr12	IsoQuant	exon	108764509	108765123	.	+	.	gene_id "novel_gene_Chr12_177104"; transcript_id "transcript177103.Chr12.nnic"; exon "4"; exon_id "Chr12.19661";
Chr12	IsoQuant	gene	108761523	108765123	.	+	.	gene_id "novel_gene_Chr12_177106"; transcripts "1"; 
Chr12	IsoQuant	transcript	108761523	108765123	.	+	.	gene_id "novel_gene_Chr12_177106"; transcript_id "transcript177105.Chr12.nnic"; exons "4";
Chr12	IsoQuant	exon	108761523	108762836	.	+	.	gene_id "novel_gene_Chr12_177106"; transcript_id "transcript177105.Chr12.nnic"; exon "1"; exon_id "Chr12.19658";
Chr12	IsoQuant	exon	108762965	108763841	.	+	.	gene_id "novel_gene_Chr12_177106"; transcript_id "transcript177105.Chr12.nnic"; exon "2"; exon_id "Chr12.19659";
Chr12	IsoQuant	exon	108764126	108764230	.	+	.	gene_id "novel_gene_Chr12_177106"; transcript_id "transcript177105.Chr12.nnic"; exon "3"; exon_id "Chr12.19660";
Chr12	IsoQuant	exon	108764864	108765123	.	+	.	gene_id "novel_gene_Chr12_177106"; transcript_id "transcript177105.Chr12.nnic"; exon "4"; exon_id "Chr12.19662";
Chr12	IsoQuant	gene	108761523	108765123	.	+	.	gene_id "novel_gene_Chr12_177152"; transcripts "1"; 
Chr12	IsoQuant	transcript	108761523	108765123	.	+	.	gene_id "novel_gene_Chr12_177152"; transcript_id "transcript177151.Chr12.nnic"; exons "3";
Chr12	IsoQuant	exon	108761523	108762836	.	+	.	gene_id "novel_gene_Chr12_177152"; transcript_id "transcript177151.Chr12.nnic"; exon "1"; exon_id "Chr12.19658";
Chr12	IsoQuant	exon	108762965	108764230	.	+	.	gene_id "novel_gene_Chr12_177152"; transcript_id "transcript177151.Chr12.nnic"; exon "2"; exon_id "Chr12.19663";
Chr12	IsoQuant	exon	108764509	108765123	.	+	.	gene_id "novel_gene_Chr12_177152"; transcript_id "transcript177151.Chr12.nnic"; exon "3"; exon_id "Chr12.19661";
Chr12	IsoQuant	gene	108761523	108765123	.	+	.	gene_id "novel_gene_Chr12_177154"; transcripts "1"; 
Chr12	IsoQuant	transcript	108761523	108765123	.	+	.	gene_id "novel_gene_Chr12_177154"; transcript_id "transcript177153.Chr12.nnic"; exons "3";
Chr12	IsoQuant	exon	108761523	108762836	.	+	.	gene_id "novel_gene_Chr12_177154"; transcript_id "transcript177153.Chr12.nnic"; exon "1"; exon_id "Chr12.19658";
Chr12	IsoQuant	exon	108762965	108764230	.	+	.	gene_id "novel_gene_Chr12_177154"; transcript_id "transcript177153.Chr12.nnic"; exon "2"; exon_id "Chr12.19663";
Chr12	IsoQuant	exon	108764864	108765123	.	+	.	gene_id "novel_gene_Chr12_177154"; transcript_id "transcript177153.Chr12.nnic"; exon "3"; exon_id "Chr12.19662";
Chr12	IsoQuant	exon	108761668	108761897	.	+	.	gene_id "novel_gene_Chr12_177166"; transcript_id "transcript177165.Chr12.nnic"; exon "1"; exon_id "Chr12.19664";
Chr12	IsoQuant	exon	108762084	108762488	.	+	.	gene_id "novel_gene_Chr12_177166"; transcript_id "transcript177165.Chr12.nnic"; exon "2"; exon_id "Chr12.19665";
Chr12	IsoQuant	gene	108761721	108765123	.	+	.	gene_id "novel_gene_Chr12_177188"; transcripts "1"; 
Chr12	IsoQuant	transcript	108761721	108765123	.	+	.	gene_id "novel_gene_Chr12_177188"; transcript_id "transcript177187.Chr12.nnic"; exons "3";
Chr12	IsoQuant	exon	108761721	108762618	.	+	.	gene_id "novel_gene_Chr12_177188"; transcript_id "transcript177187.Chr12.nnic"; exon "1"; exon_id "Chr12.19667";
Chr12	IsoQuant	exon	108762699	108762836	.	+	.	gene_id "novel_gene_Chr12_177188"; transcript_id "transcript177187.Chr12.nnic"; exon "2"; exon_id "Chr12.19668";
Chr12	IsoQuant	exon	108762965	108765123	.	+	.	gene_id "novel_gene_Chr12_177188"; transcript_id "transcript177187.Chr12.nnic"; exon "3"; exon_id "Chr12.19669";

Best
zpliu

@zpliu1126
Author

> Dear @zpliu1126
>
> The current gene detection algorithm is based purely on splice sites shared between transcripts. Using exonic overlaps as well probably makes sense. Could you send all GTF lines for these genes?
>
> Best
> Andrey

I have used gffread to merge those transcripts.

 gffread  -M  --cluster-only isoquant.gff >isoquant_collapse.gff

Best
zpliu
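
For readers who want to see what locus-level clustering means here, the sketch below groups transcripts whenever any of their exons overlap. This is an illustration under a stated assumption (same chromosome and strand, loci defined by exon overlap); it is not a description of gffread's internals. The helper names `exons_overlap` and `cluster_by_exon_overlap` are hypothetical, and the coordinates are three of the transcripts posted earlier in this thread.

```python
from itertools import combinations

# Exon coordinates (1-based, inclusive) for three transcripts from the
# records posted above; all are on Chr12, '+' strand.
TRANSCRIPTS = {
    "transcript177201.Chr12.nnic": [(108761523, 108761897), (108762093, 108762488)],
    "transcript177165.Chr12.nnic": [(108761668, 108761897), (108762084, 108762488)],
    "transcript177187.Chr12.nnic": [
        (108761721, 108762618),
        (108762699, 108762836),
        (108762965, 108765123),
    ],
}


def exons_overlap(exons_a, exons_b):
    """True if any exon of one transcript overlaps any exon of the other."""
    return any(s1 <= e2 and s2 <= e1 for s1, e1 in exons_a for s2, e2 in exons_b)


def cluster_by_exon_overlap(transcripts):
    """Single-linkage clustering of transcripts using a small union-find."""
    parent = {tid: tid for tid in transcripts}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in combinations(transcripts, 2):
        if exons_overlap(transcripts[a], transcripts[b]):
            parent[find(a)] = find(b)

    loci = {}
    for tid in transcripts:
        loci.setdefault(find(tid), []).append(tid)
    return sorted(sorted(members) for members in loci.values())


if __name__ == "__main__":
    for i, locus in enumerate(cluster_by_exon_overlap(TRANSCRIPTS), 1):
        print(f"locus {i}: {', '.join(locus)}")
```

The pairwise comparison is quadratic in the number of transcripts, which is fine for inspecting one region but would need an interval sweep or index for a genome-wide annotation.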

@andrewprzh
Collaborator

@zpliu1126

Thanks for the data; it looks like at least some of the transcripts share splice sites. I'll try to improve the algorithm for detecting same-gene transcripts.

Best
Andrey
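
The observation about shared splice sites can be checked directly from the exon records posted above. The sketch below is a hedged illustration, not IsoQuant's implementation: it derives donor and acceptor positions from each transcript's exons (assuming the '+' strand, so donors are exon ends and acceptors are exon starts) and groups transcripts that share at least one site. The function names `splice_sites` and `group_by_shared_sites` are hypothetical.

```python
# Exon coordinates of the seven transcripts posted in this thread
# (Chr12, '+' strand), keyed by the numeric part of the transcript ID.
TRANSCRIPTS = {
    "177201": [(108761523, 108761897), (108762093, 108762488)],
    "177103": [(108761523, 108762836), (108762965, 108763841),
               (108764126, 108764230), (108764509, 108765123)],
    "177105": [(108761523, 108762836), (108762965, 108763841),
               (108764126, 108764230), (108764864, 108765123)],
    "177151": [(108761523, 108762836), (108762965, 108764230),
               (108764509, 108765123)],
    "177153": [(108761523, 108762836), (108762965, 108764230),
               (108764864, 108765123)],
    "177165": [(108761668, 108761897), (108762084, 108762488)],
    "177187": [(108761721, 108762618), (108762699, 108762836),
               (108762965, 108765123)],
}


def splice_sites(exons):
    """Return the donor/acceptor sites of a '+' strand transcript."""
    exons = sorted(exons)
    donors = {("donor", end) for _, end in exons[:-1]}
    acceptors = {("acceptor", start) for start, _ in exons[1:]}
    return donors | acceptors


def group_by_shared_sites(transcripts):
    """Group transcripts that share at least one splice site (transitively)."""
    parent = {tid: tid for tid in transcripts}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    first_seen = {}  # splice site -> first transcript that used it
    for tid, exons in transcripts.items():
        for site in splice_sites(exons):
            if site in first_seen:
                parent[find(tid)] = find(first_seen[site])
            else:
                first_seen[site] = tid

    groups = {}
    for tid in transcripts:
        groups.setdefault(find(tid), []).append(tid)
    return sorted((sorted(g) for g in groups.values()), key=len)


if __name__ == "__main__":
    for group in group_by_shared_sites(TRANSCRIPTS):
        print(", ".join(group))
```

On these records the grouping gives two sets: transcripts 177165 and 177201 share the donor at 108761897, and the other five share the donor at 108762836 and the acceptor at 108762965. The two sets share no splice site with each other, although their exons do overlap, which is the case where an exonic-overlap criterion would behave differently.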

@andrewprzh
Collaborator

The novel gene merging strategy has been completely reworked in IsoQuant 3.4, so this should be fixed now.
