Reference loci missing from extended annotation #147

Closed
apollo994 opened this issue Feb 5, 2024 · 2 comments
Labels
bug (Something isn't working), fixed in dev (Issue resolved but not released yet), fixed in release (Issue resolved and the fix is released, waiting for approval)

Comments

@apollo994

Hi,

I'm using IsoQuant to quantify the number of novel isoforms detected in two RNA-seq experiments, starting from a reference annotation.

This is my experimental setup:

  • Reference genome of a butterfly
  • Reference annotation automatically generated with Helixer (contains only one isoform per gene)
  • RNA seq with ONT of tissue 1
  • RNA seq with ONT of tissue 2

Here are the questions I want to answer:

  • how many novel genes
  • how many novel isoforms
  • overlap between novel isoforms in the two tissues.
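
One way to approach the isoform questions above, independent of any particular tool, is to compare intron chains between annotations, since transcript IDs assigned in separate runs are not comparable. The sketch below is an assumption-laden illustration, not IsoQuant's or SQANTI's method: it assumes all inputs are GTF (the GFF3 reference would need converting first), it ignores mono-exon transcripts, and it does not address novel genes. The helper names `intron_chains` and `exon` are hypothetical.

```python
import re
from collections import defaultdict

TRANSCRIPT_ID = re.compile(r'transcript_id "([^"]+)"')

def intron_chains(gtf_lines):
    """Return the set of (chrom, strand, introns) for multi-exon transcripts."""
    exons = defaultdict(list)
    for line in gtf_lines:
        if line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "exon":
            continue
        match = TRANSCRIPT_ID.search(fields[8])
        if match:
            key = (fields[0], fields[6], match.group(1))
            exons[key].append((int(fields[3]), int(fields[4])))
    chains = set()
    for (chrom, strand, _), coords in exons.items():
        coords.sort()
        # An intron spans the gap between two consecutive exons.
        introns = tuple((a_end + 1, b_start - 1)
                        for (_, a_end), (b_start, _) in zip(coords, coords[1:]))
        if introns:  # mono-exon transcripts have no intron chain
            chains.add((chrom, strand, introns))
    return chains

def exon(tid, start, end):
    """Build a toy GTF exon line (illustrative data only)."""
    return (f'chr7\ttoy\texon\t{start}\t{end}\t.\t+\t.\t'
            f'gene_id "g1"; transcript_id "{tid}";\n')

reference = [exon("ref1", 100, 200), exon("ref1", 300, 400)]
tissue1 = reference + [exon("t1.a", 100, 200), exon("t1.a", 350, 400)]
tissue2 = reference + [
    exon("t2.x", 120, 200), exon("t2.x", 350, 420),
    exon("t2.y", 100, 250), exon("t2.y", 300, 400),
]

ref_chains = intron_chains(reference)
novel1 = intron_chains(tissue1) - ref_chains
novel2 = intron_chains(tissue2) - ref_chains
shared = novel1 & novel2
print(len(novel1), len(novel2), len(shared))  # prints: 1 2 1
```

In the toy data, `t1.a` and `t2.x` differ at their transcript ends but share the same intron, so they count as one shared novel isoform. Whether end differences should matter is a design choice left to the analysis.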

IsoQuant was my top choice because it can be run with or without a reference annotation, provides SQANTI output, and was relatively easy to set up.

Here is an example of my IsoQuant command for one of the tissues:

software/IsoQuant-3.3.1/isoquant.py \
    --reference reference.fa \
    --genedb helixer_ann.gff3 \
    --fastq RNAseq1.fastq \
    --data_type nanopore \
    -o out_folder/ \
    --sqanti_output --threads 16

This resulted in two *.extended_annotation.gtf files, which I checked with gffcompare. Here is the result:

#= Summary for dataset: helixer_ann.gff3 
#     Query mRNAs :   16102 in   16102 loci  (14527 multi-exon transcripts)
#            (0 multi-transcript loci, ~1.0 transcripts per locus)

#= Summary for dataset: RNAseq1.extended_annotation.gtf 
#     Query mRNAs :   20115 in   14893 loci  (18740 multi-exon transcripts)
#            (3155 multi-transcript loci, ~1.4 transcripts per locus)

#= Summary for dataset: RNSseq2.extended_annotation.gtf 
#     Query mRNAs :   18330 in   13136 loci  (17166 multi-exon transcripts)
#            (3068 multi-transcript loci, ~1.4 transcripts per locus)

How is it possible that the total number of loci decreases?
In principle, the extended annotation should contain all reference transcripts plus the novel ones.

I thought one reason could be that neighboring genes get "fused" into a single locus once the RNA-seq evidence is added.
This is not the case: inspecting the GTFs in IGV, I found many cases where the reference locus was completely missing from the extended annotation, or where the annotated isoform was shorter.
[Image: screenshot, 2024-02-05 19:35:30]

This is also confirmed by extracting the genes in the same interval from the three annotations:

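# anns: dict mapping an annotation label to its loaded annotation table (defined earlier, not shown)
for name, ann in anns.items():
    print(name)
    # gene IDs found in the interval Chr7:5452963-5485951
    print(ann['ilHelHell1.1_Chr7', 5452963:5485951].gene_id.unique())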

helix
['_ilHelHell1.1_Chr7_000057' '_ilHelHell1.1_Chr7_000058'
 '_ilHelHell1.1_Chr7_000059' '_ilHelHell1.1_Chr7_000060'
 '_ilHelHell1.1_Chr7_000061']
RNAseq1
['_ilHelHell1.1_Chr7_000059']
RNAseq2
['_ilHelHell1.1_Chr7_000058']
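
The same check can be extended from one interval to the whole annotation. The sketch below is a minimal illustration under stated assumptions: it assumes both files contain `gene` rows, that the extended annotation keeps reference gene IDs unchanged (as the output above suggests), and that the ID lives in `ID=` for GFF3 or `gene_id "..."` for GTF. The function names `gene_ids` and `missing_reference_genes` are hypothetical, and the data is a toy example.

```python
import re

GTF_GENE_ID = re.compile(r'gene_id "([^"]+)"')
GFF3_ID = re.compile(r'(?:^|;)ID=([^;]+)')

def gene_ids(lines):
    """Collect gene IDs from the 'gene' rows of GTF or GFF3 lines."""
    ids = set()
    for line in lines:
        if line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "gene":
            continue
        # Try the GTF attribute style first, then fall back to GFF3.
        match = GTF_GENE_ID.search(fields[8]) or GFF3_ID.search(fields[8])
        if match:
            ids.add(match.group(1))
    return ids

def missing_reference_genes(reference_lines, extended_lines):
    """Reference gene IDs that do not appear in the extended annotation."""
    return sorted(gene_ids(reference_lines) - gene_ids(extended_lines))

# Toy data (not taken from the real files).
reference = [
    'chr7\tHelixer\tgene\t100\t500\t.\t+\t.\tID=gene_A\n',
    'chr7\tHelixer\tgene\t900\t1500\t.\t-\t.\tID=gene_B\n',
]
extended = [
    'chr7\tIsoQuant\tgene\t100\t500\t.\t+\t.\tgene_id "gene_A";\n',
]
print(missing_reference_genes(reference, extended))  # prints: ['gene_B']
```

With real files, the line lists could come from `open(path)` on each annotation.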

Do you have any explanation for this?
Please let me know if you need any further information to troubleshoot this.

In the meantime, many thanks for sharing your project!

F

@andrewprzh
Collaborator

Dear @apollo994

This is a known bug in the generation of the extended annotation; it will be fixed in the next release!

Best
Andrey

andrewprzh added the "bug" label on Feb 6, 2024
andrewprzh added the "fixed in dev" label on Feb 20, 2024
andrewprzh added the "fixed in release" label on May 9, 2024
@andrewprzh
Collaborator

We have finally released the new version 3.4, which fixes this issue.
