Strand issues for several novel genes in GTF (".", not "+" or "-") #107

Closed
jjuhyunkim opened this issue Aug 22, 2023 · 5 comments
Labels
fixed in dev: Issue resolved but not released yet
fixed in release: Issue resolved and the fix is released, waiting for approval
weird results: Something looks odd in the resulting files

Comments

@jjuhyunkim

Hi,

I encountered an assertion error (assert strand == '+' or strand == '-') while running sqanti3_qc.py. Additionally, I noticed that several novel genes have a strand of "." in the 7th column of transcript_models.gtf (figure below).

These genes are not mono-exonic.
I used the most recent version of IsoQuant (v3.3.1).
[Screenshot 2023-08-22 at 3 24 34 PM]

I have noticed that similar issues were reported in the past and that you addressed them in a previous version.
However, the problem persists.
Could you help me identify the underlying cause of this issue?

Thank you! : )
JH
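One way to list the affected records is to scan the GTF for transcript lines whose 7th column is ".". The following is a minimal sketch, assuming a standard tab-separated GTF with gene_id and transcript_id attributes; the function name and the two example records are hypothetical and are not taken from the reporter's file.

```python
import io
import re


def unstranded_transcripts(gtf_lines):
    """Yield (transcript_id, gene_id) for transcript records with strand '.'."""
    for line in gtf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header comments
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "transcript":
            continue  # only look at well-formed transcript records
        if fields[6] == ".":  # 7th column holds the strand
            attrs = dict(re.findall(r'(\S+) "([^"]*)"', fields[8]))
            yield attrs.get("transcript_id"), attrs.get("gene_id")


# Hypothetical records for illustration only.
example = (
    'chr1\tIsoQuant\ttranscript\t100\t900\t.\t.\t.\t'
    'gene_id "novel_gene_1"; transcript_id "tx1";\n'
    'chr1\tIsoQuant\ttranscript\t2000\t3000\t.\t+\t.\t'
    'gene_id "novel_gene_2"; transcript_id "tx2";\n'
)
found = list(unstranded_transcripts(io.StringIO(example)))
print(found)
```

For a real file, the same function can be called on an open file handle, for example `open("transcript_models.gtf")`.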

andrewprzh added the "weird results" label (Something looks odd in the resulting files) on Aug 25, 2023
@andrewprzh
Collaborator

Dear @Juhyun-Kim-0203

I suspect that these transcripts may not have any canonical splice sites, and thus their strand cannot be determined.
Could you tell me which genome you are using? Also, it would be helpful if you could share a small subset of reads that map to these transcripts. Is that possible?
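To illustrate why a transcript without canonical splice sites ends up unstranded, here is a minimal sketch of strand inference from intron boundary dinucleotides. It is an assumption-laden illustration, not IsoQuant's implementation: it only considers the GT-AG motif and its reverse complement CT-AC, takes a majority vote over introns, and uses hypothetical names and toy sequences.

```python
from collections import Counter

# Forward-strand dinucleotides at (intron start, intron end) and the strand they imply.
# GT...AG is the canonical motif; CT...AC is the same motif seen from the other strand.
SPLICE_MOTIF_STRAND = {("GT", "AG"): "+", ("CT", "AC"): "-"}


def infer_strand(chrom_seq, introns):
    """Infer strand from introns given as 1-based inclusive (start, end) pairs."""
    votes = Counter()
    for start, end in introns:
        left = chrom_seq[start - 1:start + 1]  # first two intron bases
        right = chrom_seq[end - 2:end]         # last two intron bases
        strand = SPLICE_MOTIF_STRAND.get((left, right))
        if strand is not None:
            votes[strand] += 1
    if votes["+"] > votes["-"]:
        return "+"
    if votes["-"] > votes["+"]:
        return "-"
    return "."  # no canonical support, or a tie


# Toy sequences: exon (4 bp), intron (8 bp, positions 5..12), exon (4 bp).
plus_like = "AAAA" + "GT" + "CCCC" + "AG" + "AAAA"
minus_like = "AAAA" + "CT" + "CCCC" + "AC" + "AAAA"
non_canonical = "AAAA" + "AA" + "CCCC" + "AA" + "AAAA"

print(infer_strand(plus_like, [(5, 12)]))
print(infer_strand(minus_like, [(5, 12)]))
print(infer_strand(non_canonical, [(5, 12)]))
```

With no canonical motif at any intron, nothing votes for either strand, which corresponds to the "." seen in the GTF.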

Probably, I should also make an option for filtering out such "unreliable" transcript predictions.
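Until such an option exists, unstranded records can be dropped from the output GTF in a post-processing step. This is a minimal sketch under the assumption that all records of a transcript (transcript and exon lines) share the same strand value; the function name and example records are hypothetical, and this is not an IsoQuant feature.

```python
def drop_unstranded(gtf_lines):
    """Yield comment lines and records whose strand (7th column) is '+' or '-'."""
    for line in gtf_lines:
        if line.startswith("#"):
            yield line  # keep header comments untouched
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 9 and fields[6] in ("+", "-"):
            yield line


# Hypothetical records for illustration only.
records = [
    "# hypothetical example\n",
    'chr1\tIsoQuant\ttranscript\t100\t900\t.\t.\t.\tgene_id "g1"; transcript_id "tx1";\n',
    'chr1\tIsoQuant\texon\t100\t300\t.\t.\t.\tgene_id "g1"; transcript_id "tx1";\n',
    'chr1\tIsoQuant\ttranscript\t2000\t3000\t.\t+\t.\tgene_id "g2"; transcript_id "tx2";\n',
]
kept = list(drop_unstranded(records))
print(len(kept))
```

Filtering changes which novel genes reach downstream tools, so it is worth recording how many records were removed.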

Best
Andrey

@andrewprzh
Collaborator

Dear @Juhyun-Kim-0203

I found a silly bug: it happens when lower-case characters appear in the genome (soft-masked regions). It is fixed now in master and will be included in the next release, which I plan to publish ASAP.
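The class of bug described here can be reproduced with a case-sensitive motif lookup. The sketch below is an illustration only, assuming a simple GT-AG / CT-AC lookup; it does not reproduce IsoQuant's code or its actual fix, and the function and parameter names are hypothetical.

```python
# Forward-strand splice motifs, stored in upper case.
SPLICE_MOTIF_STRAND = {("GT", "AG"): "+", ("CT", "AC"): "-"}


def motif_strand(left, right, normalize_case=True):
    """Return '+', '-' or '.' for the two intron boundary dinucleotides."""
    if normalize_case:
        # Soft-masked genomes store repeats in lower case; compare in upper case.
        left, right = left.upper(), right.upper()
    return SPLICE_MOTIF_STRAND.get((left, right), ".")


# A canonical intron that falls inside a soft-masked (lower-case) region.
buggy = motif_strand("gt", "ag", normalize_case=False)
fixed = motif_strand("gt", "ag", normalize_case=True)
print(buggy, fixed)
```

Without case normalization the lower-case motif is not recognized and the strand stays ".", even though the splice site is canonical.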

Thanks for the report!

Best
Andrey

andrewprzh added the "fixed in dev" label (Issue resolved but not released yet) on Aug 31, 2023
@jjuhyunkim
Author

jjuhyunkim commented Aug 31, 2023 via email

@andrewprzh
Collaborator

Dear @Juhyun-Kim-0203

For hard-masked genomes, I presume minimap won't be able to map the reads, and thus the transcript won't be discovered at all. However, if the splice sites contain "NN" characters, the strand will also be reported as ".".

Best
Andrey

@andrewprzh
Collaborator

I have finally released a new version, 3.4, which fixes this issue.

andrewprzh added the "fixed in release" label (Issue resolved and the fix is released, waiting for approval) on May 9, 2024