Error in reading BAM/SAM file. truncated file #34
Hi. The error message suggests that there is something unexpected in the input SAM file (formatting, odd characters, or something else) that prevents it from being read. A quick check would be to try converting it to BAM. If that also gives this error, then you can be sure that the SAM file is incorrect. If it works, then you can try using that BAM file as input for SyRI.
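Before converting, a quick way to locate the offending line is to check the mandatory SAM columns directly. The sketch below is a hypothetical helper (`check_sam_lines` is not part of SyRI, pysam or samtools) and checks only a subset of the SAM specification: at least 11 tab-separated fields, integer FLAG/POS/MAPQ/PNEXT/TLEN, and a CIGAR matching the spec's pattern. It is a first pass, not a full validator.

```python
import re
import sys

# CIGAR pattern from the SAM specification.
CIGAR_RE = re.compile(r"^(\*|([0-9]+[MIDNSHPX=])+)$")


def check_sam_lines(lines):
    """Return (line_number, problem) pairs for alignment lines that look malformed.

    Only a subset of the SAM rules is checked; header lines (@...) are skipped,
    but still counted, so the line numbers match the file.
    """
    problems = []
    for lineno, line in enumerate(lines, start=1):
        line = line.rstrip("\n")
        if not line or line.startswith("@"):
            continue
        fields = line.split("\t")
        if len(fields) < 11:
            problems.append((lineno, f"expected >= 11 tab-separated fields, got {len(fields)}"))
            continue
        # FLAG, POS, MAPQ and PNEXT must be non-negative integers.
        for name, idx in (("FLAG", 1), ("POS", 3), ("MAPQ", 4), ("PNEXT", 7)):
            if not fields[idx].isdigit():
                problems.append((lineno, f"{name} is not a non-negative integer: {fields[idx]!r}"))
        # TLEN may be negative, so parse it as a signed integer.
        try:
            int(fields[8])
        except ValueError:
            problems.append((lineno, f"TLEN is not an integer: {fields[8]!r}"))
        if not CIGAR_RE.match(fields[5]):
            problems.append((lineno, f"malformed CIGAR: {fields[5][:30]!r}"))
    return problems


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1]) as fh:
        for lineno, problem in check_sam_lines(fh):
            print(f"line {lineno}: {problem}")
```

Saved as a script (the file name is arbitrary), it can be run as `python3 check_sam.py out.sam`; a truncated line or one broken by stray characters shows up as a field-count or CIGAR problem.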
Hi, actually, because the chromosome is so long (>400 Mb), the file cannot be converted to BAM: BAM stores each CIGAR operation length in 28 bits, so the maximum is 2^28 - 1 = 268,435,455 (see lh3/minimap2#440).
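The limit comes from how BAM packs CIGAR: each operation is a 32-bit integer with the length in the upper 28 bits and the operation code in the lower 4, so one operation can be at most 2^28 - 1 = 268,435,455 bases long. A long soft clip or match against a >400 Mb chromosome can exceed that, while SAM text has no such limit. The helper names below are illustrative, not taken from any library.

```python
import re

# BAM stores each CIGAR operation as a uint32: (length << 4) | op_code.
# Only the upper 28 bits hold the length.
MAX_BAM_CIGAR_OP_LEN = (1 << 28) - 1  # 268,435,455
BAM_OP_CODES = "MIDNSHP=X"  # position in this string = BAM op code (0-8)
CIGAR_OP_RE = re.compile(r"([0-9]+)([MIDNSHP=X])")


def encode_cigar_op(length, op):
    """Pack one CIGAR operation as BAM does; raise if the length does not fit."""
    if length > MAX_BAM_CIGAR_OP_LEN:
        raise ValueError(f"{length}{op}: length exceeds the 28-bit BAM limit")
    return (length << 4) | BAM_OP_CODES.index(op)


def oversized_ops(cigar):
    """Return (length, op) pairs in a SAM CIGAR string that cannot be stored in BAM."""
    return [(int(n), op) for n, op in CIGAR_OP_RE.findall(cigar)
            if int(n) > MAX_BAM_CIGAR_OP_LEN]
```

For example, a CIGAR starting with a 300,000,000-base soft clip (`300000000S...`) is valid SAM but is reported by `oversized_ops`.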
Yes, SyRI uses pysam to read the SAM file. I would have to check how pysam handles this. The discussion in pysam-developers/pysam#613 suggests that they fixed this, but I would have to check how exactly it works.
Thanks, I will try using nucmer. But more than 60% of my species' genome is repetitive sequence. Do I need to repeat-mask it first? Hard-mask or soft-mask?
Repeats are not an issue algorithmically; they can just increase run-time and memory use significantly. So you can decide about masking based on your project's requirements and time/memory constraints.
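For the masking question: in the usual convention, soft masking marks repeats as lowercase letters (the sequence is kept), while hard masking replaces them with N (the sequence is removed, so no alignments can be found there). Converting an existing soft-masked FASTA sequence to hard-masked form is a simple character substitution. The functions below are a minimal sketch of that and of measuring how much of a sequence is masked; whether a given aligner ignores lowercase bases depends on the tool and its options.

```python
def hard_mask(seq):
    """Replace soft-masked (lowercase) bases with N."""
    return "".join("N" if base.islower() else base for base in seq)


def masked_fraction(seq):
    """Fraction of bases that are soft-masked (lowercase) or hard-masked (N).

    Note: uppercase N also counts assembly gaps, not only masked repeats.
    """
    if not seq:
        return 0.0
    masked = sum(1 for base in seq if base.islower() or base == "N")
    return masked / len(seq)
```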
Thanks for your prompt response. In fact, I tried splitting the genomes by chromosome (ignoring inter-chromosomal rearrangements) to speed things up, but SyRI was stuck on one chromosome for 7 days and then crashed. I will compare the results with and without masking.
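Splitting by chromosome means each reference/query chromosome pair is aligned and analysed on its own, so rearrangements between chromosomes (such as translocations) cannot be detected in that mode. A minimal sketch of the splitting step, assuming plain uncompressed FASTA and using the first word of each header as the record ID (`split_fasta` and `to_fasta` are hypothetical helpers):

```python
def split_fasta(lines):
    """Group FASTA lines into {record_id: sequence}.

    The record ID is the first whitespace-separated word of the header.
    """
    records = {}
    current = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            words = line[1:].split()
            if not words:
                raise ValueError("empty FASTA header")
            current = words[0]
            if current in records:
                raise ValueError(f"duplicate record ID: {current}")
            records[current] = []
        elif current is None:
            raise ValueError("sequence data before the first FASTA header")
        else:
            records[current].append(line)
    return {name: "".join(parts) for name, parts in records.items()}


def to_fasta(name, seq, width=60):
    """Format one record as FASTA text with fixed-width sequence lines."""
    body = "\n".join(seq[i:i + width] for i in range(0, len(seq), width))
    return f">{name}\n{body}\n"
```

Each record can then be written to its own file with `to_fasta` and the matching reference/query pairs aligned separately; pairing them by name assumes both assemblies use the same chromosome names.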
Could you please open a new issue for this crash and, if possible, share the syri.log file? I would like to see what happened there. Also, if you still have the error message, please share that too.
I created a new issue for this.
Hi, Delta file from
Hi Zhigui, thanks for letting me know. If I understand correctly, minimap2 can align such large chromosomes and generate SAM output. However, the alignments can neither be converted to BAM nor be read through pysam. If this is the case, then it can be solved by a custom function for reading SAM files, which would be a better solution than not being able to use minimap2 for larger chromosomes. Regarding the comparison between MUMmer and minimap2, the former is more sensitive and finds more alignments (at least in my experience), but at the cost of extra noisy alignments, which can result in noisy annotations from SyRI. Minimap2, on the other hand, produces cleaner alignments (and cleaner SyRI annotations), but some alignments could be missing. I have not compared the differences at the base-pair level, though.
Hi Goel, yes, you are right.
minimap2 does not perform base-pair alignment without the -a option (as PAF does not output that), so using -a does increase its runtime. The file size becomes large because each SAM/BAM line contains the query sequence, but the alignment information stays the same.
I have added a reader for SAM files, so genome size should no longer be an issue.
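SyRI's actual SAM reader is not shown in this thread. As an illustration of why a plain-text reader avoids the BAM limit, here is a hypothetical minimal parser: it reads only the first six mandatory columns and keeps CIGAR lengths as Python integers, so an operation longer than 2^28 - 1 is no problem. It ignores optional tags and does no validation.

```python
import re
from dataclasses import dataclass

CIGAR_OP_RE = re.compile(r"([0-9]+)([MIDNSHPX=])")


@dataclass
class SamRecord:
    qname: str
    flag: int
    rname: str
    pos: int  # 1-based leftmost reference position
    mapq: int
    cigar: list  # [(length, op), ...]

    @property
    def is_unmapped(self):
        return bool(self.flag & 0x4)

    @property
    def reference_end(self):
        """1-based inclusive end: POS plus the lengths of reference-consuming ops, minus 1."""
        ref_len = sum(n for n, op in self.cigar if op in "MDN=X")
        return self.pos + ref_len - 1


def read_sam(lines):
    """Yield SamRecord objects from SAM text lines, skipping header lines.

    CIGAR lengths become ordinary Python ints, so there is no 28-bit cap.
    """
    for line in lines:
        if not line.strip() or line.startswith("@"):
            continue
        cols = line.rstrip("\n").split("\t")
        cigar = [(int(n), op) for n, op in CIGAR_OP_RE.findall(cols[5])]
        yield SamRecord(cols[0], int(cols[1]), cols[2], int(cols[3]), int(cols[4]), cigar)
```

Because `read_sam` is a generator, it can be fed an open file handle and processes one line at a time instead of loading the whole SAM file into memory.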
Hi,
I am using SyRI to identify SVs between two species (same genus, diverged 10 Mya), but I encountered the following error:
[W::sam_read1] Parse error at line 20
Reading BAM/SAM file - ERROR - Error in reading BAM/SAM file. truncated file
Here are the full commands I used:
minimap2 -t 24 -ax asm10 --eqx ref.fa que.fa > out.sam
python3 /data/software/SyRI/syri/syri/bin/syri -c out.sam -r ref.fa -q que.fa -k -F S --nc 12
out.sam
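The minimap2 command uses `--eqx`, which makes the CIGAR use `=` (match) and `X` (mismatch) instead of `M`. A quick way to confirm that an output file was produced with this option, and to get a rough per-alignment identity, is to total the CIGAR operations. The sketch below is illustrative (`eqx_stats` and `identity` are hypothetical names); identity here is matches divided by alignment columns (=, X, I, D), which is one common definition among several.

```python
import re
from collections import Counter

CIGAR_OP_RE = re.compile(r"([0-9]+)([MIDNSHPX=])")


def eqx_stats(cigar):
    """Total length per CIGAR operation; rejects CIGARs that use M (no --eqx)."""
    totals = Counter()
    for n, op in CIGAR_OP_RE.findall(cigar):
        totals[op] += int(n)
    if totals["M"]:
        raise ValueError("CIGAR contains M operations; was minimap2 run without --eqx?")
    return totals


def identity(cigar):
    """Matches / alignment columns (=, X, I, D); clips (S, H) are excluded."""
    t = eqx_stats(cigar)
    columns = t["="] + t["X"] + t["I"] + t["D"]
    return t["="] / columns if columns else 0.0
```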