Parse error with PacBio long reads #311

Closed
olechnwin opened this issue Jan 14, 2019 · 13 comments

@olechnwin

Hello,
I am using minimap2 for mapping PacBio long reads.
Here is the command I ran:

minimap2 -ax map-pb cns_p_ctg.fasta 4SQ1235_reads_fasta.tar.gz > 4SQ1235_reads_fasta.sam

minimap2 ran to completion and produced a SAM file.
However, that SAM file generated the following error when I ran samtools:

Line 2219103, sequence length 0 vs 18197 from CIGAR
Parse error at line 2219103: CIGAR and sequence length are inconsistent
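
For context, this error means the SEQ field of the record is shorter or longer than the read length implied by its CIGAR string. A minimal Python sketch of that comparison follows. It is not samtools' actual code, and the function names are made up for illustration. It relies on the SAM specification's rule that the operations M, I, S, = and X consume read bases.

```python
import re

# CIGAR operations that consume bases of the read (query), per the SAM spec.
QUERY_CONSUMING = set("MIS=X")
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def cigar_query_length(cigar: str) -> int:
    """Return the read length implied by a CIGAR string."""
    return sum(int(n) for n, op in CIGAR_OP.findall(cigar) if op in QUERY_CONSUMING)


def is_consistent(sam_line: str) -> bool:
    """Check one SAM alignment line (not a header line) for CIGAR/SEQ agreement."""
    fields = sam_line.rstrip("\n").split("\t")
    cigar, seq = fields[5], fields[9]
    if cigar == "*" or seq == "*":
        return True  # nothing to compare when either field is absent
    return cigar_query_length(cigar) == len(seq)
```

An empty SEQ field next to a CIGAR that implies thousands of bases, as in the message above, fails this comparison.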

The samtools command I ran:

samtools view -hSF 256 4SQ1235_reads_fasta.sam > 4SQ1235_reads_fasta_priAlign.sam

Versions:

minimap2 --version
2.15-r905

samtools --version
samtools 1.9
Using htslib 1.9
Copyright (C) 2018 Genome Research Ltd.

At first this error appeared to be the same as issue #231. However, using the latest minimap2 seems to fix issue #231, but it did not fix this error.

The offending line is attached:
error_sam.txt

Thank you!

@lh3 lh3 added bug and removed bug labels Jan 14, 2019
@lh3
Owner

lh3 commented Jan 14, 2019

Is this the last line in your file?

@olechnwin
Author

> Is this the last line in your file?

Sorry. Can you please clarify your question?

@lh3
Owner

lh3 commented Jan 14, 2019

I mean, is the offending line the last line in your file? It seems that the line you gave me is not complete.

@olechnwin
Author

> I mean, is the offending line the last line in your file? It seems that the line you gave me is not complete.

Ahh...I see what you meant.
That is the entire line.
I am attaching the offending line and the line after it (which is also the last line in the SAM file).
error_and_next_sam.txt

This is generated using:
sed -n 2219103,2219104p 4SQ1235_reads_fasta.sam > error_and_next_sam.txt

Thank you so much for your quick response.

@lh3 lh3 added the bug label Jan 14, 2019
@lh3
Owner

lh3 commented Jan 14, 2019

Could you send me the read and the reference genome? This is probably a bug. Thanks.

@olechnwin
Author

I uploaded the reads: 4SQ1235_reads_fasta.tar.gz and the reference: cns_p_ctg.fasta here

FYI, the error is not specific to this read file: I tried a different PacBio read file and it still generated the same error.

Thanks.

@lh3
Owner

lh3 commented Jan 14, 2019

Thanks a lot. I have downloaded the data. Will have a look today or in the next couple of days.

lh3 added a commit that referenced this issue Jan 23, 2019
@lh3
Owner

lh3 commented Jan 23, 2019

I failed to reproduce the issue. In my SAM, I see a CIGAR 2906S9M2D46M for that particular line. Yours is 13878S9M2D46M. The read is only 7225 bp long, so the 13878S clipping is wrong. From looking at the source code, it is not clear to me how this could happen.

I still believe this is a bug, but I need input from others to nail it down. For the moment, you can filter out the offending line.
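
One way to filter out such lines is sketched below in Python. This is only an illustration of the workaround, not a tool shipped with minimap2 or samtools, and `keep_record` and `filter_sam` are hypothetical names. Header lines pass through unchanged, and alignment records are kept only when the read length implied by the CIGAR equals the length of SEQ.

```python
import re
import sys

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def keep_record(line: str) -> bool:
    """Return True for header lines and for records whose CIGAR matches SEQ."""
    if line.startswith("@"):
        return True  # SAM header line
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 11:
        return False  # truncated record: fewer than the 11 mandatory fields
    cigar, seq = fields[5], fields[9]
    if cigar == "*" or seq == "*":
        return True  # nothing to compare
    # M, I, S, = and X are the operations that consume read bases.
    implied = sum(int(n) for n, op in CIGAR_OP.findall(cigar) if op in "MIS=X")
    return implied == len(seq)


def filter_sam(lines, out, err=sys.stderr):
    """Copy consistent lines to `out`; report dropped line numbers on `err`."""
    dropped = 0
    for lineno, line in enumerate(lines, start=1):
        if keep_record(line):
            out.write(line)
        else:
            dropped += 1
            err.write(f"dropped line {lineno}\n")
    return dropped
```

Calling `filter_sam(sys.stdin, sys.stdout)` from a small script would let it sit in a pipeline in front of `samtools view`. Dropping records hides the symptom only, so the dropped line numbers are worth inspecting.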

@armintoepfer have you seen a similar issue?

@lh3 lh3 added feature-request and removed bug labels Jan 23, 2019
@lh3
Owner

lh3 commented Jan 23, 2019

@olechnwin did you run the following command line exactly as written?

minimap2 -ax map-pb cns_p_ctg.fasta 4SQ1235_reads_fasta.tar.gz > 4SQ1235_reads_fasta.sam

Note that your input reads are not a gzip'd FASTA file; the file is a .tar.gz archive. While a .tar archive is close to the original file, it contains extra data, so the particular line you see may contain invisible characters. Please run the following instead:

tar -zvxf 4SQ1235_reads_fasta.tar.gz
minimap2 -ax map-pb cns_p_ctg.fasta reads.fasta | gzip -1 > output.sam.gz
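
For pipelines that are driven from a script, the extraction step can also be sketched in Python with the standard `tarfile` module. This is an illustrative alternative to the `tar` command above, not something minimap2 provides. The function name and the list of accepted suffixes are assumptions, and the member names inside the real archive are not known from this thread.

```python
import shutil
import tarfile


def extract_fasta_members(archive, out_path,
                          suffixes=(".fasta", ".fa", ".fastq", ".fq")):
    """Concatenate the sequence files inside a .tar.gz into one plain file.

    Returns the names of the archive members that were written.
    """
    names = []
    with tarfile.open(archive, "r:gz") as tar, open(out_path, "wb") as out:
        for member in tar:
            # Skip directories, links and files that do not look like reads.
            if not member.isfile() or not member.name.endswith(suffixes):
                continue
            src = tar.extractfile(member)
            shutil.copyfileobj(src, out)  # copies file content only, no tar headers
            names.append(member.name)
    return names
```

The output contains only the file contents, so it can be passed to minimap2 in place of the archive. The sketch assumes every member ends with a newline; otherwise the last sequence of one file would run into the first header of the next.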

I will consider improving error checking at some point.
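
Until such a check exists, the input can be tested before mapping. The Python sketch below is an assumption about what a check could look like, not minimap2's implementation, and `gzip_payload_kind` is a hypothetical name. It uses two well-known signatures: gzip files start with the bytes `1f 8b`, and POSIX (ustar) and GNU tar headers carry the string `ustar` at offset 257 of the first 512-byte block.

```python
import gzip


def gzip_payload_kind(path: str) -> str:
    """Classify a file as 'not-gzip', 'tar-in-gzip', or 'plain-gzip'."""
    with open(path, "rb") as fh:
        if fh.read(2) != b"\x1f\x8b":  # gzip magic number
            return "not-gzip"
    with gzip.open(path, "rb") as fh:
        block = fh.read(512)  # a tar archive starts with a 512-byte header block
    # ustar and GNU tar headers store the magic "ustar" at offset 257.
    if len(block) == 512 and block[257:262] == b"ustar":
        return "tar-in-gzip"
    return "plain-gzip"
```

A result of `tar-in-gzip` for the reads file means it should be extracted first, as in the commands above. Very old tar archives without the `ustar` magic would not be recognized by this check.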

@armintoepfer problem solved. Sorry for the false alarm.

@lh3 lh3 closed this as completed Jan 23, 2019
@olechnwin
Author

@lh3 Thanks for looking into this.
I am currently running it after extracting the files. I'll let you know how it goes.

@lh3
Owner

lh3 commented Jan 24, 2019

Thanks, @olechnwin. Let me know if you see similar issues.

@olechnwin
Author

@lh3 sorry for the delay. I just want to let you know that everything works after extracting the files. Thank you so much for your help.

@lh3
Owner

lh3 commented Feb 6, 2019

Thanks for the confirmation.
