Skip to content
New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

Parsing SAM to BAM from genome alignment #440

Closed
Murukarthick opened this issue Jul 2, 2019 · 5 comments
Closed

Parsing SAM to BAM from genome alignment #440

Murukarthick opened this issue Jul 2, 2019 · 5 comments
Labels

Comments

@Murukarthick
Copy link

Hi,

I aligned two chromosome sequences using minimap2 (version 2.17-r941) as below,

$minimap --MD -L -2 -ax asm5 -f 0.005 -t $threads -I $size -K$mem $ref $qry > chr2.sam

It produced the SAM output. However, this file was unable to convert to bam. I got below error,

$samtools view -@ $threads -bS chr2H.sam
| $samtools sort -@ $threads - -o chr2H_srt.bam

[E::sam_parse1] CIGAR and query sequence are of different length
[W::sam_read1] Parse error at line 3
[main_samview] truncated file.

Although this bug was similar to #231 it was not resolved with asm5 option. Could you please recommend something to fix this issue ?

@lh3
Copy link
Owner

lh3 commented Jul 2, 2019

#231 has long been fixed. Are you using nohup or LSF/Slurm/etc? If unsure, use minimap2 -o to specify the output file.

PS: samtools is complaining about line 3. What does that line look like?

@Murukarthick
Copy link
Author

Hi
I tried with -o option but still I am getting the same error.
[E::sam_parse1] CIGAR and query sequence are of different length
[W::sam_read1] Parse error at line 3
[main_samview] truncated file.

You can see the line 3 using following link https://transfer.ipk-gatersleben.de/upload2/6uOsyohX/
Thanks !

@lh3 lh3 added the question label Jul 3, 2019
@lh3
Copy link
Owner

lh3 commented Jul 3, 2019

You are hitting the limit of BAM format. In BAM, the max CIGAR operator length is 2**28=268435456. The soft clippings in your SAM are longer.

@Murukarthick
Copy link
Author

Hi
I split a single chromosome into two parts in both reference (part 1(size)=341635707, part2=333674587) and query (part 1=330830028, part 2=321721244) sequence. I did alignment as follows,

$minimap --MD -L -2 -ax asm5 -f 0.005 -t $threads -I $size -K $mem $ref $qry -o chr2_parts.sam

When I convert into bam I got same issue in other line,

[E::sam_parse1] CIGAR and query sequence are of different length
[W::sam_read1] Parse error at line 15456
[main_samview] truncated file.

The error line can be seen at https://transfer.ipk-gatersleben.de/upload2/SkcrBWb0/. if its related to CIGAR length issue, Would you please recommend a good way to align large chromosomes and get the output in bam which is required for downstream analysis.

Thanks!!!

@lh3
Copy link
Owner

lh3 commented Jul 4, 2019

Your last clipping again goes beyond 268435456. My recommendation is: don't use BAM. If you have to, split the chromosomes to length shorter than the threshold.

As this is not a minimap2 issue, I am closing this now.

# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
Projects
None yet
Development

No branches or pull requests

2 participants