minimap2 does not respect large -K option? #613

Closed
mrvollger opened this issue May 22, 2020 · 3 comments
Comments

@mrvollger

Hi!

I have been mapping some very large contigs, and I find it useful to raise -K so that more than one of them is mapped at a time. However, when I increase -K beyond 2147m, minimap2 loads and maps only one sequence at a time. Here is a minimal example that shows this:

Make test data:

$ printf ">test\nAAAAAAGGGGGGGGGGGGGGGGCCCCCCCCCCCCTTTTTTTTTTTTT\n" > test_ref.fasta && cat  test_ref.fasta test_ref.fasta test_ref.fasta test_ref.fasta  > test_reads.fasta
$ minimap2 --version
2.17-r941

Maps all 4 sequences at once with -K 2147m

$ minimap2 -t 128 -K 2147m test_ref.fasta test_reads.fasta  > /dev/null
[M::mm_idx_gen::0.002*3.23] collected minimizers
[M::mm_idx_gen::0.018*13.35] sorted minimizers
[M::main::0.018*13.32] loaded/built the index for 1 target sequence(s)
[M::mm_mapopt_update::0.018*13.24] mid_occ = 3
[M::mm_idx_stat] kmer size: 15; skip: 10; is_hpc: 0; #seq: 1
[M::mm_idx_stat::0.018*13.17] distinct minimizers: 4 (75.00% are singletons); average occurrences: 1.250; average spacing: 9.400
[M::worker_pipeline::0.029*8.72] mapped 4 sequences
[M::main] Version: 2.17-r941
[M::main] CMD: minimap2 -t 128 -K 2147m test_ref.fasta test_reads.fasta
[M::main] Real time: 0.030 sec; CPU: 0.255 sec; Peak RSS: 0.005 GB

Maps sequences one at a time with -K 2148m

$ minimap2 -t 128 -K 2148m test_ref.fasta test_reads.fasta  > /dev/null
[M::mm_idx_gen::0.002*3.35] collected minimizers
[M::mm_idx_gen::0.016*15.18] sorted minimizers
[M::main::0.016*15.14] loaded/built the index for 1 target sequence(s)
[M::mm_mapopt_update::0.016*15.05] mid_occ = 3
[M::mm_idx_stat] kmer size: 15; skip: 10; is_hpc: 0; #seq: 1
[M::mm_idx_stat::0.017*14.86] distinct minimizers: 4 (75.00% are singletons); average occurrences: 1.250; average spacing: 9.400
[M::worker_pipeline::0.028*9.42] mapped 1 sequences
[M::worker_pipeline::0.039*7.30] mapped 1 sequences
[M::worker_pipeline::0.048*6.20] mapped 1 sequences
[M::worker_pipeline::0.057*5.49] mapped 1 sequences
[M::main] Version: 2.17-r941
[M::main] CMD: minimap2 -t 128 -K 2148m test_ref.fasta test_reads.fasta
[M::main] Real time: 0.057 sec; CPU: 0.311 sec; Peak RSS: 0.004 GB

If you can verify this issue, there should probably be a note in the man page, but ideally I would like to be able to set -K above 2147m.

Thanks!
Mitchell

@lh3
Owner

lh3 commented May 22, 2020

Duplicate of #491 and #562. Try the GitHub HEAD.

@lh3 lh3 closed this as completed May 22, 2020
@lh3 lh3 added the duplicate label May 22, 2020
@mrvollger
Author

Sorry, I should have looked more carefully at the other issues.

I am noticing similar behavior with yak, and I don't think there is an issue for it there yet. Should I make one?

Will process many at once:

 ./yak count -K 2000000000  mat_ill.fasta -o  /dev/null

Will only process one at a time:

 ./yak count -K 20000000000  mat_ill.fasta -o  /dev/null

@lh3
Owner

lh3 commented May 22, 2020

yak and minigraph have a similar issue. I haven't fixed those yet... You can create a new issue as a reminder for me. Thanks!
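
A note on where the 2147m boundary may come from: the largest signed 32-bit integer is 2^31 - 1 = 2147483647, so 2147m (2,147,000,000) fits and 2148m (2,148,000,000) does not. The sketch below only illustrates that arithmetic. It assumes the batch size ends up in a signed 32-bit integer and that the k/m/g suffixes are decimal multipliers; neither is confirmed in this thread, and `parse_size` and `as_int32` are hypothetical helpers, not functions from minimap2 or yak.

```python
# Illustration only: assumes the -K value is stored in a signed 32-bit int.
INT32_MAX = 2**31 - 1  # 2147483647

# Assumed decimal multipliers for size suffixes (hypothetical helper table).
SUFFIXES = {"k": 10**3, "m": 10**6, "g": 10**9}


def parse_size(text):
    """Turn a string such as '2147m' into an integer number of bases."""
    text = text.strip().lower()
    if text and text[-1] in SUFFIXES:
        return int(float(text[:-1]) * SUFFIXES[text[-1]])
    return int(text)


def as_int32(value):
    """Reinterpret the low 32 bits of value as a signed two's-complement int."""
    value &= 0xFFFFFFFF
    return value - 2**32 if value >= 2**31 else value


# The four -K values used in this thread.
for arg in ["2147m", "2148m", "2000000000", "20000000000"]:
    n = parse_size(arg)
    print(f"-K {arg}: value={n} fits_int32={n <= INT32_MAX} wrapped={as_int32(n)}")
```

Under those assumptions, the two values that map several sequences at once fit in 32 bits, while the two that map one at a time wrap to a negative number. A negative batch size would plausibly make any single sequence exceed the limit, which would match the "mapped 1 sequences" lines above.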
