Squashed commit of the following:
commit cf66ef8a27146afe575b64e135d117b212f7bd64
Author: Benjamin Buchfink <buchfink@gmail.com>
Date:   Wed Jan 31 14:04:42 2024 +0100

    Updated tests.

commit eae157eaae352ce9b38c268cd647ff5ffa6bdf61
Author: Benjamin Buchfink <buchfink@gmail.com>
Date:   Wed Jan 31 13:57:45 2024 +0100

    Updated changelog.

commit 6b2f8ebc1d421cd096ee9df7c5951fc373e54982
Merge: 8666f7e8 0c888e84
Author: Benjamin Buchfink <buchfink@gmail.com>
Date:   Wed Jan 31 13:23:40 2024 +0100

    Merge branch 'master' into dev

commit 0c888e84367dfba1d1ab6d23a3aa663ad3c61440
Merge: 19c730d5 56f5d14
Author: Benjamin Buchfink <buchfink@gmail.com>
Date:   Wed Jan 31 13:23:25 2024 +0100

    Merge branch 'master' of https://github.com/bbuchfink/diamond

commit 8666f7e853efe8c8a1d7d0bdde46ad810db7475c
Author: Benjamin Buchfink <buchfink@gmail.com>
Date:   Wed Jan 31 12:34:15 2024 +0100

    Fixed output.

commit a70a755c04135b310d119e69ab9a899a6ace94b0
Author: Benjamin Buchfink <buchfink@gmail.com>
Date:   Wed Jan 31 11:09:09 2024 +0100

    Changed max block size for vsens mode.

commit df9be960f04c8c41b49495746790e44784e7b138
Author: Benjamin Buchfink <buchfink@gmail.com>
Date:   Tue Jan 30 16:56:36 2024 +0100

    Fixed taxon format.

commit 565bd4fd37d4116cb6cbd147eed851fd627c81af
Author: Benjamin Buchfink <buchfink@gmail.com>
Date:   Tue Jan 30 16:45:54 2024 +0100

    Fixed error.

commit 67df3d4af021e7d39fa559064b3122dd825ab96b
Author: Benjamin Buchfink <buchfink@gmail.com>
Date:   Tue Jan 30 16:43:01 2024 +0100

    Added taxon lineage option.

commit e4203e82775fb50789bfe5826b0dc50aa2fc8e67
Author: Benjamin Buchfink <buchfink@gmail.com>
Date:   Tue Jan 30 15:57:39 2024 +0100

    Added gapped seqs to help.

commit 98b64461cae30e158fbd2e27eaf9d00d9fb53a59
Author: Benjamin Buchfink <buchfink@gmail.com>
Date:   Tue Jan 30 15:47:50 2024 +0100

    Fixed warnings.

commit ab40c4a23dc16a11c9af50657ba40bf3ea734f8d
Author: Benjamin Buchfink <buchfink@gmail.com>
Date:   Tue Jan 30 15:41:20 2024 +0100

    Fixed warnings.

commit d9dc1ae43167fb8f496b97742e021b0300b24526
Author: Benjamin Buchfink <buchfink@gmail.com>
Date:   Tue Jan 30 15:26:53 2024 +0100

    Fixed warnings.

commit ad6c31999575a2f0b3e84192a91a73fa0eab473e
Author: Benjamin Buchfink <buchfink@gmail.com>
Date:   Tue Jan 30 15:02:00 2024 +0100

    Updated version.

commit 30f509a8476c8b1ed4dd1497244da554301412ae
Author: Benjamin Buchfink <buchfink@gmail.com>
Date:   Tue Jan 30 14:39:42 2024 +0100

    Added shapes-30x10 mode.

commit dc64958415be88c5ecfbb4b47e48ad5c34f78e4d
Author: Benjamin Buchfink <buchfink@gmail.com>
Date:   Tue Jan 30 14:29:17 2024 +0100

    Add per round cutoff options.

commit 4193abd0cf7da091bd25b794c278577075c0b9f8
Author: Benjamin Buchfink <buchfink@gmail.com>
Date:   Tue Jan 30 13:43:06 2024 +0100

    Added setting ccd per round.

commit d78caa14b78cd8f690a02df08ef7951934486e41
Author: Benjamin Buchfink <buchfink@gmail.com>
Date:   Tue Jan 30 12:59:40 2024 +0100

    Added shapes30x10 mode.

commit 30221fa8e1b1564fc28ab669a4f08a89da45b44b
Author: Benjamin Buchfink <buchfink@gmail.com>
Date:   Mon Jan 29 16:16:44 2024 +0100

    Removed famsa, incremental clustering.

commit ba15651e286bfc4460343faeb5b08dd98bc97176
Author: Benjamin Buchfink <buchfink@gmail.com>
Date:   Mon Jan 29 15:58:38 2024 +0100

    Update reassign for mutual cover.

commit cf5a84ec02058870e0c42020c66dcf423baf4383
Author: Benjamin Buchfink <buchfink@gmail.com>
Date:   Mon Jan 29 15:30:01 2024 +0100

    Updated recluster for mutual coverage.

commit e6032ba102c805d565096b4cfc1e0c0d34ad9168
Author: Benjamin Buchfink <buchfink@gmail.com>
Date:   Fri Jan 26 17:52:49 2024 +0100

    Update recluster for mutual cover.

commit bdee2616dfbe9195841bac59438d5cc44733a5d4
Author: Benjamin Buchfink <buchfink@gmail.com>
Date:   Fri Jan 26 16:37:24 2024 +0100

    Fixed issue with length lookup.

commit 739be0ea92aee7d32837ab39c70efc2a792f37aa
Author: Benjamin Buchfink <buchfink@gmail.com>
Date:   Fri Jan 26 11:20:14 2024 +0100

    Fixed issue with round values.

commit 34f0b3a73204117b41efc20883fa583e3a89339a
Author: Benjamin Buchfink <buchfink@gmail.com>
Date:   Thu Jan 25 13:28:44 2024 +0100

    Reworked storing target ids in hits logic.

commit 76e17c7443d4362bd821f855a2e8dc8598c67710
Author: Benjamin Buchfink <buchfink@gmail.com>
Date:   Wed Jan 24 18:00:39 2024 +0100

    Rework target ids in seed array.

commit 31f0f34a3c06e52c9a87f565260375376601553e
Author: Benjamin Buchfink <buchfink@gmail.com>
Date:   Wed Jan 24 15:08:28 2024 +0100

    Fixed error.

commit 69680a20ad46fbc9618d5fca59cb7b76104946ca
Author: Benjamin Buchfink <buchfink@gmail.com>
Date:   Wed Jan 24 13:20:32 2024 +0100

    Use ccd=0 in last round.

commit a428f0d9dfcd3225ec34bb358984ac1c6a2e4cd0
Author: Benjamin Buchfink <buchfink@gmail.com>
Date:   Wed Jan 24 13:07:51 2024 +0100

    Added round approx id setting.

commit 261cf75522b9a703644329bfb24164d53d126e84
Author: Benjamin Buchfink <buchfink@gmail.com>
Date:   Wed Jan 24 12:49:19 2024 +0100

    Fixed error.

commit 1f3f8a7acd480f4814f5a0f9eaee5b17630c7b11
Author: Benjamin Buchfink <buchfink@gmail.com>
Date:   Mon Jan 22 17:00:42 2024 +0100

    Rework round coverage option

commit fb55d8eba76e8726b241d3f4fee81e73922af753
Author: Benjamin Buchfink <buchfink@gmail.com>
Date:   Mon Jan 22 15:55:01 2024 +0100

    Apply evalue max in cascaded clustering.

commit ab498d0bbffe523b4e3087149cd7b6be5c29b096
Author: Benjamin Buchfink <buchfink@gmail.com>
Date:   Mon Jan 22 15:25:31 2024 +0100

    Apply coverage increment.

commit f5dd1367549c7932185c78218b0eb6c4618e7e01
Author: Benjamin Buchfink <buchfink@gmail.com>
Date:   Fri Jan 19 16:49:23 2024 +0100

    Fixed evalue calculation.

commit 87aff2d0eab54fc4440a5046f8d06f12440fec1b
Author: Benjamin Buchfink <buchfink@gmail.com>
Date:   Fri Jan 12 12:26:15 2024 +0100

    Added shapes.

commit a49026b0319b04675ac3fa65c851bb116f792c43
Author: Benjamin Buchfink <buchfink@gmail.com>
Date:   Thu Jan 11 14:52:17 2024 +0100

    Added new weight 10 modes.

commit 590e1756fd198b288b4010414010d8b9baed089a
Author: Vincent Spath <90271444+v-spath@users.noreply.github.com>
Date:   Wed Nov 29 17:24:01 2023 +0100

    DNA only mapping (#18)

    * seed repetition cutoff

    * remove overhead

    * back to array

    * test

    * repetitive filter

    * changed float to double for parser

    * fixed count typo

    * removed io header

    * min heap seed filtering

    * fix heap

    * filter multithreaded

    * multithreaded optimized

    * locked guard

    * test new multithreading filter

    * filter multithreaded optimized

    * pass thread index during creation of thread

    * check smallest element before adding to queue

    * remove unordered set, replace by checking against cutoff value

    * refactor

    * new heuristics

    * add ungapped extension to set

    * new timer

    * temp

    * new timer

    * ssh

    * d

    * Delete Arabidopsis_thaliana.TAIR10.dna_sm.toplevel.fa.gz

    * new timer

    * timer

    * remove timer

    * temp

    * Delete Arabidopsis_thaliana.TAIR10.dna_sm.toplevel.fa.gz

    * temp

    * Delete Arabidopsis_thaliana.TAIR10.dna.chromosome.Pt.fa.gz

    * temp

    * Delete Arabidopsis_thaliana.TAIR10.dna.toplevel.fa.gz

    * refactor

    * seed filter

    * target filter

    * target filter

    * target filter

    * revert

    * r_ungapped to score

    * fix

    * wfadaptive

    * setheuristicnone

    * remove ungapped eval

    * repetition cutoff 0

    * fix filter for small index

    * backup chaining and apple specifics

    * revert

    * test ksw2 for mac

    * chaining and bug fixes and apple specifics

    * backtracking

    * backup chaining, with old out commented code, 1 bug left

    * working_chaining

    * add dna chaining condition

    * basic chaining

    * Delete src/align/align.cpp.save

    * repetition-cutoff into WITH_DNA

    * Delete libwfa2cpp.a

    * add fast approximation of log2

    * Delete setup.cpp.save

    * delete binary obj

    * remove key in RepetitiveCounter

    * removed vector of priority_queues as class member

    * Update dna_index.cpp

    check size repetitive before adding to queue

    * extended struct SeedMatch

    * fixed bug 100 % ungapped score

    * struct for data an chain

    * removed std output

    * update chain.cpp

    * change built queue

    * add sorting before chaining

    * chaining only for one target at a time

    * head

    * comments

    * include wfa2 in comment

    * include wfa comment

    * Update log2_fast.h

    * chain_end two parameters

    * fixed bug backtrack

    * make variables const if possible

    * parameters struct

    * struct anchor data

    * update documentation

    * backtrack simplification

    * update backtrack

    * delete wrong commit

    * map hsp

    * remove RepetitiveCounter struct

    * updated chain struct

    * construction Chain Object

    * chain Standard Komparator

    * use iterator

    * mapping

    * correction syntax

    * update indices

    * mapping quality

    * fixed index t_id

    * mapping output includes n_anchors (cm)

    * remove redundancy

    * fixed bug match build

    * primary computation for all targets

    * overlap percentage of shorter chain

    * fix compile error

    * dna extension

    * with dna compile

    * debug test

    * Update extension.cpp

    * fix segmentation fault

    * test why slow

    * test only map best

    * test map percentage

    * test 1

    * sort score

    * todo

    * move semantics

    * forward/reverse together

    * test seed lookup

    * Update extension.cpp

    * add chaining penalties

    * new primary chain computation

    * Update extension.cpp

    * Update extension.cpp

    * simplify chain structure

    * use PAF when only mapping

    * Update config.cpp

    * compile error on mac

    * Merge branch 'personal_dev' into dev

    # Conflicts:
    #	src/basic/config.cpp
    #	src/run/config.cpp
    #	src/run/double_indexed.cpp
    #	src/search/stage0.cpp

    * Todo: align long reads

    * basic chaining alignment

    * fix bug short reads reverse (now correct)

    * chain alignment

    * integrate ksw2 correctly

    * update WFA

    * corrected anchor alignment

    * fixed residue matches mapping

    * change semantics

    * refacator chain alignment

    * integrate WFA

    * correct primary computation

    * Update extension.h

    * chaining penalties scaling factors

    * correction chain alignment

    * compute correct alignment score

    * correct paf format DNA

    * clean up includes

    * filter chains

    * comments

    * wfa low memory mode

    * filter chains with lower bound

    ---------

    Co-authored-by: Dimi <dimitrios_K@gmx.de>
    Co-authored-by: Dimitrios Koutsogiannis <dkoutso@taco.eb.local>
    Co-authored-by: Dimi99 <73211787+Dimi99@users.noreply.github.com>
    Co-authored-by: vinceaps <90271444+vinceaps@users.noreply.github.com>
    Co-authored-by: Benjamin Buchfink <buchfink@gmail.com>
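The pull request above lists "add fast approximation of log2" (and later edits to `log2_fast.h`) next to new chaining penalties. Chaining gap costs in mappers such as minimap2 include a log2 term, so a cheap approximation pays off there. The header itself is not shown on this page. As an assumption, here is a minimal C++ sketch of one well-known technique, the IEEE-754 exponent trick with a quadratic correction of the kind used in minimap2's chaining code. The name `log2_fast` is illustrative, and the sketch is not necessarily what DIAMOND's header contains.

```cpp
#include <cassert>
#include <cmath>    // std::log2, for checking the approximation
#include <cstdint>
#include <cstring>

// Approximate log2(x) for positive, normal float x (illustrative helper).
// Writing x = m * 2^e with m in [1, 2), the exponent field yields e and a
// quadratic polynomial approximates log2(m) + 1 on [1, 2).
inline float log2_fast(float x) {
    assert(x > 0.0f);
    uint32_t bits;
    std::memcpy(&bits, &x, sizeof bits);                      // type-pun without UB
    float result = float(int((bits >> 23) & 255u) - 128);     // e - 1
    bits &= ~(255u << 23);                                    // clear the exponent field
    bits |= 127u << 23;                                       // exponent 0: value is m in [1, 2)
    float m;
    std::memcpy(&m, &bits, sizeof m);
    result += (-0.34484843f * m + 2.02466578f) * m - 0.67487759f; // ~ log2(m) + 1
    return result;
}
```

The absolute error stays around 0.005 for positive normal inputs, which is usually ample for a gap penalty; `std::log2` remains the exact reference.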

commit b8f12f41864bc2c2795bcf9ed22b867d717bc5da
Author: Benjamin Buchfink <buchfink@gmail.com>
Date:   Wed Nov 22 10:52:38 2023 +0100

    Add cmake flag for famsa.

commit 7fdecdb2879997b6b6a1079ed28ee2ecacf25229
Author: Benjamin Buchfink <buchfink@gmail.com>
Date:   Wed Nov 15 16:52:41 2023 +0100

    Added hit culling.

commit e3eadb911af4df39aa08bf0f560ec1068d4f5211
Author: Benjamin Buchfink <buchfink@gmail.com>
Date:   Thu Nov 9 17:02:54 2023 +0100

    Fixed non x86 compile.

commit b258c2f5be5421c5c80e44b77b4870d24ac6c603
Author: Benjamin Buchfink <buchfink@gmail.com>
Date:   Thu Nov 9 16:58:40 2023 +0100

    Fixed non x86 compile.

commit e055ce95c60e769da4f496232108b3968d0c4bf9
Author: Vincent Spath <90271444+v-spath@users.noreply.github.com>
Date:   Tue Nov 7 15:59:40 2023 +0100

    DNA chaining (#17)

    * seed repetition cutoff

    * remove overhead

    * back to array

    * test

    * repetitive filter

    * changed float to double for parser

    * fixed count typo

    * removed io header

    * min heap seed filtering

    * fix heap

    * filter multithreaded

    * multithreaded optimized

    * locked guard

    * test new multithreading filter

    * filter multithreaded optimized

    * pass thread index during creation of thread

    * check smallest element before adding to queue

    * remove unordered set, replace by checking against cutoff value

    * refactor

    * new heuristics

    * add ungapped extension to set

    * new timer

    * temp

    * new timer

    * ssh

    * d

    * Delete Arabidopsis_thaliana.TAIR10.dna_sm.toplevel.fa.gz

    * new timer

    * timer

    * remove timer

    * temp

    * Delete Arabidopsis_thaliana.TAIR10.dna_sm.toplevel.fa.gz

    * temp

    * Delete Arabidopsis_thaliana.TAIR10.dna.chromosome.Pt.fa.gz

    * temp

    * Delete Arabidopsis_thaliana.TAIR10.dna.toplevel.fa.gz

    * refactor

    * seed filter

    * target filter

    * target filter

    * target filter

    * revert

    * r_ungapped to score

    * fix

    * wfadaptive

    * setheuristicnone

    * remove ungapped eval

    * repetition cutoff 0

    * fix filter for small index

    * backup chaining and apple specifics

    * revert

    * test ksw2 for mac

    * chaining and bug fixes and apple specifics

    * backtracking

    * backup chaining, with old out commented code, 1 bug left

    * working_chaining

    * add dna chaining condition

    * basic chaining

    * Delete src/align/align.cpp.save

    * repetition-cutoff into WITH_DNA

    * Delete libwfa2cpp.a

    * add fast approximation of log2

    * Delete setup.cpp.save

    * delete binary obj

    * remove key in RepetitiveCounter

    * removed vector of priority_queues as class member

    * Update dna_index.cpp

    check size repetitive before adding to queue

    * extended struct SeedMatch

    * fixed bug 100 % ungapped score

    * struct for data an chain

    * removed std output

    * update chain.cpp

    * change built queue

    * add sorting before chaining

    * chaining only for one target at a time

    * head

    * comments

    * include wfa2 in comment

    * include wfa comment

    * Update log2_fast.h

    * chain_end two parameters

    * fixed bug backtrack

    * make variables const if possible

    * parameters struct

    * struct anchor data

    * update documentation

    * backtrack simplification

    * update backtrack

    * delete wrong commit

    * remove RepetitiveCounter struct

    * updated chain struct

    * construction Chain Object

    * chain Standard Komparator

    * use iterator

    * correction syntax

    * fixed index t_id

    * move semantics

    ---------

    Co-authored-by: Dimi <dimitrios_K@gmx.de>
    Co-authored-by: Dimitrios Koutsogiannis <dkoutso@taco.eb.local>
    Co-authored-by: Dimi99 <73211787+Dimi99@users.noreply.github.com>
    Co-authored-by: vinceaps <90271444+vinceaps@users.noreply.github.com>
    Co-authored-by: Benjamin Buchfink <buchfink@gmail.com>

commit 19c730d59620efa02e9b7a6928283962767a64c3
Author: Benjamin Buchfink <buchfink@gmail.com>
Date:   Tue Nov 7 15:54:53 2023 +0100

    Add output field for lineage.

commit 7cb96085fa2950f0d500e468331d0f7a8367c182
Author: Benjamin Buchfink <buchfink@gmail.com>
Date:   Tue Nov 7 10:48:11 2023 +0100

    Check for identical block.

commit 8438e3387278dad3405ed9c2f7b1b6a29be94407
Author: Benjamin Buchfink <buchfink@gmail.com>
Date:   Thu Oct 26 14:23:56 2023 +0200

    remove alignment computation.

commit 52649efac7cc76c4ab7822c7848cd74763582c07
Author: Benjamin Buchfink <buchfink@gmail.com>
Date:   Wed Oct 25 17:35:38 2023 +0200

    Filter by mutual coverage.

commit 1bfa4e2a82e52fb91241c48eeb349a29f7dbed53
Author: Benjamin Buchfink <buchfink@gmail.com>
Date:   Wed Oct 25 17:08:11 2023 +0200

    Added alignment.

commit 1149045d6c55fcf40604248e57c8c93b3ddbd4ab
Author: Benjamin Buchfink <buchfink@gmail.com>
Date:   Wed Oct 25 12:55:15 2023 +0200

    Fix centroid lookup.

commit dcef57b958768f36c69dc6a1c19527082a043f9e
Author: Benjamin Buchfink <buchfink@gmail.com>
Date:   Tue Oct 24 16:35:01 2023 +0200

    Added parallel processing.

commit c0ebcc97a3f0201ca5a271199795a144f97c49c5
Author: Benjamin Buchfink <buchfink@gmail.com>
Date:   Tue Oct 24 13:48:51 2023 +0200

    Fixed bug.

commit ff8b053116a7df875e59fafff7e933da10c2bddf
Author: Benjamin Buchfink <buchfink@gmail.com>
Date:   Tue Oct 24 13:43:18 2023 +0200

    Check for indirection.

commit 539416b50a933cde0594c9a110b4945d6c366d57
Author: Benjamin Buchfink <buchfink@gmail.com>
Date:   Tue Oct 24 12:33:32 2023 +0200

    Write output.

commit 0fdfe23a8695efeb8e824fb07e79a959e5ce317a
Author: Benjamin Buchfink <buchfink@gmail.com>
Date:   Mon Oct 23 16:59:07 2023 +0200

    Process alignments.

commit 83b0442e0efc20f28e703d16ebf52503015189d0
Merge: 8d0ce419 5171828f
Author: Benjamin Buchfink <buchfink@gmail.com>
Date:   Sat Oct 21 14:39:44 2023 +0200

    Merge branch 'dev' of https://github.com/bbuchfink/diamond_dev into dev

commit 8d0ce419c8bf9c835403ca1afd1ee0b8be5d4b5e
Author: Benjamin Buchfink <buchfink@gmail.com>
Date:   Sat Oct 21 14:39:34 2023 +0200

    Fixed error.

commit 5171828faf97348653d5447f4765e8c041075c58
Author: Benjamin Buchfink <buchfink@gmail.com>
Date:   Fri Oct 20 14:50:12 2023 +0200

    Fixed error.

commit 367ac23fc68acbb4c195d6136f5d83d2e24a9770
Author: Benjamin Buchfink <buchfink@gmail.com>
Date:   Thu Oct 19 17:38:31 2023 +0200

    Added profile alignment.

commit 03b51a98073196fb03494979fa03ee917448b376
Merge: 13a3e512 6096ef28
Author: Benjamin Buchfink <buchfink@gmail.com>
Date:   Wed Oct 18 16:26:27 2023 +0200

    Merge branch 'dev' of https://github.com/bbuchfink/diamond_dev into dev

commit 13a3e512efd362d6746359fd17580fc4dfee6908
Author: Benjamin Buchfink <buchfink@gmail.com>
Date:   Wed Oct 18 16:26:08 2023 +0200

    Fixed blastdb.

commit 6096ef2841d6e0c87d58292b4238eb9ce3d6b17d
Author: Benjamin Buchfink <buchfink@gmail.com>
Date:   Tue Oct 17 16:53:37 2023 +0200

    Added local alignment of profiles.

commit 8a5eca0d72348012775177f929447ddcfd5db0a7
Author: Benjamin Buchfink <buchfink@gmail.com>
Date:   Fri Oct 13 15:32:15 2023 +0200

    Fix leak.

commit d2563fd8c85c6891064ce2a8010a5c38e94c63b9
Author: Benjamin Buchfink <buchfink@gmail.com>
Date:   Fri Oct 13 15:04:16 2023 +0200

    Added profile-recluster workflow.

commit d4b638e4fee52ae4024d35e109e81bed1f9ee32d
Author: Benjamin Buchfink <buchfink@gmail.com>
Date:   Thu Oct 12 17:03:33 2023 +0200

    Added MSA computation.

commit a7e375040af808cdea6ae2bfc9a43b2fb285b1db
Author: Benjamin Buchfink <buchfink@gmail.com>
Date:   Thu Oct 12 14:24:55 2023 +0200

    Added FAMSA.

commit 88119a03726d7288c406237d0b975c1ad304c426
Author: Benjamin Buchfink <buchfink@gmail.com>
Date:   Wed Sep 27 16:52:47 2023 +0200

    Use letter count of subdb in clustering.

commit bed1a484f7f28beb654d12817431c8a7610faca0
Author: Benjamin Buchfink <buchfink@gmail.com>
Date:   Wed Sep 27 16:28:07 2023 +0200

    Added length lookup.

commit 81843d6fa42081219f89d0833838b8d481dfec23
Author: Benjamin Buchfink <buchfink@gmail.com>
Date:   Tue Sep 26 13:54:08 2023 +0200

    Use coverage increment in earlier rounds.

commit 53a20c61e5f4e68b41505bc08dcb776070cbca35
Author: Benjamin Buchfink <buchfink@gmail.com>
Date:   Tue Sep 26 13:40:00 2023 +0200

    Use ccd=0 in linear stages.

commit e5699a10fb0b8da76f1152ac5773d3cafdb304a1
Author: Benjamin Buchfink <buchfink@gmail.com>
Date:   Mon Sep 18 16:52:45 2023 +0200

    Use median sequence.

commit da749bc1fb1826bebbeadd80ad8a46adc70dc118
Merge: eb336685 a1c35d03
Author: Benjamin Buchfink <buchfink@gmail.com>
Date:   Mon Sep 18 11:44:00 2023 +0200

    Merge branch 'dev' of https://github.com/bbuchfink/diamond_dev into dev

commit eb33668574a28a84ec585613c8d31de644d89ff6
Author: Benjamin Buchfink <buchfink@gmail.com>
Date:   Mon Sep 18 11:43:25 2023 +0200

    Fixed subject source range.

commit a1c35d03f4ba0f4a56461d431df6dbde277b85eb
Author: Benjamin Buchfink <buchfink@gmail.com>
Date:   Mon Sep 11 14:50:49 2023 +0200

    Auto set min length ratio.

commit 612df402ca9fafc2c5cfed2d54b62354041edb8f
Author: Benjamin Buchfink <buchfink@gmail.com>
Date:   Mon Sep 11 13:43:35 2023 +0200

    Added mutual cover clustering.

commit c909a44381fa52d6a359d44b607fedc59690eeff
Author: Benjamin Buchfink <buchfink@gmail.com>
Date:   Fri Sep 8 11:15:33 2023 +0200

    Added connected component depth.

commit b613c12c6d4f5abaa9914075626414f39392b7b0
Author: Benjamin Buchfink <buchfink@gmail.com>
Date:   Wed Sep 6 16:58:28 2023 +0200

    Added cc clustering.

commit abd593504534f23632b98787f5053d5d93d23dc4
Author: Benjamin Buchfink <buchfink@gmail.com>
Date:   Tue Sep 5 17:04:45 2023 +0200

    Added callback for bidirectional clustering.

commit 76b222058f50e2fee6664159e0df47cb518b667b
Author: Benjamin Buchfink <buchfink@gmail.com>
Date:   Tue Sep 5 15:46:27 2023 +0200

    Added --no-reassign to gvc.

commit 268090426a40c2b22f04b8a0ced5101b6195ba5d
Author: Benjamin Buchfink <buchfink@gmail.com>
Date:   Tue Sep 5 10:31:51 2023 +0200

    Fixed approx_id for anchored swipe.

commit 4acb106d2495d0cd67481567607c81184081fbf9
Author: Benjamin Buchfink <buchfink@gmail.com>
Date:   Mon Aug 28 14:56:25 2023 +0200

    Fixed bug.

commit ccae0326a15e745e07c54edced47e3284b92c949
Author: Benjamin Buchfink <buchfink@gmail.com>
Date:   Mon Aug 28 13:26:12 2023 +0200

    Added length ratio filter.

commit 43eb577fd11eca0a84901b558013a409edcfd5c6
Merge: 49c3fffb 7d89b2b0
Author: Benjamin Buchfink <buchfink@gmail.com>
Date:   Fri Aug 25 16:04:42 2023 +0200

    Merge branch 'dev' of https://github.com/bbuchfink/diamond_dev into dev

commit 49c3fffb58f22405b1496b2f18fe8c04e35298b2
Author: Benjamin Buchfink <buchfink@gmail.com>
Date:   Fri Aug 25 16:03:43 2023 +0200

    Added len ratio filter.

commit 7d89b2b0197901a8727e2814ea35605e2cf5cdaf
Author: Benjamin Buchfink <buchfink@gmail.com>
Date:   Thu Jul 20 16:12:54 2023 +0200

    Fixed warning.

commit 6c4c4d7994f80f8603027cabf1d966ead048b428
Author: Benjamin Buchfink <buchfink@gmail.com>
Date:   Thu Jul 20 16:02:58 2023 +0200

    Added sorting.

commit 97c350241a5b309b3fc1862b3666365834cfbe39
Author: Benjamin Buchfink <buchfink@gmail.com>
Date:   Wed Jul 19 11:23:49 2023 +0200

    Fixed full_qqual field.

commit 4ba1920cbe2d4ed3cb45d803126296a2bcc1a775
Author: Benjamin Buchfink <buchfink@gmail.com>
Date:   Tue Jul 18 18:50:34 2023 +0200

    Add len-ratio filter.

commit 58bf56ec153b9f5944b672db9f136aec0e1af705
Author: Benjamin Buchfink <buchfink@gmail.com>
Date:   Tue Jul 18 16:19:29 2023 +0200

    Added symmetric option to gvc.

commit 4d8f6d6aee2d895d7a62ace47689432dc31e6de3
Author: Benjamin Buchfink <buchfink@gmail.com>
Date:   Mon Jul 17 16:43:32 2023 +0200

    Added mutual cov lin-stage1.

commit 6693d01a2004346cd35f209f448aa5803a347737
Author: Benjamin Buchfink <buchfink@gmail.com>
Date:   Fri Jul 14 16:59:11 2023 +0200

    Added linclust stage with mutual cov.

commit dd3fb04acc23760257664ea6c4f6127dd576f8c6
Author: Benjamin Buchfink <buchfink@gmail.com>
Date:   Fri Jul 14 12:41:03 2023 +0200

    Added mutual coverage.

commit 97c1e2e54dd5a540800b022a7e3bb513ae5da429
Author: Benjamin Buchfink <buchfink@gmail.com>
Date:   Wed Jul 12 15:55:04 2023 +0200

    Fixed sam query field.

commit 2af1fa4028f30bbb9da4363d80f268ead311e3f5
Merge: e1d1c047 23a1ba7
Author: Benjamin Buchfink <buchfink@gmail.com>
Date:   Wed Jul 12 15:54:11 2023 +0200

    Merge branch 'master' into dev

commit e1d1c047d6459ced42b85ce897a5d42a7640714e
Author: Benjamin Buchfink <buchfink@gmail.com>
Date:   Fri Jun 23 15:11:53 2023 +0200

    Added length ratio filter.

commit 59a52cd0bcf1458bc311b36ba3911fd00bee1319
Merge: 561b9632 af4fc64
Author: Benjamin Buchfink <buchfink@gmail.com>
Date:   Fri Jun 23 14:24:04 2023 +0200

    Merge branch 'master' into dev

commit 561b96324e32731019b3e3ca7d3df5b54caa9217
Merge: 70409de0 749de49c
Author: Dimi99 <73211787+Dimi99@users.noreply.github.com>
Date:   Wed Jun 21 14:21:55 2023 +0200

    Merge pull request #12 from Dimi99/dev

    Add the WFA Extension-Option

commit 749de49c7d90b6fee3c331bd009fb86dfcce4557
Merge: 141f517e 52cdcbae
Author: Dimi <dimitrios_K@gmx.de>
Date:   Wed Jun 21 14:13:18 2023 +0200

    Merge branch 'dev' of github.com:Dimi99/diamond_dev into dev

commit 141f517ee52480501ac3a79b72be8ca05818cde0
Author: Dimi <dimitrios_K@gmx.de>
Date:   Wed Jun 21 14:13:09 2023 +0200

    changes

commit 52cdcbae9023e38c94235778a33abb4542425b6a
Merge: c6c9bfff 70409de0
Author: Dimi99 <73211787+Dimi99@users.noreply.github.com>
Date:   Wed Jun 21 12:35:12 2023 +0200

    Merge branch 'dev' into dev

commit c6c9bfff21d275992a55da542553120d6af0231a
Author: Dimi <dimitrios_K@gmx.de>
Date:   Wed Jun 21 12:31:25 2023 +0200

    cmake

commit 9a19d041c72f09aca50a41913b2adb8e4f10eae8
Author: Dimi <dimitrios_K@gmx.de>
Date:   Wed Jun 21 12:30:59 2023 +0200

    cmake

commit 5dd3ac5e6cd6db4062e2da9b726d155ba2f823e6
Author: Dimi <dimitrios_K@gmx.de>
Date:   Wed Jun 21 12:23:41 2023 +0200

    timer
bbuchfink committed Jan 31, 2024
1 parent 56f5d14 commit ba08828
Showing 200 changed files with 6,108 additions and 448,714 deletions.
31 changes: 14 additions & 17 deletions CMakeLists.txt
@@ -18,7 +18,6 @@ option(DP_STAT "DP_STAT" OFF)
option(SINGLE_THREADED "SINGLE_THREADED" OFF)
option(EIGEN_BLAS "EIGEN_BLAS" OFF)
option(WITH_ZSTD "WITH_ZSTD" OFF)
option(KEEP_TARGET_ID "KEEP_TARGET_ID" OFF)
option(HIT_KEEP_TARGET_ID "HIT_KEEP_TARGET_ID" OFF)
option(LONG_SEEDS "LONG_SEEDS" OFF)
option(WITH_AVX512 "WITH_AVX512" OFF)
@@ -76,10 +75,6 @@ if(EXTRA)
add_definitions(-DEXTRA)
endif()

if(KEEP_TARGET_ID)
add_definitions(-DKEEP_TARGET_ID)
endif()

if(HIT_KEEP_TARGET_ID)
add_definitions(-DHIT_KEEP_TARGET_ID)
endif()
@@ -100,7 +95,12 @@ if(WITH_MCL)
add_definitions(-DWITH_MCL)
endif()

if(WITH_FAMSA)
add_definitions(-DWITH_FAMSA)
endif()

add_definitions(-DMAX_SHAPE_LEN=${MAX_SHAPE_LEN})
add_definitions(-D_ITERATOR_DEBUG_LEVEL=0)

IF(STATIC_LIBGCC)
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -static-libgcc")
@@ -146,13 +146,13 @@ endif()

if (${CMAKE_CXX_COMPILER_ID} STREQUAL MSVC)
add_definitions(-D_CRT_SECURE_NO_WARNINGS)
add_definitions(-D_HAS_STD_BYTE=0)
else()
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -Wall -Wextra -Wno-implicit-fallthrough -Wreturn-type -Wno-unused -Wno-unused-parameter -Wno-unused-variable -Wno-uninitialized -Wno-deprecated-copy -Wno-unknown-warning-option")
# -fsanitize=address -fno-omit-frame-pointer
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -Wall -Wextra -Wno-implicit-fallthrough -Wreturn-type -Wno-unused -Wno-unused-parameter -Wno-unused-variable -Wno-uninitialized -Wno-deprecated-copy -Wno-unknown-warning-option ")#-g -fsanitize=address -fno-omit-frame-pointer ")
endif()

if (CMAKE_CXX_COMPILER_ID STREQUAL "Clang")
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -Wno-pragma-clang-attribute -Wno-overloaded-virtual -Wno-missing-braces")
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -Wno-pragma-clang-attribute -Wno-overloaded-virtual -Wno-missing-braces") #-g -fsanitize=thread -fno-omit-frame-pointer" )
endif()

if (CMAKE_COMPILER_IS_GNUCC AND CMAKE_CXX_COMPILER_VERSION VERSION_LESS 5)
@@ -391,8 +391,6 @@ set(OBJECTS
src/util/tsv/merge.cpp
src/util/tsv/join.cpp
src/dp/scalar/smith_waterman.cpp
src/cluster/incremental/config.cpp
src/cluster/incremental/run.cpp
src/align/short.cpp
src/tools/tsv.cpp
)
@@ -402,10 +400,14 @@ if(WITH_DNA)
src/dna/smith_watermann.cpp
src/dna/dna_index.cpp
src/dna/seed_set_dna.cpp
src/dna/extension.cpp
src/dna/chain.cpp
src/dna/extension_chain.cpp
#src/dna/align.cpp
src/lib/ksw2/ksw2_extz2_sse.c
src/dna/ksw2_extension.cpp
src/lib/ksw2/ksw2_extz.c
src/lib/WFA2-lib.diamond/bindings/cpp/WFAligner.cpp
)
)
endif()

if(WITH_MCL)
@@ -482,15 +484,10 @@ if(BLAST_INCLUDE_DIR)
endif()
endif()



if(WITH_DNA)
include_directories(src/lib/WFA2-lib.diamond)
add_subdirectory(src/lib/WFA2-lib.diamond)

target_link_libraries(diamond wfa2)

target_compile_options(wfa2 PRIVATE -DCMAKE_BUILD_TYPE=Release -DEXTRA_FLAGS="-ftree-vectorize -msse2 -mfpmath=sse -ftree-vectorizer-verbose=5")
endif()

if(EIGEN_BLAS)
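This diff drops the `KEEP_TARGET_ID` option and its `-DKEEP_TARGET_ID` definition. According to the ChangeLog, the corresponding function is now always available, and `table.cpp` replaces `#ifdef KEEP_TARGET_ID` with a runtime check, `Search::keep_target_id(cfg)`, whose branch groups hits by `subject_` and keeps the highest `score_` per target. The C++ sketch below illustrates that compile-time-to-runtime pattern. `Hit`, `target_of` and `best_score_per_target` are hypothetical, and the fallback path (treating `subject` as a letter position and finding its target by binary search over sequence offsets) is an assumption hinted at only by the `#ifdef BATCH_BINSEARCH` line, not code from this commit.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical seed hit. With stored target ids, `subject` is a target id;
// otherwise it is assumed to be a letter position in the concatenated targets.
struct Hit {
    uint64_t subject;
    uint16_t score;
};

using TargetScores = std::vector<std::pair<uint64_t, uint16_t>>;

// Map a position to a target by binary search over the sorted start offsets
// of the targets (offsets[0] must be 0).
uint64_t target_of(uint64_t pos, const std::vector<uint64_t>& offsets) {
    assert(!offsets.empty() && offsets.front() == 0);
    auto it = std::upper_bound(offsets.begin(), offsets.end(), pos);
    return static_cast<uint64_t>(it - offsets.begin()) - 1;
}

// Keep the maximum score per target; hits of one target must be contiguous.
// `keep_target_id` is a runtime flag standing in for the removed #ifdef.
TargetScores best_score_per_target(const std::vector<Hit>& hits, bool keep_target_id,
                                   const std::vector<uint64_t>& offsets) {
    TargetScores out;
    for (const Hit& h : hits) {
        const uint64_t target = keep_target_id ? h.subject : target_of(h.subject, offsets);
        if (!out.empty() && out.back().first == target)
            out.back().second = std::max(out.back().second, h.score);
        else
            out.emplace_back(target, h.score);
    }
    return out;
}
```

Because both paths are compiled into the same binary, one build can serve either setting, which is what makes a CMake switch unnecessary.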
30 changes: 30 additions & 0 deletions src/ChangeLog
@@ -1,3 +1,33 @@
[2.1.9]
- Corrected the prefix of the query length field for the SAM format.
- Added the size modifiers 'T', 'M' and 'K' for the `--memory-limit`/`-M`
option.
- Added the option `--mutual-cover` to cluster sequences by mutual coverage
percentage of the cluster representative and member sequence.
- Added the option `--symmetric` for computing greedy vertex cover with
symmetric edges.
- Fixed an issue that caused the `--approx-id` option and the `approx_pident`
output field not to work correctly when using the `--anchored-swipe`
option.
- Added the option `--no-reassign` to prevent reassignment to closest
representative for the greedy vertex cover workflow.
- Added the option `--connected-component-depth` to activate clustering
of connected components at a given maximum depth for the greedy vertex
cover and the clustering workflows.
- Fixed a compiler error for Clang v17.
- Improved search performance when searching with mutual coverage threshold
by filtering for sequence length ratio.
- Added the sensitivity mode `--shapes-30x10` with sensitivity approximately
equivalent to `--mid-sensitive`.
- Added the options `--round-coverage` and `--round-approx-id` to set per
round cutoffs for cascaded clustering.
- The CMake switch `-DKEEP_TARGET_ID` is now obsolete and the corresponding
function is always available.
- Added the option `--include-lineage` to the taxonomic classification format
to include taxonomic lineage in the output.
- Added native support for the ARM NEON instruction set (contributed by
Martin Larralde).

[2.1.8]
- Fixed an issue that could cause reduced performance when running in
query-indexed mode.
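The ChangeLog credits faster searches under a mutual coverage threshold to filtering by sequence length ratio. The bound behind such a filter can be stated under one simplifying assumption, namely that gaps are ignored so the aligned span on the longer sequence is at most the shorter length S. Coverage of the longer sequence (length L) is then at most S/L, so a threshold of c percent can only be met if 100·S ≥ c·L. The C++ sketch below applies that bound as a prefilter; `passes_length_ratio` is a hypothetical name, and the exact ratio DIAMOND uses (the log mentions "Auto set min length ratio") may differ.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical prefilter for a mutual-coverage threshold given in percent.
// Ignoring gaps, the aligned span on the longer sequence is at most the length
// of the shorter one, so coverage of the longer sequence is at most
// shorter / longer. Pairs below the threshold ratio can be skipped early.
bool passes_length_ratio(uint32_t len_a, uint32_t len_b, double min_cover_pct) {
    assert(len_a > 0 && len_b > 0);
    assert(min_cover_pct >= 0.0 && min_cover_pct <= 100.0);
    const uint32_t shorter = std::min(len_a, len_b);
    const uint32_t longer = std::max(len_a, len_b);
    // Compare 100 * shorter >= pct * longer to avoid a division.
    return 100.0 * shorter >= min_cover_pct * longer;
}
```

For example, with `min_cover_pct = 80`, a 79-residue sequence is rejected against a 100-residue one without any alignment being computed, while an 80-residue sequence passes.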
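The ChangeLog adds the size modifiers `T`, `M` and `K` to `--memory-limit`/`-M`. The C++ sketch below shows one way to parse such values. The function name, the 1024-based multipliers, accepting `G` alongside the new suffixes, and rejecting unsuffixed numbers are all assumptions of this sketch rather than documented DIAMOND behavior.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <cstdint>
#include <limits>
#include <optional>
#include <stdexcept>
#include <string>

// Hypothetical parser for sizes such as "16G", "512M", "2T" or "64K".
// Returns bytes using 1024-based multipliers, or std::nullopt if the input is
// malformed, lacks a recognised suffix, or would overflow 64 bits.
std::optional<uint64_t> parse_memory_size(const std::string& s) {
    if (s.size() < 2)
        return std::nullopt;
    uint64_t multiplier = 0;
    switch (std::toupper(static_cast<unsigned char>(s.back()))) {
    case 'K': multiplier = 1ull << 10; break;
    case 'M': multiplier = 1ull << 20; break;
    case 'G': multiplier = 1ull << 30; break;
    case 'T': multiplier = 1ull << 40; break;
    default: return std::nullopt; // no unit: this sketch does not guess one
    }
    const std::string digits = s.substr(0, s.size() - 1);
    if (!std::all_of(digits.begin(), digits.end(),
                     [](unsigned char c) { return std::isdigit(c) != 0; }))
        return std::nullopt;
    uint64_t value = 0;
    try {
        value = std::stoull(digits);
    } catch (const std::out_of_range&) {
        return std::nullopt; // more digits than 64 bits can hold
    }
    if (value > std::numeric_limits<uint64_t>::max() / multiplier)
        return std::nullopt; // value * multiplier would overflow
    return value * multiplier;
}
```

`parse_memory_size("512M")` yields 512 × 2^20 bytes; an input such as `1.5G` is rejected here because only whole numbers are accepted.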
9 changes: 4 additions & 5 deletions src/align/align.cpp
@@ -32,8 +32,7 @@ along with this program. If not, see <http://www.gnu.org/licenses/>.
#endif
#include "extend.h"
#ifdef WITH_DNA
#include "../dna/wfa2_test.h"
#include "../dna/ksw2_extension.h"
#include "../dna/extension.h"
#endif
#include "../util/algo/radix_sort.h"
#include "target.h"
@@ -103,7 +102,7 @@ struct HitIterator {
r.push_back(Hits{ (BlockId)i,nullptr,nullptr });
return r;
}
if (i >= partition.size() - 1)
if (i >= int64_t(partition.size()) - 1)
return r;
const BlockId c = align_mode.query_contexts;
Search::Hit* begin = data + partition[i], * end = data + partition[i + 1];
@@ -120,7 +119,7 @@ struct HitIterator {
++it;
++last_query;
}
if (i == partition.size() - 2) {
if (i == (int64_t)partition.size() - 2) {
r.reserve(r.size() + query_end - (r.back().query + 1));
for (BlockId j = r.back().query + 1; j < query_end; ++j)
r.push_back(Hits{ j, nullptr,nullptr });
@@ -189,7 +188,7 @@ TextBuffer* legacy_pipeline(const HitIterator::Hits& hits, Search::Config& cfg,
TextBuffer *buf = nullptr;
if (!cfg.blocked_processing && *cfg.output_format != OutputFormat::daa && config.report_unaligned != 0) {
buf = new TextBuffer;
Output::Info info{ cfg.query->seq_info(hits.query), true, cfg.db.get(), *buf, {} };
Output::Info info{ cfg.query->seq_info(hits.query), true, cfg.db.get(), *buf, {}, AccessionParsing() };
cfg.output_format->print_query_intro(info);
cfg.output_format->print_query_epilog(info);
}
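The two `HitIterator` hunks above replace `partition.size() - 1` and `partition.size() - 2` with forms that convert the size to `int64_t` before subtracting. Because `size()` returns an unsigned type, the unpatched expressions compare a signed index against an unsigned value (the usual trigger for a sign-compare warning), and the subtraction wraps around to the maximum value if the vector is shorter than the number subtracted. The minimal sketch below isolates the first check; `past_last_unsigned` and `past_last_signed` are hypothetical helpers, and `i` is assumed to be a signed 64-bit index since its declaration is not shown in the hunk. The empty-vector and negative-index inputs only make the difference visible; the hunks do not say whether such inputs occur in DIAMOND.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical helpers mirroring the bound check in HitIterator.

// Unpatched form: size() is unsigned, so size() - 1 wraps to the maximum
// value on an empty vector, and the signed index is compared as unsigned.
bool past_last_unsigned(int64_t i, const std::vector<int64_t>& partition) {
    return static_cast<uint64_t>(i) >= partition.size() - 1;
}

// Patched form: convert size() to int64_t before subtracting, so an empty
// vector gives -1 and the whole comparison stays signed.
bool past_last_signed(int64_t i, const std::vector<int64_t>& partition) {
    return i >= int64_t(partition.size()) - 1;
}
```

For an empty vector, `past_last_unsigned(0, {})` is false (0 is compared against the maximum `size_t` value), while `past_last_signed(0, {})` is true.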
1 change: 1 addition & 0 deletions src/align/extend.cpp
@@ -66,6 +66,7 @@ namespace Extension {
const std::map<Sensitivity, Mode> default_ext_mode = {
{ Sensitivity::FASTER, Mode::BANDED_FAST},
{ Sensitivity::FAST, Mode::BANDED_FAST},
{ Sensitivity::SHAPES30x10, Mode::BANDED_FAST},
{ Sensitivity::DEFAULT, Mode::BANDED_FAST},
{ Sensitivity::MID_SENSITIVE, Mode::BANDED_FAST},
{ Sensitivity::SENSITIVE, Mode::BANDED_FAST},
70 changes: 36 additions & 34 deletions src/align/global_ranking/table.cpp
@@ -31,6 +31,7 @@ along with this program. If not, see <http://www.gnu.org/licenses/>.
#endif
#include "../load_hits.h"
#include "../dp/ungapped.h"
#include "../search/search.h"

using std::endl;
using std::thread;
@@ -45,44 +46,45 @@ namespace Extension { namespace GlobalRanking {
static void get_query_hits(SeedHits::Iterator begin, SeedHits::Iterator end, vector<Hit>& hits, Search::Config& cfg) {
hits.clear();
const SequenceSet& target_seqs = cfg.target->seqs();
#ifdef KEEP_TARGET_ID
auto get_target = [](const Search::Hit& hit) { return (BlockId)hit.subject_; };
auto it = merge_keys(begin, end, get_target);
while (it.good()) {
uint16_t score = 0;
for (SeedHits::Iterator i = it.begin(); i != it.end(); ++i)
score = std::max(score, i->score_);
hits.emplace_back((uint32_t)cfg.target->block_id2oid(it.key()), score, 0);
++it;
if (Search::keep_target_id(cfg)) {
auto get_target = [](const Search::Hit& hit) { return (BlockId)hit.subject_; };
auto it = merge_keys(begin, end, get_target);
while (it.good()) {
uint16_t score = 0;
for (SeedHits::Iterator i = it.begin(); i != it.end(); ++i)
score = std::max(score, i->score_);
hits.emplace_back((uint32_t)cfg.target->block_id2oid(it.key()), score, 0);
++it;
}
}
#else
else {
#ifdef BATCH_BINSEARCH
vector<Hit> hit1;
hit1.reserve(end - begin);
target_seqs.local_position_batch(begin, end, std::back_inserter(hit1), Search::Hit::CmpTargetOffset());
for (size_t i = 0; i < hit1.size(); ++i) {
hit1[i].score = begin[i].score_;
}
auto it = merge_keys(hit1.begin(), hit1.end(), Hit::Target());
while (it.good()) {
uint16_t score = 0;
for (auto i = it.begin(); i != it.end(); ++i)
score = std::max(score, i->score);
hits.emplace_back((uint32_t)cfg.target->block_id2oid(it.key()), score);
++it;
}
vector<Hit> hit1;
hit1.reserve(end - begin);
target_seqs.local_position_batch(begin, end, std::back_inserter(hit1), Search::Hit::CmpTargetOffset());
for (size_t i = 0; i < hit1.size(); ++i) {
hit1[i].score = begin[i].score_;
}
auto it = merge_keys(hit1.begin(), hit1.end(), Hit::Target());
while (it.good()) {
uint16_t score = 0;
for (auto i = it.begin(); i != it.end(); ++i)
score = std::max(score, i->score);
hits.emplace_back((uint32_t)cfg.target->block_id2oid(it.key()), score);
++it;
}
#else
auto get_target = [&target_seqs](const Search::Hit& hit) { return target_seqs.local_position((uint64_t)hit.subject_).first; };
auto it = merge_keys(begin, end, get_target);
while (it.good()) {
uint16_t score = 0;
for (SeedHits::Iterator i = it.begin(); i != it.end(); ++i)
score = std::max(score, i->score_);
hits.emplace_back((uint32_t)cfg.target->block_id2oid(it.key()), score, 0);
++it;
}
#endif
auto get_target = [&target_seqs](const Search::Hit& hit) { return target_seqs.local_position((uint64_t)hit.subject_).first; };
auto it = merge_keys(begin, end, get_target);
while (it.good()) {
uint16_t score = 0;
for (SeedHits::Iterator i = it.begin(); i != it.end(); ++i)
score = std::max(score, i->score_);
hits.emplace_back((uint32_t)cfg.target->block_id2oid(it.key()), score, 0);
++it;
}
#endif
}
}

static pair<int, unsigned> target_score(const FlatArray<Extension::SeedHit>::DataIterator begin, const FlatArray<Extension::SeedHit>::DataIterator end, const Sequence* query_seq, const Sequence& target_seq) {
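Whichever branch `Search::keep_target_id(cfg)` selects, `get_query_hits` groups the seed hits by target through `merge_keys` and keeps the highest `score_` seen for each target. The minimal sketch below shows that grouping step. It assumes that `merge_keys` iterates over runs of hits sharing a key, so hits for the same target must be adjacent in the input. `SeedHit`, `TargetScore` and `max_score_per_target` are hypothetical stand-ins, not DIAMOND's types, and the target id is read directly instead of being mapped through `local_position` or `block_id2oid`.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Hypothetical stand-ins for Search::Hit and the ranking Hit type.
struct SeedHit {
    uint32_t target;  // target id (already resolved)
    uint16_t score;   // ungapped seed score
};

struct TargetScore {
    uint32_t target;
    uint16_t score;   // best seed score observed for this target
};

// Walks runs of consecutive hits with the same target and emits one entry
// per run holding the maximum score. Assumes equal targets are adjacent.
std::vector<TargetScore> max_score_per_target(const std::vector<SeedHit>& hits) {
    std::vector<TargetScore> out;
    auto it = hits.begin();
    while (it != hits.end()) {
        const uint32_t key = it->target;
        uint16_t best = 0;
        // Advance through the run of hits sharing this key.
        for (; it != hits.end() && it->target == key; ++it)
            best = std::max(best, it->score);
        out.push_back(TargetScore{key, best});
    }
    return out;
}
```

For hits `{3, 10}, {3, 25}, {5, 4}` the sketch yields `(3, 25)` and `(5, 4)`. Taking the maximum rather than the sum means each target is represented by its single best seed hit.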
2 changes: 1 addition & 1 deletion src/align/legacy/query_mapper.cpp
@@ -223,7 +223,7 @@ bool QueryMapper::generate_output(TextBuffer &buffer, Statistics &stat, const Se
size_t seek_pos = 0;
const char *query_title = metadata.query->ids()[query_id];
unique_ptr<OutputFormat> f(cfg.output_format->clone());
Output::Info info{ cfg.query->seq_info(query_id), true, cfg.db.get(), buffer, {} };
Output::Info info{ cfg.query->seq_info(query_id), true, cfg.db.get(), buffer, {}, AccessionParsing() };

for (size_t i = 0; i < targets.size(); ++i) {

2 changes: 1 addition & 1 deletion src/align/output.cpp
@@ -38,7 +38,7 @@ TextBuffer* generate_output(vector<Match> &targets, const Extension::Stats& stat
std::unique_ptr<OutputFormat> f(cfg.output_format->clone());
size_t seek_pos = 0;
unsigned n_hsp = 0, hit_hsps = 0;
Output::Info info{ cfg.query->seq_info(query_block_id), targets.empty(), cfg.db.get(), *out, stats };
Output::Info info{ cfg.query->seq_info(query_block_id), targets.empty(), cfg.db.get(), *out, stats, AccessionParsing() };
TranslatedSequence query = query_seqs.translated_seq(align_mode.query_translated ? cfg.query->source_seqs()[query_block_id] : query_seqs[query_block_id], query_block_id * align_mode.query_contexts);
const char *query_title = cfg.query->ids()[query_block_id];
const double query_self_aln_score = cfg.query->has_self_aln() ? cfg.query->self_aln_score(query_block_id) : 0.0;
5 changes: 4 additions & 1 deletion src/align/ungapped.cpp
@@ -81,7 +81,7 @@ WorkTarget ungapped_stage(FlatArray<SeedHit>::DataIterator begin, FlatArray<Seed
return target;
}
std::sort(begin, end);
const bool with_diag_filter = config.hamming_ext || config.diag_filter_cov > 0 || config.diag_filter_id > 0;
const bool with_diag_filter = (config.hamming_ext || config.diag_filter_cov > 0 || config.diag_filter_id > 0) && !config.mutual_cover.present();
for (FlatArray<SeedHit>::DataIterator hit = begin; hit < end; ++hit) {
const auto f = hit->frame;
target.ungapped_score[f] = std::max(target.ungapped_score[f], hit->score);
@@ -139,6 +139,9 @@ vector<WorkTarget> ungapped_stage(const Sequence *query_seq, const Bias_correcti
}
else {
for (int64_t i = 0; i < n; ++i) {
/*const double len_ratio = query_seq->length_ratio(target_block.seqs()[target_block_ids[i]]);
if (len_ratio < config.min_length_ratio)
continue;*/
targets.push_back(ungapped_stage(seed_hits.begin(i), seed_hits.end(i), query_seq, query_cb, query_comp, &query_matrix, target_block_ids[i], stat, target_block, mode));
for (const ApproxHsp& hsp : targets.back().hsp[0]) {
Geo::assert_diag_bounds(hsp.d_max, query_seq[0].length(), targets.back().seq.length());
4 changes: 2 additions & 2 deletions src/basic/basic.cpp
@@ -29,7 +29,7 @@ along with this program. If not, see <http://www.gnu.org/licenses/>.
#include "../util/util.h"
#include "../stats/standard_matrix.h"

const char* Const::version_string = "2.1.8";
const char* Const::version_string = "2.1.9";
using std::string;
using std::vector;
using std::count;
@@ -51,7 +51,7 @@ AlignMode::AlignMode(unsigned mode) :
case blastn:
input_sequence_type = SequenceType::nucleotide;
query_translated = false;
query_contexts = 2;
query_contexts = 1;
query_len_factor = 1;
sequence_type = SequenceType::nucleotide;
break;

0 comments on commit ba08828