Releases · rhysnewell/Lorikeet

06 Dec 05:05

rhysnewell

v0.8.2

a9c4fe8

v0.8.2 Latest

What's Changed

Doco fixes by @wwood in #54
Lower mem usage by @rhysnewell in #55
Dev by @rhysnewell in #56

Full Changelog: v0.8.1...v0.8.2

Contributors

wwood and rhysnewell

10 Jul 05:17

rhysnewell

v0.8.1

081e6f9

v0.8.1

What's Changed

Merge Dev into master by @rhysnewell in #49
Compilation fix by @wwood in #50
Fix VCF annotations by @rhysnewell in #51
Dev to main by @rhysnewell in #53

Full Changelog: v0.8.0...v0.8.1

Contributors

wwood and rhysnewell

03 May 02:59

rhysnewell

v0.8.0

2bf7cdb

v0.8.0

What's Changed

Catching dev up to master branch by @rhysnewell in #46
cli: Allow --profile very-fast. by @wwood in #47

Full Changelog: v0.7.3...v0.8.0

Contributors

wwood and rhysnewell

04 Oct 03:05

rhysnewell

v0.7.3

3d9850f

v0.7.3

fix: release workflow copying old minimap2 header binary

05 Aug 05:31

rhysnewell

v0.7.2

7a221d8

v0.7.2

fix: fst calculations are now ploidy agnostic

05 Aug 05:30

rhysnewell

v0.7.2rc1

b666726

v0.7.2rc1

fix: new releases are tagged correctly

17 Oct 20:48

rhysnewell

pre-release_master

f02a0f8

Development build: master Pre-release

Update pre-release-lorikeet.yml

13 Oct 06:26

rhysnewell

v0.6.0rc2

dcbaf84

v0.6.0rc2

Version 0.6.0 - release candidate 2

This release candidate reintroduces consensus genome calling and strain genome discovery.
It also updates the linkage algorithm from previous versions, which now uses a more sophisticated graph-based approach for linking clusters.

06 Oct 11:29

rhysnewell

v0.6.0rc1

9c84a73

v0.6.0rc1

v0.6.0 Release Candidate 1

This release introduces the completely overhauled variant calling setup for Lorikeet. Lorikeet no longer relies on threshold-based variant calling, and instead takes a more sophisticated approach utilising local re-assembly of active regions. This release includes a reimplementation of the GATK HaplotypeCaller algorithm in Rust, so hopefully it is faster. At the very least, it will be easier to pass multiple genomes and samples into the algorithm at once to generate called variants.

Currently, the strain resolving part of Lorikeet is hidden and will be re-enabled ASAP.

The HaplotypeCaller algorithm breaks genomes up into potential active regions and then performs local re-assembly with the reads that mapped to those locations. The local assembly is then searched for candidate haplotypes using a number of techniques, and each candidate haplotype is assigned a likelihood by using a pair HMM to re-assign reads to the haplotypes. Ultimately, the HaplotypeCaller algorithm produces sets of high-confidence variants with depths across samples.
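The first step described above, splitting a genome into potential active regions, can be illustrated with a minimal sketch. This is not Lorikeet's or GATK's actual code: the `ActiveRegion` type, the `find_active_regions` function, and the idea of a single per-base activity score with a fixed threshold and padding are all assumptions made for illustration.

```rust
/// Half-open interval [start, end) on a reference contig.
/// Hypothetical type for illustration; not Lorikeet's own data structure.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct ActiveRegion {
    pub start: usize,
    pub end: usize,
}

/// Group consecutive positions whose activity score is at least `threshold`
/// into regions, pad each region by `padding` bases on both sides, and merge
/// regions that overlap or touch after padding.
pub fn find_active_regions(
    activity: &[f64],
    threshold: f64,
    padding: usize,
) -> Vec<ActiveRegion> {
    let mut regions: Vec<ActiveRegion> = Vec::new();
    let mut run_start: Option<usize> = None;

    // Iterate one position past the end so that a run reaching the final
    // base is still closed.
    for pos in 0..=activity.len() {
        let is_active = pos < activity.len() && activity[pos] >= threshold;
        match (is_active, run_start) {
            // A new run of active positions begins here.
            (true, None) => run_start = Some(pos),
            // The current run ended at `pos` (exclusive).
            (false, Some(start)) => {
                let padded_start = start.saturating_sub(padding);
                let padded_end = (pos + padding).min(activity.len());
                // Extend the previous region if the padded intervals meet.
                let merged = match regions.last_mut() {
                    Some(prev) if padded_start <= prev.end => {
                        prev.end = padded_end;
                        true
                    }
                    _ => false,
                };
                if !merged {
                    regions.push(ActiveRegion {
                        start: padded_start,
                        end: padded_end,
                    });
                }
                run_start = None;
            }
            _ => {}
        }
    }
    regions
}
```

In a real caller, each returned region would then be handed to local re-assembly together with the reads overlapping it; that step is omitted here.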

The HaplotypeCaller code was re-implemented in Rust in order to potentially speed up the variant calling process, make it easier to pass multiple genomes and samples into the algorithm, and hopefully allow some of the code base to be reused in future projects and in the strain resolving pipeline.

The code requires benchmarking, but early indications from tests and small datasets put the Lorikeet variant calling speed on par with the Java implementation. I believe the real speed-up will appear when multiple genomes are supplied to Lorikeet, as they will be run in parallel seamlessly.

Additionally, a number of code clean-ups should be implemented as soon as possible, primarily around the BirdToolRead, SequencesForKmers, and Kmers data structures. Currently, accessing the bytes within a read requires cloning the data, with no option to create a reference pointing to the data (without the added complexity of decoding every encoded base). This means SequencesForKmers and Kmers each hold a clone of the read bases, which is very costly. I believe that by adding a bases field to BirdToolRead that is updated whenever the underlying Read is changed, we can change those clones to references and wrangle with the lifetimes to significantly speed up the graph building stage of the algorithm.
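The proposed clean-up can be sketched roughly as follows. None of these types are Lorikeet's real ones: `EncodedRead`, `CachedRead`, `Kmer`, and `kmers` are hypothetical stand-ins for the underlying Read, BirdToolRead, and the k-mer structures, and the one-code-per-byte encoding is a simplification. The sketch only shows the general idea of decoding once into a cached `bases` field and letting k-mers borrow from it instead of cloning.

```rust
/// Hypothetical stand-in for a read that stores its bases in encoded form,
/// so the raw bytes cannot be borrowed without decoding first.
pub struct EncodedRead {
    codes: Vec<u8>,
}

impl EncodedRead {
    /// Encode ASCII bases as small integer codes (A=0, C=1, G=2, T=3, other=4).
    pub fn from_bases(bases: &[u8]) -> Self {
        let codes = bases
            .iter()
            .map(|b| match b {
                b'A' => 0,
                b'C' => 1,
                b'G' => 2,
                b'T' => 3,
                _ => 4,
            })
            .collect();
        Self { codes }
    }

    /// Decoding allocates a fresh vector every time it is called.
    pub fn decode(&self) -> Vec<u8> {
        self.codes.iter().map(|&c| b"ACGTN"[c as usize]).collect()
    }
}

/// Hypothetical wrapper that decodes once and caches the result.
pub struct CachedRead {
    read: EncodedRead,
    bases: Vec<u8>,
}

impl CachedRead {
    pub fn new(read: EncodedRead) -> Self {
        let bases = read.decode();
        Self { read, bases }
    }

    /// Borrow the cached bases without cloning or re-decoding.
    pub fn bases(&self) -> &[u8] {
        &self.bases
    }

    pub fn read(&self) -> &EncodedRead {
        &self.read
    }

    /// Replace the underlying read and refresh the cache in the same step,
    /// so the two can never disagree.
    pub fn set_read(&mut self, read: EncodedRead) {
        self.bases = read.decode();
        self.read = read;
    }
}

/// A k-mer that borrows its bases from the read it came from.
pub struct Kmer<'a> {
    pub bases: &'a [u8],
    pub offset: usize,
}

/// Enumerate all k-mers of a read as borrowed slices (no per-k-mer clone).
pub fn kmers<'a>(read: &'a CachedRead, k: usize) -> Vec<Kmer<'a>> {
    if k == 0 {
        return Vec::new(); // `windows(0)` would panic
    }
    read.bases()
        .windows(k)
        .enumerate()
        .map(|(offset, bases)| Kmer { bases, offset })
        .collect()
}
```

The lifetime `'a` ties each `Kmer` to the `CachedRead` it borrows from, so the compiler rejects any call to `set_read` while k-mers from that read are still alive. That is the lifetime wrangling the paragraph above anticipates.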

TODO:

Reimplement strain calling + abundance estimation
Reimplement consensus calling
Update README
Update Workflow image
Various code improvements

12 Nov 05:06

rhysnewell

v0.5.0

99381d1

Revised genotyping

So, in keeping with tradition, this release brings a bunch of changes to Lorikeet that make it pretty distant from where it was a month ago. I know only a few people are trying to keep track of all the changes that keep being made here, and I'm sorry things are so stochastic. I think the words of my supervisor put it best when I told him about one of the changes I had made... "Ah, so freebayes is out this week, huh?"

Yeah, freebayes is out. Cancelled. For generating illegal instructions and segmentation faults on GPU nodes. I ain't fixing that, I'll just make my own variant caller.

Lorikeet's new best friends are UMAP and HDBSCAN. The curse of dimensionality hexed me pretty good during benchmarking, so UMAP is being used for dimensionality reduction. I chose it over PCA since it seems to discriminate groupings of variants way better. Also, since we now have to use a Python library for UMAP, might as well upgrade fuzzy DBSCAN to its better version: HDBSCAN.

Changes:

Freebayes. OUT.
Fuzzy DBSCAN. OUT.
UMAP. IN.
HDBSCAN. IN.
Evolve now reports per sample dNdS and coverage values for each ORF

Current workflow:

