Releases: rhysnewell/Lorikeet
v0.4.0
Version 0.4.0
A move from 0.3.x to 0.4.0 is not done lightly. Version 0.4.0 marks a major milestone in the development of Lorikeet, and with it come many feature updates: some polish the mechanics of previous releases, and others are brand new features that I hope users will find useful in understanding what Lorikeet is doing.
Major changes:
SNP calling: ✨
- Lorikeet now has a built-in SNP calling algorithm that is paired with freebayes to help extract SNPs for each input sample and to assist the guided variant calling.
SPEED: 🏃 💨
One of the guiding principles I had in mind when developing Lorikeet was speed. Speed is a partial inspiration behind the name "Lorikeet". Lorikeets are strikingly fast birds that tend to fly in groups, much the same way that Lorikeet "flies" in parallel threads. This update reaches what I think is the optimal balance between speed and memory restrictions.
- You can now specify how many genomes to run in parallel.
- Contigs for each genome now run in parallel.
- Multiple iterators have been optimized to better utilize the capabilities of rayon.
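As a rough illustration of contig-level parallelism, the sketch below splits a list of contigs across a fixed number of worker threads and collects the results in the original order. It is not Lorikeet's code: it uses `std::thread::scope` from the standard library as a stand-in for rayon so that it has no dependencies, and `gc_count`, `process_contigs`, and `n_threads` are hypothetical names chosen for the example.

```rust
use std::thread;

/// Hypothetical per-contig work (an assumption for this sketch):
/// count the G and C bases in a sequence.
fn gc_count(seq: &str) -> usize {
    seq.bytes()
        .filter(|&b| matches!(b, b'G' | b'C' | b'g' | b'c'))
        .count()
}

/// Process contigs on up to `n_threads` scoped threads and return the
/// per-contig results in the same order as the input.
fn process_contigs(contigs: &[&str], n_threads: usize) -> Vec<usize> {
    let workers = n_threads.max(1);
    // Ceiling division, and never a zero chunk size (chunks(0) would panic).
    let chunk_size = ((contigs.len() + workers - 1) / workers).max(1);

    thread::scope(|s| {
        // Spawn every worker first so the chunks really run concurrently.
        let handles: Vec<_> = contigs
            .chunks(chunk_size)
            .map(|chunk| {
                s.spawn(move || chunk.iter().map(|c| gc_count(c)).collect::<Vec<usize>>())
            })
            .collect();

        // Join in spawn order, which preserves the input order.
        handles
            .into_iter()
            .flat_map(|h| h.join().expect("worker thread panicked"))
            .collect()
    })
}
```

With rayon, the same shape is usually written with a parallel iterator (`par_iter()`) over the contigs, and the number of worker threads is configured on the thread pool rather than by manual chunking.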
Progress: 🔢 👀
No longer will you be bombarded by a ridiculous amount of info messages that won't make much sense to anyone but me. Thanks to indicatif, Lorikeet now has a bunch of fancy progress bars with associated ETA timers which, albeit sometimes inaccurately, provide the user with a better understanding of what is happening under the hood for each sample and each reference in their current run.
Additionally, if a run crashes before completion for whatever reason, Lorikeet will now pick up from specific checkpoints and avoid rerunning entire analyses for specific genomes. This can be overridden with the --force flag.
Outputs: 👽
An additional file is now output for all major modes that helps tell the user how distant a specific reference might be between samples. The adjacency matrix tells the user how many variants are shared between samples for a specific reference. This will provide output similar to the trees that can be generated by taking the consensus genomes generated by polish and passing them to a tool like parsnp.
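To make the idea of a shared-variant adjacency matrix concrete, here is a minimal sketch of how one could be computed for a single reference. It is an illustration only, not Lorikeet's implementation or output format: the `Variant` key (contig, position, alternate allele) and the function name `shared_variant_matrix` are assumptions made for the example.

```rust
use std::collections::HashSet;

/// A variant is keyed here by (contig name, position, alternate allele).
/// This key is an assumption for illustration only.
type Variant = (String, u64, String);

/// Build a symmetric matrix in which entry [i][j] is the number of variants
/// shared by sample i and sample j. The diagonal holds each sample's own
/// variant count.
fn shared_variant_matrix(samples: &[HashSet<Variant>]) -> Vec<Vec<usize>> {
    let n = samples.len();
    let mut matrix = vec![vec![0usize; n]; n];
    for i in 0..n {
        // Only the upper triangle is computed; the lower triangle is mirrored.
        for j in i..n {
            let shared = samples[i].intersection(&samples[j]).count();
            matrix[i][j] = shared;
            matrix[j][i] = shared;
        }
    }
    matrix
}
```

Samples that share most of their variants for a reference get large off-diagonal counts relative to the diagonal, which is the same signal a tree built from consensus genomes would show as short branch distances.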
Speaking of polish, a bug has been fixed which prevented the vcf file being output for any mode other than genotype.
Genotyping: 🐀 🐁 🐩 🐕
The genotyping algorithm has seen a bunch of changes. Not all of them will be listed here as it is quite a lot.
- DBSCAN now updates parameters for each reference genome based on whether or not the supplied parameters generate clusters that make sense, i.e. not every variant can cluster by itself, and not all variants can be in the same cluster (usually).
- The read phasing linkage algorithm now happens after DBSCAN. So DBSCAN is seeding the linkage algorithm now. This will provide much the same results as before but at much faster speeds.
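The "clusters that make sense" check can be sketched as a small predicate over cluster labels, plus a loop that tries candidate parameters until one passes. This is a simplified illustration under stated assumptions, not the algorithm Lorikeet uses: the label encoding (`None` for noise), the names `clustering_is_sensible` and `pick_eps`, and the idea of scanning a fixed list of `eps` values are all hypothetical.

```rust
use std::collections::HashSet;

/// Labels hold one entry per variant; `None` marks a noise point.
/// A clustering is rejected when it has no clusters at all, when every
/// variant sits in its own cluster, or when all variants share one cluster.
fn clustering_is_sensible(labels: &[Option<usize>]) -> bool {
    let clusters: HashSet<usize> = labels.iter().flatten().copied().collect();
    let noise = labels.iter().filter(|l| l.is_none()).count();

    let all_singletons = noise == 0 && clusters.len() == labels.len();
    let single_cluster = noise == 0 && clusters.len() == 1;

    !clusters.is_empty() && !all_singletons && !single_cluster
}

/// Return the first candidate `eps` whose clustering passes the check.
/// `cluster` stands in for a real DBSCAN call and is supplied by the caller.
fn pick_eps<F>(candidates: &[f64], cluster: F) -> Option<f64>
where
    F: Fn(f64) -> Vec<Option<usize>>,
{
    candidates
        .iter()
        .copied()
        .find(|&eps| clustering_is_sensible(&cluster(eps)))
}
```

Keeping the check separate from the clustering call means the same predicate can be reused per reference genome, whatever strategy is used to propose the next set of parameters.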
In addition, there have been a BUNCH of bug fixes.
v0.3.7
Multiple bug fixes
- Multiple instances of index out of bound errors
- Identified cause of freebayes failure on large metagenomes
EM algorithm for strain coverage detection implemented and working.
Updated read phasing and clustering to prevent overly similar clusters from forming
v0.3.6
New features:
Guided variant calling now working on some MNVs, INS and DEL events
Can now parse directory of genomes for easier use
Various bug fixes
v0.3.5
NEW RELEASE
Evolve outputs GFF with dNdS values per reference
Uses Prokka and Prodigal
Faster compute times
Updated help commands
Using Phi-D as proportionality metric
BUG FIXES
Update contig ID bug preventing contigs being output into strain genotypes
v0.3.4
- valid version string in the Cargo.toml (We didn't think we had the technology for this, but we did it)
v0.3.3a
Small update to variant calling. No longer filter out soft and hard clips using samclip.
v0.3.3
Updated to using Freebayes for SNP calling and SVIM for structural variant calling.
Added in guided variant calling algorithm to rescue low abundance variants.
Added in seeded fuzzy DBSCAN algorithm.
Updated some help messages, many flags still hidden for testing purposes.
0.3.2
Updated Lorikeet to use both short and long read variant callers: Snippy and SVIM
VCF files are now generated for each BAM, reads are used to phase variants between samples
0.2.9
Added experimental genotype method.
Updated help messages.
Included extra flags:
include-supplementary
include-secondary
v0.2.5
First release of Lorikeet with current implemented modes:
Polymorph - Variant calling pipeline
Summarize - Summarize contig statistics
Evolve - Calculates dN/dS values of genes present in reference based on read mappings
May contain bugs