Revised genotyping
So, in keeping with tradition, this release brings a bunch of changes to Lorikeet that make it pretty distant from where it was a month ago. I know only a few people are trying to keep track of all the changes that keep being made here, and I'm sorry things are so stochastic. I think my supervisor put it best when I told him about one of the changes I had made: "Ah, so freebayes is out this week, huh?"
Yeah, freebayes is out. Cancelled. For generating illegal instructions and segmentation faults on GPU nodes. I ain't fixing that, I'll just make my own variant caller.
Lorikeet's new best friends are UMAP and HDBSCAN. The curse of dimensionality hexed me pretty good during benchmarking, so UMAP is being used for dimensionality reduction. I chose it over PCA since it seems to discriminate groupings of variants way better. Also, since we now have to use a Python library for UMAP, might as well upgrade fuzzy DBSCAN to its better version: HDBSCAN.
Changes:
- Freebayes. OUT.
- Fuzzy DBSCAN. OUT.
- UMAP. IN.
- HDBSCAN. IN.
- Evolve now reports per-sample dN/dS and coverage values for each ORF.