Add FDA metrics subworkflow to Immuno pipeline #78
This includes the 07 December updates to that PR.
Hi @tmooney. My one request for this PR is that we not leave all these FDA results at the top level of the results dir:
Instead can we place all of this under the existing i.e. one of these options
The only suggestion I have is to use the struct approach to organize the outputs, so that when we pull the results we get a tidier results dir.
Updated the outputs to build a struct named |
We're summing the values for all the keys, so the order shouldn't matter.
We really only need to do it once for each character instead of millions of times!
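The commit message does not say which per-character operation is meant. As a rough illustration only, assume it is something like converting a quality-string character to a Phred score. The sketch below is in Python rather than the script's Perl, and the names `build_phred_table` and `quality_histogram` and the offset of 33 are hypothetical. The idea is to count characters first and convert each distinct character once, not once per base.

```python
from collections import Counter

def build_phred_table(offset=33):
    # Hypothetical lookup table: one conversion per printable character,
    # computed a single time up front.
    return {chr(code): code - offset for code in range(offset, 127)}

PHRED = build_phred_table()

def quality_histogram(quality_strings):
    # First pass: count raw characters while streaming through the reads.
    char_counts = Counter()
    for quality in quality_strings:
        char_counts.update(quality)

    # Second pass: convert each distinct character exactly once,
    # however many millions of bases were seen.
    histogram = Counter()
    for char, count in char_counts.items():
        histogram[PHRED[char]] += count
    return dict(histogram)
```

Calling `quality_histogram(["II!!", "I#"])` converts only three distinct characters even though six bases were read.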
We know which sub-hash we're going to use in our update_hash calls, so we'll only pass those along instead.
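A minimal sketch of that idea, again in Python rather than Perl: only the name `update_hash` comes from the commit message, and its signature here, along with `tally` and the `"length"` category, is assumed for illustration. The nested lookup is resolved once, and the helper receives only the sub-hash it will modify.

```python
def update_hash(sub_hash, key, amount=1):
    # The helper only sees the sub-hash it needs, not the whole stats structure.
    sub_hash[key] = sub_hash.get(key, 0) + amount

def tally(stats, category, values):
    # Resolve the sub-hash once, outside the loop, instead of on every call.
    sub_hash = stats.setdefault(category, {})
    for value in values:
        update_hash(sub_hash, value)
    return stats
```

For example, `tally({}, "length", [100, 100, 75])` produces a single `"length"` sub-hash holding the counts for 100 and 75.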
We just need samtools + perl for this and we're not using all that much RAM to stream through the files.
Changes for efficiency of unaligned_seq_fda_stats perl script
Like for the unaligned stats, we can use less memory and a smaller Docker image here!
My tests with the updated version of this PR have succeeded. When I pull down the results, the FDA QC results appear nicely organized. Looks good to me!
This incorporates some recent additions to the genome/analysis-workflows immuno.cwl pipeline into the WDL version here.