
Add FDA metrics subworkflow to Immuno pipeline #78

Merged: 19 commits merged on Mar 9, 2023

@tmooney (Member) commented Dec 19, 2022

This ports some recent additions from the immuno.cwl pipeline in genome/analysis-workflows into the WDL version here.

@malachig (Member) commented:

Hi @tmooney. My one request for this PR is that we not leave all these FDA results at the top level of the results directory:

  • aligned_normal_dna_fda_metrics
  • aligned_tumor_dna_fda_metrics
  • aligned_tumor_rna_fda_metrics
  • unaligned_normal_dna_fda_metrics
  • unaligned_tumor_dna_fda_metrics
  • unaligned_tumor_rna_fda_metrics

Instead, can we place all of this under the existing qc directory or in a dedicated subdirectory at the top level?

i.e., one of these options:

  • qc/fda_metrics/
  • fda_metrics/

@malachig self-assigned this Feb 14, 2023
@malachig self-requested a review February 14, 2023 16:07

@malachig (Member) left a comment:

The only suggestion I have is to use the struct approach to better organize the outputs, so that when we pull the results we get a tidier results directory.

@tmooney (Member, Author) commented Feb 16, 2023

Updated the outputs to build a struct named fda_metrics to group these metrics together. (Pushed the commit yesterday; a test run completed overnight.)
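To illustrate the struct approach, here is a minimal Python sketch that models a grouping like `fda_metrics` as a dataclass and computes where each member would land under a single subdirectory. The class name `FdaMetrics`, its field names, the `layout` helper, and the per-field folder layout are all hypothetical; the actual WDL struct definition and the directory layout produced when results are pulled are not shown in this PR.

```python
from __future__ import annotations

from dataclasses import dataclass, fields
from pathlib import PurePosixPath


@dataclass
class FdaMetrics:
    """Hypothetical stand-in for the WDL fda_metrics struct: one field per output file."""
    aligned_normal_dna: str
    aligned_tumor_dna: str
    aligned_tumor_rna: str
    unaligned_normal_dna: str
    unaligned_tumor_dna: str
    unaligned_tumor_rna: str


def layout(metrics: FdaMetrics, root: str = "qc/fda_metrics") -> dict[str, str]:
    """Map each struct member to a destination path under one subdirectory.

    Assumption (illustrative only): each member goes into a folder named after
    its field under `root`, keeping its original file name.
    """
    base = PurePosixPath(root)
    return {
        f.name: str(base / f.name / PurePosixPath(getattr(metrics, f.name)).name)
        for f in fields(metrics)
    }
```

Either location proposed in the review can be passed as `root`, e.g. `layout(metrics, "qc/fda_metrics")` or `layout(metrics, "fda_metrics")`; the point is that six loose top-level outputs collapse into one subtree.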

We're summing the values for all the keys, so the order shouldn't matter.
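The reasoning in this commit message can be checked with a small Python sketch: for integer counts, the total does not depend on the order in which keys are visited, so sorting them first only adds work. That the earlier Perl code sorted keys before summing is an assumption read from the message, and `total_count` / `total_count_sorted` are hypothetical names.

```python
from __future__ import annotations


def total_count(counts: dict[str, int]) -> int:
    """Sum all values in whatever order the dict yields them."""
    return sum(counts.values())


def total_count_sorted(counts: dict[str, int]) -> int:
    """Same sum, but sorts the keys first: O(k log k) extra work, identical result for ints."""
    return sum(counts[k] for k in sorted(counts))
```

This holds because integer addition is commutative and associative; floating-point sums, by contrast, can differ slightly depending on summation order.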
We really only need to do it once for each character instead of millions of times!
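The message does not say what the per-character work was. One plausible reading, assumed here purely for illustration, is converting FASTQ quality characters to Phred scores (Phred+33 encoding). The Python sketch below tallies each distinct character first, so the conversion runs once per distinct symbol (a few dozen at most) rather than once per base; `mean_phred_naive` and `mean_phred_counted` are hypothetical names.

```python
from collections import Counter

PHRED_OFFSET = 33  # Phred+33 quality encoding (assumed for this sketch)


def mean_phred_naive(quals: str) -> float:
    """Convert every quality character, once per base."""
    if not quals:
        return 0.0
    return sum(ord(c) - PHRED_OFFSET for c in quals) / len(quals)


def mean_phred_counted(quals: str) -> float:
    """Tally characters first, then convert each distinct character only once."""
    if not quals:
        return 0.0
    counts = Counter(quals)
    total = sum((ord(ch) - PHRED_OFFSET) * n for ch, n in counts.items())
    return total / len(quals)
```

In Python the tally itself still visits every base; the saving is that whatever per-character work follows (here just `ord` and a subtraction, in a real script possibly something costlier) runs once per distinct symbol.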
We know which sub-hash we're going to use in our update_hash calls, so we'll only pass those along instead.
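A Python sketch of the same idea, with hypothetical names (`update_hash_full`, `update_hash_sub`, `tally`) loosely modeled on the `update_hash` calls mentioned above: instead of handing the whole nested structure to every call and looking up the group each time, the caller resolves the sub-dictionary once and passes only that. Python passes dicts as references to the same object (much like a Perl hash reference), so updates made by the callee are visible to the caller.

```python
def update_hash_full(stats: dict, group: str, key: str, amount: int = 1) -> None:
    """Before: receives the whole structure and looks up the group on every call."""
    sub = stats.setdefault(group, {})
    sub[key] = sub.get(key, 0) + amount


def update_hash_sub(sub: dict, key: str, amount: int = 1) -> None:
    """After: receives only the sub-dictionary the caller already selected."""
    sub[key] = sub.get(key, 0) + amount


def tally(bases: str, group: str, stats: dict) -> None:
    """Count bases into stats[group], resolving the sub-dictionary once outside the loop."""
    sub = stats.setdefault(group, {})
    for b in bases:
        update_hash_sub(sub, b)
```

Both variants produce the same counts; the second just skips a redundant lookup on every call.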
We just need samtools + perl for this, and we're not using all that much RAM to stream through the files.
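Streaming keeps memory flat because only running counters are held, never the reads themselves. The Python sketch below mirrors that pattern under stated assumptions: `seq_stats` and `stats_from_bam` are hypothetical names, the statistics (read and base counts) are only examples of what such a script might compute, and `stats_from_bam` assumes `samtools` is on `PATH`.

```python
from __future__ import annotations

import subprocess
from typing import Iterable


def seq_stats(sam_lines: Iterable[str]) -> dict[str, int]:
    """Stream SAM records one line at a time, keeping only counters in memory."""
    reads = 0
    bases = 0
    for line in sam_lines:
        if line.startswith("@"):
            continue  # skip header lines, if any were emitted
        fields = line.rstrip("\n").split("\t")
        seq = fields[9]  # SEQ is the 10th mandatory SAM column
        reads += 1
        if seq != "*":  # "*" means the sequence is not stored
            bases += len(seq)
    return {"reads": reads, "bases": bases}


def stats_from_bam(path: str) -> dict[str, int]:
    """Pipe `samtools view` output into seq_stats (assumes samtools is installed)."""
    with subprocess.Popen(
        ["samtools", "view", path], stdout=subprocess.PIPE, text=True
    ) as proc:
        result = seq_stats(proc.stdout)
    # Leaving the with-block waits for samtools, so returncode is set here.
    if proc.returncode != 0:
        raise RuntimeError(f"samtools view exited with {proc.returncode}")
    return result
```

Because `proc.stdout` is consumed line by line, samtools decodes the BAM incrementally and the Python side never holds more than one record at a time.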
Changes for efficiency of unaligned_seq_fda_stats perl script
Like for the unaligned stats, we can use less memory and a smaller Docker image here!

@malachig (Member) commented Mar 9, 2023

My tests with the updated version of this PR have succeeded. When I pull down the results, the FDA QC results appear nicely organized. Looks good to me!

@malachig merged commit 4d3de66 into wustl-oncology:main on Mar 9, 2023