Add FDA metrics subworkflow to Immuno pipeline #78

tmooney · 2022-12-19T22:06:59Z

This incorporates some recent additions to the genome/analysis-workflows immuno.cwl pipeline into the WDL version here.

This includes the 07 December updates to that PR.

malachig · 2023-02-14T15:37:05Z

Hi @tmooney. My one request for this PR is that we not leave all these FDA results at the top level of the results dir:

aligned_normal_dna_fda_metrics
aligned_tumor_dna_fda_metrics
aligned_tumor_rna_fda_metrics
unaligned_normal_dna_fda_metrics
unaligned_tumor_dna_fda_metrics
unaligned_tumor_rna_fda_metrics

Instead, can we place all of this under the existing qc directory or in a dedicated subdir at the top level?

i.e., one of these options:

qc/fda_metrics/
fda_metrics/

malachig

The only suggestion I have is to use the struct approach to better organize the outputs, so that when we pull the results we get a tidier results dir.

tmooney · 2023-02-16T16:18:10Z

Updated the outputs to build a struct named fda_metrics to group these metrics together. (Pushed the commit yesterday; a test run completed overnight.)

malachig · 2023-03-09T16:23:55Z

My tests with the updated version of this PR have succeeded. When I pull down the results, the FDA QC results appear nicely organized. Looks good to me!

tmooney added 8 commits December 13, 2022 14:44

Add fastqc tool.

4409bb5

Add md5sum tool.

be405b8

Add tool to convert CRAM to BAM.

4e3b417

Add tools for generating FDA stats.

207354e

Add tool for making tables for FDA stats.

32b43ec

Workflows for generating FDA metrics.

d22ec00

Add FDA metric generation to immuno pipeline.

9044ddd

Updates to incorporate recent changes in genome/analysis-workflows#1077

db1499a

This includes the 07 December updates to that PR.

malachig self-assigned this Feb 14, 2023

malachig self-requested a review February 14, 2023 16:07

malachig approved these changes Feb 14, 2023

Group all the FDA metrics outputs together.

f890374

tmooney added 10 commits February 20, 2023 16:59

Remove sort.

40b05b9

We're summing the values for all the keys, so the order shouldn't matter.
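
To illustrate why dropping the sort is safe, here is a minimal sketch in Python (not the Perl from this PR). The function name and the sample counts are hypothetical. It only shows that a sum over integer counts gives the same total whether or not the keys are sorted first.

```python
def total_count(counts):
    """Sum every value in the mapping; key order is irrelevant to the result."""
    return sum(counts.values())


# Hypothetical per-base counts, deliberately stored out of alphabetical order.
base_counts = {"T": 7, "A": 12, "N": 1, "G": 9, "C": 10}

# Summing in sorted key order versus plain iteration order gives the same total,
# because integer addition is commutative and associative.
sorted_total = sum(base_counts[key] for key in sorted(base_counts))
unsorted_total = total_count(base_counts)
assert sorted_total == unsorted_total
```

This holds exactly for integer counts. If the values were floating point, different orders could differ by rounding error, which is worth keeping in mind if the script ever sums non-integer values.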

Remove unused "count_base".

f0a50b1

Calculate length once instead of twice.

6f3fc7e

Remove an extra unpack per base by deferring score ASCII conversion.

c5d4dcb

We really only need to do it once for each character instead of millions of times!
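
The idea of deferring the conversion can be sketched as follows, in Python rather than the script's Perl. This assumes quality strings use the common Phred+33 encoding, which this PR does not state. The function name and constant are hypothetical. Raw quality characters are tallied while streaming, and each distinct character is converted to a numeric score only once at the end.

```python
from collections import Counter

# Assumption: Phred+33 quality encoding; the PR does not say which offset is used.
PHRED_OFFSET = 33


def quality_histogram(quality_strings):
    """Return {score: count}, converting each distinct character only once."""
    char_counts = Counter()
    for qual in quality_strings:
        # Tally raw characters; no per-base numeric conversion happens here.
        char_counts.update(qual)

    score_counts = {}
    for char, count in char_counts.items():
        # One conversion per distinct character, not one per base.
        score = ord(char) - PHRED_OFFSET
        score_counts[score] = score_counts.get(score, 0) + count
    return score_counts
```

For example, `quality_histogram(["II5", "I!"])` tallies three characters but performs only three conversions in total, regardless of how many reads contain them.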

Reduce repeated hashing calls on $path.

e6ee108

We know which sub-hash we're going to use in our update_hash calls, so we'll only pass those along instead.
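
A rough sketch of this pattern, in Python rather than Perl: it assumes `$path` acts as a key into an outer hash of per-file counts, which the commit message does not spell out. The names `update_counts` and `tally_read` are hypothetical and are not the signature of the script's `update_hash`. The nested container is looked up once and passed to the updater, instead of re-resolving the outer key on every call.

```python
def update_counts(sub_counts, key, amount=1):
    """Increment one counter inside an already-resolved sub-dictionary."""
    sub_counts[key] = sub_counts.get(key, 0) + amount


def tally_read(stats, path, sequence):
    """Count the bases of one read under stats[path]."""
    # Resolve the outer key once per read instead of once per base.
    sub_counts = stats.setdefault(path, {})
    for base in sequence:
        update_counts(sub_counts, base)
```

The saving comes from hoisting the outer lookup out of the inner loop; the updater then touches only the sub-container it was given.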

Reduce resource requirements.

85811e5

We just need samtools + perl for this and we're not using all that much RAM to stream through the files.

Merge pull request #1 from tmooney/unaligned_seq_patch

c22e032

Changes for efficiency of unaligned_seq_fda_stats perl script

Remove obsolete comments.

06e7677

Reduce resource request for aligned FDA metrics stats.

8759e98

Like for the unaligned stats, we can use less memory and a smaller Docker image here!

Reduce memory request for md5sum.

09c3e42

tmooney mentioned this pull request Mar 7, 2023

Create QC Metrics output type. #86

Merged

malachig merged commit 4d3de66 into wustl-oncology:main Mar 9, 2023
