
TSO500 ctDNA v2 support #135

Merged: 20 commits, Oct 4, 2024


pdiakumis (Member)

Adding initial support for cttso v2 outputs.

  • All outputs under Results/ are parsed except SmallVariants_Annotated.json.gz (TODO) and MetricsOutput.tsv.
    • We continue to ignore MetricsOutput.tsv because its layout is too inconsistent for programmatic parsing; its contents are parsed from other files instead.
  • The SampleAnalysisResults.json under Logs_Intermediates/SampleAnalysisResults/ is also parsed, since there is no copy under Results/.
  • For initial exploration I'm only downloading several files from the Logs_Intermediates/DragenCaller folder; these will be parsed by a separate class (tracked via Add dragen R6 subclass #134).
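The split between parsed and download-only files can be illustrated with a small base-R sketch. `classify_cttsov2_file()` is a hypothetical helper, not the dracarys implementation: its regexes are inferred only from the `type` labels in the example listing below (e.g. `read_cnv`, `read_sar`), and anything that matches no rule falls back to `DOWNLOAD_ONLY`.

```r
# Hypothetical sketch: map a cttsov2 file path to the reader label seen in the
# `type` column of list_files_filter_relevant(). Rules are ASSUMED, inferred
# from the example listing; they are not the package's actual patterns.
classify_cttsov2_file <- function(path) {
  rules <- c(
    read_sar      = "Logs_Intermediates/SampleAnalysisResults/.*_SampleAnalysisResults\\.json$",
    read_cnv      = "/Results/.*\\.cnv\\.vcf\\.gz$",
    read_cvgrepe  = "/Results/.*\\.exon_cov_report\\.tsv$",
    read_cvgrepg  = "/Results/.*\\.gene_cov_report\\.tsv$",
    read_hardfilt = "/Results/.*\\.hard-filtered\\.vcf\\.gz$",
    read_msi      = "/Results/.*\\.microsat_output\\.json$",
    read_tmbt     = "/Results/.*\\.tmb\\.trace\\.tsv$",
    read_cvo      = "/Results/.*_CombinedVariantOutput\\.tsv$",
    read_fus      = "/Results/.*_Fusions\\.csv$"
  )
  vapply(path, function(p) {
    # Names of all rules whose regex matches this path; `$` anchors keep
    # index files such as *.vcf.gz.tbi from matching the VCF rules.
    hit <- names(rules)[vapply(rules, grepl, logical(1), x = p)]
    if (length(hit) == 0) "DOWNLOAD_ONLY" else hit[1]
  }, character(1), USE.NAMES = FALSE)
}
```

For example, `classify_cttsov2_file("…/Results/L2401290/L2401290.cnv.vcf.gz")` gives `"read_cnv"`, while the matching `.tbi` and the copies under Logs_Intermediates/Tmb/ stay `DOWNLOAD_ONLY`.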

Example workflow:

p <- file.path(
"s3://pipeline-prod-cache-503977275616-ap-southeast-2/byob-icav2/production",
"analysis/cttsov2/20240915ff0295ed"
)
LibraryID <- "L2401290"
outdir <- sub("s3:/", "~/s3", p)
t2 <- Wf_tso_ctdna_tumor_only_v2$new(path = p, LibraryID = LibraryID)
t2$list_files_filter_relevant(max_files = 500)
# A tibble: 55 × 5
   type          bname                                           size lastmodified        path                                                                                                                                                                                          
   <chr>         <chr>                                    <fs::bytes> <dttm>              <glue>                                                                                                                                                                                        
 1 DOWNLOAD_ONLY Metrics_L2401290.json                            180 2024-09-15 05:02:35 s3://pipeline-prod-cache-503977275616-ap-southeast-2/byob-icav2/production/analysis/cttsov2/20240915ff0295ed/Logs_Intermediates/AdditionalSarjMetrics/Metrics_L2401290.json                   
 2 DOWNLOAD_ONLY L2401290.contamination.json                    2.62K 2024-09-15 05:02:35 s3://pipeline-prod-cache-503977275616-ap-southeast-2/byob-icav2/production/analysis/cttsov2/20240915ff0295ed/Logs_Intermediates/Contamination/L2401290/L2401290.contamination.json            
 3 DOWNLOAD_ONLY L2401290-replay.json                          58.98K 2024-09-15 05:02:35 s3://pipeline-prod-cache-503977275616-ap-southeast-2/byob-icav2/production/analysis/cttsov2/20240915ff0295ed/Logs_Intermediates/DragenCaller/L2401290/L2401290-replay.json                    
 4 DOWNLOAD_ONLY L2401290.cnv_metrics.csv                         845 2024-09-15 05:02:35 s3://pipeline-prod-cache-503977275616-ap-southeast-2/byob-icav2/production/analysis/cttsov2/20240915ff0295ed/Logs_Intermediates/DragenCaller/L2401290/L2401290.cnv_metrics.csv                
 5 DOWNLOAD_ONLY L2401290.exon_contig_mean_cov.csv              1.47K 2024-09-15 05:02:35 s3://pipeline-prod-cache-503977275616-ap-southeast-2/byob-icav2/production/analysis/cttsov2/20240915ff0295ed/Logs_Intermediates/DragenCaller/L2401290/L2401290.exon_contig_mean_cov.csv       
 6 DOWNLOAD_ONLY L2401290.exon_coverage_metrics.csv              2.5K 2024-09-15 05:02:35 s3://pipeline-prod-cache-503977275616-ap-southeast-2/byob-icav2/production/analysis/cttsov2/20240915ff0295ed/Logs_Intermediates/DragenCaller/L2401290/L2401290.exon_coverage_metrics.csv      
 7 DOWNLOAD_ONLY L2401290.exon_fine_hist.csv                   15.25K 2024-09-15 05:02:35 s3://pipeline-prod-cache-503977275616-ap-southeast-2/byob-icav2/production/analysis/cttsov2/20240915ff0295ed/Logs_Intermediates/DragenCaller/L2401290/L2401290.exon_fine_hist.csv             
 8 DOWNLOAD_ONLY L2401290.exon_hist.csv                         1.23K 2024-09-15 05:02:35 s3://pipeline-prod-cache-503977275616-ap-southeast-2/byob-icav2/production/analysis/cttsov2/20240915ff0295ed/Logs_Intermediates/DragenCaller/L2401290/L2401290.exon_hist.csv                  
 9 DOWNLOAD_ONLY L2401290.exon_overall_mean_cov.csv               109 2024-09-15 05:02:35 s3://pipeline-prod-cache-503977275616-ap-southeast-2/byob-icav2/production/analysis/cttsov2/20240915ff0295ed/Logs_Intermediates/DragenCaller/L2401290/L2401290.exon_overall_mean_cov.csv      
10 DOWNLOAD_ONLY L2401290.fastqc_metrics.csv                  398.08K 2024-09-15 05:02:35 s3://pipeline-prod-cache-503977275616-ap-southeast-2/byob-icav2/production/analysis/cttsov2/20240915ff0295ed/Logs_Intermediates/DragenCaller/L2401290/L2401290.fastqc_metrics.csv             
11 DOWNLOAD_ONLY L2401290.fragment_length_hist.csv            294.23K 2024-09-15 05:02:35 s3://pipeline-prod-cache-503977275616-ap-southeast-2/byob-icav2/production/analysis/cttsov2/20240915ff0295ed/Logs_Intermediates/DragenCaller/L2401290/L2401290.fragment_length_hist.csv       
12 DOWNLOAD_ONLY L2401290.gc_metrics.csv                       10.52K 2024-09-15 05:02:35 s3://pipeline-prod-cache-503977275616-ap-southeast-2/byob-icav2/production/analysis/cttsov2/20240915ff0295ed/Logs_Intermediates/DragenCaller/L2401290/L2401290.gc_metrics.csv                 
13 DOWNLOAD_ONLY L2401290.gvcf_metrics.csv                      1.56K 2024-09-15 05:02:35 s3://pipeline-prod-cache-503977275616-ap-southeast-2/byob-icav2/production/analysis/cttsov2/20240915ff0295ed/Logs_Intermediates/DragenCaller/L2401290/L2401290.gvcf_metrics.csv               
14 DOWNLOAD_ONLY L2401290.mapping_metrics.csv                   7.54K 2024-09-15 05:02:35 s3://pipeline-prod-cache-503977275616-ap-southeast-2/byob-icav2/production/analysis/cttsov2/20240915ff0295ed/Logs_Intermediates/DragenCaller/L2401290/L2401290.mapping_metrics.csv            
15 DOWNLOAD_ONLY L2401290.microsat_diffs.txt                   76.55K 2024-09-15 05:02:35 s3://pipeline-prod-cache-503977275616-ap-southeast-2/byob-icav2/production/analysis/cttsov2/20240915ff0295ed/Logs_Intermediates/DragenCaller/L2401290/L2401290.microsat_diffs.txt             
16 DOWNLOAD_ONLY L2401290.microsat_output.json                  1.04K 2024-09-15 05:02:35 s3://pipeline-prod-cache-503977275616-ap-southeast-2/byob-icav2/production/analysis/cttsov2/20240915ff0295ed/Logs_Intermediates/DragenCaller/L2401290/L2401290.microsat_output.json           
17 DOWNLOAD_ONLY L2401290.sv_metrics.csv                          299 2024-09-15 05:02:36 s3://pipeline-prod-cache-503977275616-ap-southeast-2/byob-icav2/production/analysis/cttsov2/20240915ff0295ed/Logs_Intermediates/DragenCaller/L2401290/L2401290.sv_metrics.csv                 
18 DOWNLOAD_ONLY L2401290.target_bed_contig_mean_cov.csv        1.48K 2024-09-15 05:02:36 s3://pipeline-prod-cache-503977275616-ap-southeast-2/byob-icav2/production/analysis/cttsov2/20240915ff0295ed/Logs_Intermediates/DragenCaller/L2401290/L2401290.target_bed_contig_mean_cov.csv 
19 DOWNLOAD_ONLY L2401290.target_bed_coverage_metrics.csv       2.34K 2024-09-15 05:02:36 s3://pipeline-prod-cache-503977275616-ap-southeast-2/byob-icav2/production/analysis/cttsov2/20240915ff0295ed/Logs_Intermediates/DragenCaller/L2401290/L2401290.target_bed_coverage_metrics.csv
20 DOWNLOAD_ONLY L2401290.target_bed_fine_hist.csv             16.41K 2024-09-15 05:02:36 s3://pipeline-prod-cache-503977275616-ap-southeast-2/byob-icav2/production/analysis/cttsov2/20240915ff0295ed/Logs_Intermediates/DragenCaller/L2401290/L2401290.target_bed_fine_hist.csv       
21 DOWNLOAD_ONLY L2401290.target_bed_hist.csv                     635 2024-09-15 05:02:36 s3://pipeline-prod-cache-503977275616-ap-southeast-2/byob-icav2/production/analysis/cttsov2/20240915ff0295ed/Logs_Intermediates/DragenCaller/L2401290/L2401290.target_bed_hist.csv            
22 DOWNLOAD_ONLY L2401290.target_bed_overall_mean_cov.csv          52 2024-09-15 05:02:36 s3://pipeline-prod-cache-503977275616-ap-southeast-2/byob-icav2/production/analysis/cttsov2/20240915ff0295ed/Logs_Intermediates/DragenCaller/L2401290/L2401290.target_bed_overall_mean_cov.csv
23 DOWNLOAD_ONLY L2401290.time_metrics.csv                        948 2024-09-15 05:02:36 s3://pipeline-prod-cache-503977275616-ap-southeast-2/byob-icav2/production/analysis/cttsov2/20240915ff0295ed/Logs_Intermediates/DragenCaller/L2401290/L2401290.time_metrics.csv               
24 DOWNLOAD_ONLY L2401290.tmb_contig_mean_cov.csv               1.47K 2024-09-15 05:02:36 s3://pipeline-prod-cache-503977275616-ap-southeast-2/byob-icav2/production/analysis/cttsov2/20240915ff0295ed/Logs_Intermediates/DragenCaller/L2401290/L2401290.tmb_contig_mean_cov.csv        
25 DOWNLOAD_ONLY L2401290.tmb_coverage_metrics.csv               2.5K 2024-09-15 05:02:36 s3://pipeline-prod-cache-503977275616-ap-southeast-2/byob-icav2/production/analysis/cttsov2/20240915ff0295ed/Logs_Intermediates/DragenCaller/L2401290/L2401290.tmb_coverage_metrics.csv       
26 DOWNLOAD_ONLY L2401290.tmb_fine_hist.csv                    14.59K 2024-09-15 05:02:36 s3://pipeline-prod-cache-503977275616-ap-southeast-2/byob-icav2/production/analysis/cttsov2/20240915ff0295ed/Logs_Intermediates/DragenCaller/L2401290/L2401290.tmb_fine_hist.csv              
27 DOWNLOAD_ONLY L2401290.tmb_hist.csv                          1.29K 2024-09-15 05:02:36 s3://pipeline-prod-cache-503977275616-ap-southeast-2/byob-icav2/production/analysis/cttsov2/20240915ff0295ed/Logs_Intermediates/DragenCaller/L2401290/L2401290.tmb_hist.csv                   
28 DOWNLOAD_ONLY L2401290.tmb_overall_mean_cov.csv                114 2024-09-15 05:02:36 s3://pipeline-prod-cache-503977275616-ap-southeast-2/byob-icav2/production/analysis/cttsov2/20240915ff0295ed/Logs_Intermediates/DragenCaller/L2401290/L2401290.tmb_overall_mean_cov.csv       
29 DOWNLOAD_ONLY L2401290.trimmer_metrics.csv                   1.35K 2024-09-15 05:02:36 s3://pipeline-prod-cache-503977275616-ap-southeast-2/byob-icav2/production/analysis/cttsov2/20240915ff0295ed/Logs_Intermediates/DragenCaller/L2401290/L2401290.trimmer_metrics.csv            
30 DOWNLOAD_ONLY L2401290.umi_metrics.csv                        2.9K 2024-09-15 05:02:36 s3://pipeline-prod-cache-503977275616-ap-southeast-2/byob-icav2/production/analysis/cttsov2/20240915ff0295ed/Logs_Intermediates/DragenCaller/L2401290/L2401290.umi_metrics.csv                
31 DOWNLOAD_ONLY L2401290.vc_metrics.csv                        1.56K 2024-09-15 05:02:36 s3://pipeline-prod-cache-503977275616-ap-southeast-2/byob-icav2/production/analysis/cttsov2/20240915ff0295ed/Logs_Intermediates/DragenCaller/L2401290/L2401290.vc_metrics.csv                 
32 DOWNLOAD_ONLY L2401290.wgs_contig_mean_cov.csv               2.15K 2024-09-15 05:02:36 s3://pipeline-prod-cache-503977275616-ap-southeast-2/byob-icav2/production/analysis/cttsov2/20240915ff0295ed/Logs_Intermediates/DragenCaller/L2401290/L2401290.wgs_contig_mean_cov.csv        
33 DOWNLOAD_ONLY L2401290.wgs_coverage_metrics.csv              2.11K 2024-09-15 05:02:36 s3://pipeline-prod-cache-503977275616-ap-southeast-2/byob-icav2/production/analysis/cttsov2/20240915ff0295ed/Logs_Intermediates/DragenCaller/L2401290/L2401290.wgs_coverage_metrics.csv       
34 DOWNLOAD_ONLY L2401290.wgs_fine_hist.csv                    18.16K 2024-09-15 05:02:36 s3://pipeline-prod-cache-503977275616-ap-southeast-2/byob-icav2/production/analysis/cttsov2/20240915ff0295ed/Logs_Intermediates/DragenCaller/L2401290/L2401290.wgs_fine_hist.csv              
35 DOWNLOAD_ONLY L2401290.wgs_hist.csv                            560 2024-09-15 05:02:36 s3://pipeline-prod-cache-503977275616-ap-southeast-2/byob-icav2/production/analysis/cttsov2/20240915ff0295ed/Logs_Intermediates/DragenCaller/L2401290/L2401290.wgs_hist.csv                   
36 DOWNLOAD_ONLY L2401290.wgs_overall_mean_cov.csv                 42 2024-09-15 05:02:36 s3://pipeline-prod-cache-503977275616-ap-southeast-2/byob-icav2/production/analysis/cttsov2/20240915ff0295ed/Logs_Intermediates/DragenCaller/L2401290/L2401290.wgs_overall_mean_cov.csv       
37 read_sar      L2401290_SampleAnalysisResults.json            1.66M 2024-09-15 05:02:36 s3://pipeline-prod-cache-503977275616-ap-southeast-2/byob-icav2/production/analysis/cttsov2/20240915ff0295ed/Logs_Intermediates/SampleAnalysisResults/L2401290_SampleAnalysisResults.json     
38 DOWNLOAD_ONLY L2401290-replay.json                          48.87K 2024-09-15 05:02:36 s3://pipeline-prod-cache-503977275616-ap-southeast-2/byob-icav2/production/analysis/cttsov2/20240915ff0295ed/Logs_Intermediates/Tmb/L2401290/L2401290-replay.json                             
39 DOWNLOAD_ONLY L2401290.hard-filtered.vcf                     8.32M 2024-09-15 05:02:36 s3://pipeline-prod-cache-503977275616-ap-southeast-2/byob-icav2/production/analysis/cttsov2/20240915ff0295ed/Logs_Intermediates/Tmb/L2401290/L2401290.hard-filtered.vcf                       
40 DOWNLOAD_ONLY L2401290.time_metrics.csv                         93 2024-09-15 05:02:36 s3://pipeline-prod-cache-503977275616-ap-southeast-2/byob-icav2/production/analysis/cttsov2/20240915ff0295ed/Logs_Intermediates/Tmb/L2401290/L2401290.time_metrics.csv                        
41 DOWNLOAD_ONLY L2401290.tmb.metrics.csv                         278 2024-09-15 05:02:36 s3://pipeline-prod-cache-503977275616-ap-southeast-2/byob-icav2/production/analysis/cttsov2/20240915ff0295ed/Logs_Intermediates/Tmb/L2401290/L2401290.tmb.metrics.csv                         
42 DOWNLOAD_ONLY L2401290.tmb.msaf.csv                            576 2024-09-15 05:02:36 s3://pipeline-prod-cache-503977275616-ap-southeast-2/byob-icav2/production/analysis/cttsov2/20240915ff0295ed/Logs_Intermediates/Tmb/L2401290/L2401290.tmb.msaf.csv                            
43 DOWNLOAD_ONLY L2401290.tmb.trace.tsv                        263.4K 2024-09-15 05:02:36 s3://pipeline-prod-cache-503977275616-ap-southeast-2/byob-icav2/production/analysis/cttsov2/20240915ff0295ed/Logs_Intermediates/Tmb/L2401290/L2401290.tmb.trace.tsv                           
44 read_cnv      L2401290.cnv.vcf.gz                             4.3K 2024-09-15 05:05:28 s3://pipeline-prod-cache-503977275616-ap-southeast-2/byob-icav2/production/analysis/cttsov2/20240915ff0295ed/Results/L2401290/L2401290.cnv.vcf.gz                                             
45 DOWNLOAD_ONLY L2401290.cnv.vcf.gz.tbi                        3.25K 2024-09-15 05:05:31 s3://pipeline-prod-cache-503977275616-ap-southeast-2/byob-icav2/production/analysis/cttsov2/20240915ff0295ed/Results/L2401290/L2401290.cnv.vcf.gz.tbi                                         
46 read_cvgrepe  L2401290.exon_cov_report.tsv                 408.35K 2024-09-15 05:02:37 s3://pipeline-prod-cache-503977275616-ap-southeast-2/byob-icav2/production/analysis/cttsov2/20240915ff0295ed/Results/L2401290/L2401290.exon_cov_report.tsv                                    
47 read_cvgrepg  L2401290.gene_cov_report.tsv                  23.98K 2024-09-15 05:02:37 s3://pipeline-prod-cache-503977275616-ap-southeast-2/byob-icav2/production/analysis/cttsov2/20240915ff0295ed/Results/L2401290/L2401290.gene_cov_report.tsv                                    
48 read_hardfilt L2401290.hard-filtered.vcf.gz                  1.71M 2024-09-15 05:05:27 s3://pipeline-prod-cache-503977275616-ap-southeast-2/byob-icav2/production/analysis/cttsov2/20240915ff0295ed/Results/L2401290/L2401290.hard-filtered.vcf.gz                                   
49 DOWNLOAD_ONLY L2401290.hard-filtered.vcf.gz.tbi             29.37K 2024-09-15 05:05:30 s3://pipeline-prod-cache-503977275616-ap-southeast-2/byob-icav2/production/analysis/cttsov2/20240915ff0295ed/Results/L2401290/L2401290.hard-filtered.vcf.gz.tbi                               
50 read_msi      L2401290.microsat_output.json                  1.04K 2024-09-15 05:02:37 s3://pipeline-prod-cache-503977275616-ap-southeast-2/byob-icav2/production/analysis/cttsov2/20240915ff0295ed/Results/L2401290/L2401290.microsat_output.json                                   
51 read_tmbt     L2401290.tmb.trace.tsv                        263.4K 2024-09-15 05:02:37 s3://pipeline-prod-cache-503977275616-ap-southeast-2/byob-icav2/production/analysis/cttsov2/20240915ff0295ed/Results/L2401290/L2401290.tmb.trace.tsv                                          
52 read_cvo      L2401290_CombinedVariantOutput.tsv           122.94K 2024-09-15 05:02:37 s3://pipeline-prod-cache-503977275616-ap-southeast-2/byob-icav2/production/analysis/cttsov2/20240915ff0295ed/Results/L2401290/L2401290_CombinedVariantOutput.tsv                              
53 read_fus      L2401290_Fusions.csv                           1.18K 2024-09-15 05:02:37 s3://pipeline-prod-cache-503977275616-ap-southeast-2/byob-icav2/production/analysis/cttsov2/20240915ff0295ed/Results/L2401290/L2401290_Fusions.csv                                            
54 DOWNLOAD_ONLY L2401290_MetricsOutput.tsv                     2.25K 2024-09-15 05:02:37 s3://pipeline-prod-cache-503977275616-ap-southeast-2/byob-icav2/production/analysis/cttsov2/20240915ff0295ed/Results/L2401290/L2401290_MetricsOutput.tsv                                      
55 DOWNLOAD_ONLY L2401290_SmallVariants_Annotated.json.gz      14.44M 2024-09-15 05:02:37 s3://pipeline-prod-cache-503977275616-ap-southeast-2/byob-icav2/production/analysis/cttsov2/20240915ff0295ed/Results/L2401290/L2401290_SmallVariants_Annotated.json.gz
d <- t2$download_files(
  outdir = outdir,
  max_files = 500,
  dryrun = FALSE
)
d_tidy <- t2$tidy_files(d)
d_tidy
# A tibble: 14 × 2
   name            data                  
   <chr>           <list>                
 1 sar_sampleinfo  <tibble [1 × 5]>      
 2 sar_qc          <tibble [1 × 30]>     
 3 sar_swconfds    <tibble [7 × 4]>      
 4 sar_swconfother <tibble [1 × 8]>      
 5 sar_snv         <tibble [1,389 × 25]> 
 6 sar_cnv         <tibble [17 × 7]>     
 7 cnv             <tibble [59 × 18]>    
 8 cvgrepe         <tibble [7,557 × 8]>  
 9 cvgrepg         <tibble [519 × 7]>    
10 hardfilt        <tibble [40,371 × 27]>
11 msi             <tibble [1 × 6]>      
12 tmbtrace        <tibble [1,389 × 27]> 
13 combinedvaro    <tibble [1,389 × 11]> 
14 fusions         <tibble [0 × 18]> 
t2$write(
  d_tidy,
  outdir = file.path(outdir, "dracarys_tidy"),
  prefix = LibraryID,
  format = "rds"
)
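`tidy_files()` returns a two-column tibble in which `data` is a list-column holding one parsed table per `name`. A minimal base-R sketch for getting tables back out of that structure; `get_tidy()` and `tidy_dims()` are hypothetical helpers, not part of dracarys, and they assume only the `name`/`data` layout shown in the printed output above.

```r
# Hypothetical helper: return the parsed table stored under `nm`.
get_tidy <- function(d_tidy, nm) {
  i <- match(nm, d_tidy$name)
  if (is.na(i)) stop("no tidy table named '", nm, "'")
  d_tidy$data[[i]]
}

# Hypothetical helper: rows/columns of every parsed table, mirroring the
# <tibble [rows x cols]> summaries printed for d_tidy.
tidy_dims <- function(d_tidy) {
  data.frame(
    name = d_tidy$name,
    nrow = vapply(d_tidy$data, NROW, integer(1)),
    ncol = vapply(d_tidy$data, NCOL, integer(1)),
    stringsAsFactors = FALSE
  )
}
```

With the run above, `get_tidy(d_tidy, "msi")` would return the single-row MSI table, and `tidy_dims(d_tidy)` makes empty results such as `fusions` easy to spot.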

pdiakumis (Member, Author)

I looked in most of the files under Logs_Intermediates/:

aws s3 ls s3://pipeline-prod-cache-503977275616-ap-southeast-2/byob-icav2/production/analysis/cttsov2/20240915ff0295ed/Logs_Intermediates/
                           PRE AdditionalSarjMetrics/
                           PRE Annotation/
                           PRE CombinedVariantOutput/
                           PRE Contamination/
                           PRE CoverageReports/
                           PRE DnaFusionFiltering/
                           PRE DragenCaller/
                           PRE FastqValidation/
                           PRE MetricsOutput/
                           PRE PassingSampleSteps/
                           PRE ResourceVerification/
                           PRE SampleAnalysisResults/
                           PRE SampleSheetValidation/
                           PRE Tmb/
                           PRE nextflow_work_logs/
2024-09-15 15:02:37        481 passing_sample_steps.json
  • AdditionalSarjMetrics: nothing of value 💩
  • Annotation: SmallVariants_Annotated.json.gz is in Results
  • CombinedVariantOutput: CombinedVariantOutput.tsv is in Results
  • Contamination: the most important content is in the SAR JSON (which also includes the SNPs used for contamination estimation) ✅
  • CoverageReports: both exon and gene cov_report.tsv are in Results
  • DnaFusionFiltering: in Results
  • 💙 DragenCaller: this has most of the useful stuff 💙
  • FastqValidation: just a log file 💩
  • MetricsOutput: 💩
  • PassingSampleSteps: 💩
  • ResourceVerification: 💩
  • 💙 SampleAnalysisResults: grabbing the SampleAnalysisResults.json from here 💙 ✅
  • SampleSheetValidation: 💩
  • Tmb: the most important content is in the SAR JSON
    • 💙 Need to compare TMB_Annotated.json.gz with other JSONs 💙
  • nextflow_work_logs: 💩
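The folder triage above could be recorded as data, so that new folders in later runs get flagged. A base-R sketch, assuming the standard `aws s3 ls` text layout in which sub-prefixes appear as `PRE name/` lines; `triage`, `s3_ls_prefixes()` and `triage_folders()` are hypothetical names, and the status strings are only shorthand for the notes above.

```r
# Triage of Logs_Intermediates/ subfolders, transcribed from the notes above.
triage <- c(
  AdditionalSarjMetrics = "skip", Annotation = "in Results",
  CombinedVariantOutput = "in Results", Contamination = "in SAR JSON",
  CoverageReports = "in Results", DnaFusionFiltering = "in Results",
  DragenCaller = "parse", FastqValidation = "skip", MetricsOutput = "skip",
  PassingSampleSteps = "skip", ResourceVerification = "skip",
  SampleAnalysisResults = "parse", SampleSheetValidation = "skip",
  Tmb = "in SAR JSON", nextflow_work_logs = "skip"
)

# Extract folder names from `aws s3 ls` lines such as "    PRE Tmb/";
# object lines (date, time, size, key) are ignored.
s3_ls_prefixes <- function(lines) {
  pre <- grep("^[[:space:]]*PRE[[:space:]]+", lines, value = TRUE)
  sub("/$", "", sub("^[[:space:]]*PRE[[:space:]]+", "", pre))
}

# Join listed folders with the triage; unknown folders are "unreviewed".
triage_folders <- function(lines, map = triage) {
  folders <- s3_ls_prefixes(lines)
  status <- unname(map[folders])
  status[is.na(status)] <- "unreviewed"
  data.frame(folder = folders, status = status, stringsAsFactors = FALSE)
}
```

Feeding the listing above through `triage_folders()` would return the 15 folders with their status; only `DragenCaller` and `SampleAnalysisResults` come back as `"parse"`.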

pdiakumis merged commit 86730fc into main on Oct 4, 2024 (1 check passed)
pdiakumis deleted the cttsov2 branch on October 4, 2024, 05:14