Umccrise tidy #127

Merged
pdiakumis merged 24 commits into main from umccrise_tidy
Sep 11, 2024

pdiakumis
Member

Quite a few enhancements/changes.

  • Complete refactor of the dracarys R6 class system. We now have a Wf superclass which is the base for the Wf_* subclasses.
  • Setting up a Wf_umccrise subclass
  • S3 function refactor. Now using {paws}/{paws.storage} for everything.
  • Adding a umccrise summary reporter

Wf

This class has:

  • fields:

    • path: a directory with raw workflow results (can be GDS, S3 or local filesystem)
    • wname: workflow name (e.g. umccrise, sash)
    • filesystem: gds, s3, local
    • regexes: a tibble with file regex and function to parse it
  • methods:

    • list_files: List all files under given path.
    • list_files_filter_relevant: List 'dracarys' files under given path
    • download_files: Download files from GDS/S3 to local filesystem (for debugging/exploration)
    • tidy_files: Tidy/process files
    • write: Write tidy tables to database/parquet/tsv/rds etc.

A neat trick it utilises under the hood is eval(parse(text = f), envir = self), where f is the name of a parser function. Evaluating in self lets the superclass find the parsers defined in each individual subclass, which was pretty tricky to get working properly.
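
As a rough illustration of that lookup, here is a minimal sketch assuming the {R6} package is installed. Wf_sketch, Wf_demo, get_parser, read_qc and read_hrd are hypothetical names made up for this example, not the actual dracarys code; only the eval(parse(text = f), envir = self) call is taken from the description above.

```r
library(R6)

# Hypothetical stand-in for the Wf superclass
Wf_sketch <- R6Class("Wf_sketch",
  public = list(
    # Resolve a parser by name inside `self`, so that methods added by a
    # subclass are visible to code written in the superclass
    get_parser = function(f) {
      eval(parse(text = f), envir = self)
    },
    # `files` has one row per file: its path and the name of its parser
    tidy_files = function(files) {
      lapply(seq_len(nrow(files)), function(i) {
        parser <- self$get_parser(files$fun[i])
        parser(files$path[i])
      })
    }
  )
)

# Hypothetical subclass that only contributes the parsers
Wf_demo <- R6Class("Wf_demo",
  inherit = Wf_sketch,
  public = list(
    read_qc = function(x) paste("qc:", basename(x)),
    read_hrd = function(x) paste("hrd:", basename(x))
  )
)

files <- data.frame(
  path = c("results/qc_summary.tsv", "results/hrd.tsv"),
  fun = c("read_qc", "read_hrd"),
  stringsAsFactors = FALSE
)
tidy <- Wf_demo$new()$tidy_files(files)
```

In this sketch f is always a bare method name, so the evaluation amounts to looking that name up in the object itself. The superclass never needs to know which parsers a given subclass provides.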

Wf_umccrise

  • Parses the files within cancer_report_tables/ (signatures for SNV/DBS/Indel, HRD from CHORD/HRDetect, QC summary), conpair/, and the PCGR JSON (for MSI).
  • Requires SubjectID/SampleID_tumor in order to more reliably fish out results from the final and work directories (via list_files_filter_relevant).
  • Wf_umccrise_download_tidy_write is a convenience wrapper that downloads, tidies and writes the results in one go.
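
To show why the IDs help, here is a hedged base R sketch of ID-anchored filtering. It assumes that result paths contain a "<SubjectID>__<SampleID_tumor>" component; that layout, the file names, the IDs and the function filter_relevant_sketch are all illustrative assumptions, not the real list_files_filter_relevant implementation.

```r
# Made-up paths for illustration only
paths <- c(
  "final/SBJ001__T001/cancer_report_tables/hrd/chord.tsv",
  "final/SBJ001__T001/conpair/concordance.txt",
  "final/SBJ002__T009/cancer_report_tables/hrd/chord.tsv",
  "work/SBJ001__T001/tmp/log.txt"
)

# Hypothetical regex table: file pattern plus the name of its parser
regexes <- data.frame(
  regex = c("cancer_report_tables/.*chord\\.tsv$", "conpair/.*concordance\\.txt$"),
  fun = c("read_chord", "read_conpair"),
  stringsAsFactors = FALSE
)

filter_relevant_sketch <- function(paths, regexes, subject_id, sample_id_tumor) {
  # Keep only paths that belong to this subject/tumor pair.
  # fixed = TRUE matches the IDs literally rather than as a regex.
  id_tag <- paste0(subject_id, "__", sample_id_tumor)
  keep <- paths[grepl(id_tag, paths, fixed = TRUE)]
  # Tag each remaining path with the parser whose regex it matches
  hits <- lapply(seq_len(nrow(regexes)), function(i) {
    m <- keep[grepl(regexes$regex[i], keep)]
    data.frame(path = m, fun = rep(regexes$fun[i], length(m)),
               stringsAsFactors = FALSE)
  })
  do.call(rbind, hits)
}

relevant <- filter_relevant_sketch(paths, regexes, "SBJ001", "T001")
```

Without the ID check, the chord.tsv file of the second subject would match the same regex and be picked up as well.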

S3

  • s3_file_presignedurl: uses generate_presigned_url from paws.storage. The client must be configured with signature version 4 (paws.storage::s3(paws.storage::config(signature_version = "s3v4"))).
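
A minimal sketch of what such a call could look like, assuming {paws.storage} is installed and AWS credentials are available when the function is called. presign_sketch and its arguments are hypothetical and are not the dracarys s3_file_presignedurl signature; the client construction is the one quoted above.

```r
# Hypothetical wrapper: calling it needs paws.storage and valid AWS credentials
presign_sketch <- function(bucket, key, expiry_seconds = 3600) {
  # Client configured with signature version 4, as noted above
  s3 <- paws.storage::s3(paws.storage::config(signature_version = "s3v4"))
  # Presign a GET request for the given object
  s3$generate_presigned_url(
    client_method = "get_object",
    params = list(Bucket = bucket, Key = key),
    expires_in = expiry_seconds
  )
}
```

Defining the function has no side effects. The client is only created when presign_sketch() is actually called with a bucket and key.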

umccrise summary

  • Using Quarto and Reactable.
  • Includes:
    • sample metadata
    • summary QC metrics
    • HRD plot (CHORD vs. HRDetect)
    • signatures:
      • table with all of them (for old/new SNVs, DBS, Indels)
      • table with top 2 ranks
      • barplot with top 3 ranks for SNV 2015
    • metadata summary
      • project name/owner
      • source/quality
      • workflow
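
For the "top 2 ranks" table, the selection could be sketched in base R roughly as follows. The column names and values are toy data invented for illustration, not real umccrise output, and the actual report may compute the ranks differently.

```r
# Toy input: made-up signature contributions, not real results
sigs <- data.frame(
  type = c("SNV", "SNV", "SNV", "DBS", "DBS", "DBS"),
  signature = c("SigA", "SigB", "SigC", "SigD", "SigE", "SigF"),
  contribution = c(0.5, 0.2, 0.3, 0.1, 0.7, 0.2),
  stringsAsFactors = FALSE
)

# Rank within each signature type, largest contribution first
sigs$rank <- ave(
  -sigs$contribution, sigs$type,
  FUN = function(v) rank(v, ties.method = "first")
)

# Keep the top 2 ranks per type, ordered for display
top2 <- sigs[sigs$rank <= 2, ]
top2 <- top2[order(top2$type, top2$rank), ]
```

The resulting data frame is the kind of input that could then be handed to a reactable table in the Quarto report.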

@pdiakumis pdiakumis merged commit 4680506 into main Sep 11, 2024
1 check passed
@pdiakumis pdiakumis deleted the umccrise_tidy branch September 11, 2024 03:45