add function for gridss svs #47

Merged
merged 3 commits into from
Jan 8, 2022

@rdmorin (Collaborator) commented Jan 8, 2022

Pull Request Checklists

Important: When opening a pull request, keep only the applicable checklist and delete all other sections.

Checklist for all PRs

Required

  • I tested the new code for my use case (please provide a reproducible example of how you tested the new functionality)

  • I ensured all dplyr functions that commonly conflict with other packages are fully qualified.

This can be checked and addressed by running check_functions.pl and responding to the prompts. Test your code after you do this.

  • I generated the documentation and checked for errors relating to the new function (e.g. devtools::document()) and added NAMESPACE and all other modified files in the root directory and under man.

Optional but preferred with PRs

  • I updated and/or successfully knitted a vignette that relies on the modified code (which ones?)

Checklist for New Functions

Required

  • I documented my function using ROxygen style.

  • All parameters for the function are described in the documentation and the function has a descriptive title.

Example:

#' Use GISTIC2.0 scores output to reproduce maftools::chromoplot with more flexibility
#'
#' @param scores output file scores.gistic from the run of GISTIC2.0
#' @param genes_to_label optional. Provide a data frame of genes to label (if mutated). The first 3 columns must contain chromosome, start, and end coordinates. Another required column must contain gene names and be named `gene`. (truncated for example)
#' @param cutoff optional. Used to determine which regions to color as aberrant. Must be float in the range [0-1]. (truncated for example)
  • My function uses a library that isn't already a dependency of GAMBLR and I made the package aware of this dependency using the function documentation import statement.

Example:

#' @return nothing
#' @export
#' @import tidyverse ggrepel

Checklist for changes to existing code

  • I added/removed arguments to a function and updated documentation for all changed/new arguments

  • I tested the new code for compatibility with existing functionality in the Master branch (please provide a reprex of how you tested the original functionality)

> laura_sv = get_combined_sv()
                                                                                                                                                                                                                          
> dim(laura_sv)
[1] 84430    19
> head(laura_sv)
# A tibble: 6 × 19
  CHROM_A  START_A    END_A CHROM_B  START_B    END_B manta_name SOMATIC_SCORE STRAND_A STRAND_B tumour_sample_id normal_sample_id VAF_tumour    DP gridss_name ANNOTATION_A DIST_TO_ANNOTAT…
  <chr>      <dbl>    <dbl> <chr>      <dbl>    <dbl> <chr>              <dbl> <chr>    <chr>    <chr>            <chr>                 <dbl> <dbl> <chr>       <chr>                   <dbl>
1 1        9644469  9644474 1       44520462 44520467 NA                  236. -        +        00-14595_tumorA  00-14595_normal       0.115   226 gridss0bf_… SE_CLSTN1,S…            41034
2 1        9947762  9947763 1       16951791 16951792 NA                  225. -        -        00-14595_tumorA  00-14595_normal       0.127   338 gridss0bb_… SE_AL357140…           -21594

@rdmorin rdmorin requested a review from lkhilton January 8, 2022 17:02
#' @param min_vaf The minimum tumour VAF for a SV to be returned
#' @param min_score The lowest Manta somatic score for a SV to be returned
#' @param with_chr_prefix Prepend all chromosome names with chr (required by some downstream analyses)
#' @param min_vaf
Collaborator

These 2 params seem to be redundant with the ones above, and projection is missing, but I think it can stay like this for now. I can take care of this after merging.

dplyr::filter(VAF_tumour >= min_vaf & SOMATIC_SCORE >= min_score)

if(!missing(this_sample_id)){
all_sv = all_sv %>% dplyr::filter(tumour_sample_id == sample_id)
Collaborator

Will it work for a list of IDs? Should we switch to `%in%` here?

Collaborator Author

Sure. If you do that, you should probably rename the parameter to these_sample_ids. I don't really think this parameter will get used much anyway. The user can always apply that filter after calling the function, so it's a bit redundant.

Collaborator

Makes sense, I see, thanks!
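
For reference, a minimal sketch of what the `%in%` variant discussed above could look like. The helper name `filter_sv` and the toy data are hypothetical and not part of this PR; `these_sample_ids` is the rename suggested in the thread. Note that the quoted diff line guards on `this_sample_id` but compares against `sample_id`; the sketch uses a single name for both.

```r
# Toy stand-in for the combined SV table; all values are made up for illustration.
all_sv <- data.frame(
  tumour_sample_id = c("sampleA", "sampleB", "sampleC", "sampleA"),
  VAF_tumour = c(0.12, 0.30, 0.05, 0.40),
  SOMATIC_SCORE = c(236, 225, 40, 90),
  stringsAsFactors = FALSE
)

# Hypothetical helper (not the function added in this PR).
# `these_sample_ids` may hold one ID or a vector of IDs.
filter_sv <- function(sv, min_vaf = 0, min_score = 0, these_sample_ids) {
  sv <- dplyr::filter(sv, VAF_tumour >= min_vaf & SOMATIC_SCORE >= min_score)
  if (!missing(these_sample_ids)) {
    # %in% handles a single ID and a vector of IDs the same way
    sv <- dplyr::filter(sv, tumour_sample_id %in% these_sample_ids)
  }
  sv
}

one_sample <- filter_sv(all_sv, these_sample_ids = "sampleA")
two_samples <- filter_sv(all_sv, min_vaf = 0.1,
                         these_sample_ids = c("sampleA", "sampleB"))
```

With `==`, a vector of IDs would be recycled against the column element by element, which is rarely what the caller intends; `%in%` tests membership instead.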

))

}
return(all_sv)
Collaborator

Do you think we can call liftover_bedpe here to support the hg38 coordinates?

Collaborator Author

In theory, yes, but it would be super slow on this many records. I would prefer to have this pre-computed on everything so we can just load the merge of either reference instead of any lifting over on-the-fly. Since Laura prepared these files it would be up to her to decide if she wants to write code to lift things in the opposite direction or someone else could extend her code once it's in a PR.

Collaborator

I see, makes sense, for sure. If Laura doesn't have time I can work on this. I think we can merge this as is and add hg38 support in a future update.
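
A rough sketch of the "pre-computed per reference" idea from this thread, as opposed to lifting over on the fly. Everything here is an assumption for illustration: the `load_sv` helper, the `precomputed_sv` list, and the placeholder coordinates are hypothetical, and in practice each table would presumably be read from a prepared file rather than built in memory.

```r
# Hypothetical pre-computed tables, one per projection.
# Coordinates are placeholders, not real lifted-over positions.
precomputed_sv <- list(
  grch37 = data.frame(CHROM_A = "1", START_A = 100, END_A = 105,
                      stringsAsFactors = FALSE),
  hg38 = data.frame(CHROM_A = "chr1", START_A = 200, END_A = 205,
                    stringsAsFactors = FALSE)
)

# Hypothetical loader: picks the stored table for the requested projection
# instead of converting coordinates at call time.
load_sv <- function(projection = c("grch37", "hg38")) {
  projection <- match.arg(projection)  # errors on an unsupported projection
  precomputed_sv[[projection]]
}

sv_default <- load_sv()      # first choice, "grch37"
sv_hg38 <- load_sv("hg38")
```

The lookup cost is the same regardless of how many records the table holds, which is the point made above about avoiding a slow liftover on this many records.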

@Kdreval Kdreval merged commit f16af7f into master Jan 8, 2022

2 participants