new functions and some bug fixes #31

Merged
merged 16 commits into from
Oct 22, 2021

Kdreval
Collaborator

@Kdreval Kdreval commented Sep 24, 2021

Pull Request Checklists

Important: When opening a pull request, keep only the applicable checklist and delete all other sections.

Checklist for all PRs

This PR introduces the function review_hotspots, which ensures that hotspots with known biological functions, or in relevant genes, are not lost and are properly encoded. get_coding_ssm_status can also use this function if the user chooses to do so.

In addition, there is a new function, complete_missing_from_matrix, which is used very frequently in clustering. It ensures that all samples are present in the matrix and that samples are ordered consistently across different objects.
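
As a rough illustration of the idea only (this is not the GAMBLR implementation), the base R sketch below pads a matrix with rows for samples that are absent and then reorders the rows to follow a reference sample list. The helper name fill_missing_samples, its arguments, and the assumption that samples are stored in rows are all hypothetical.

```r
# Hypothetical helper for illustration; not the actual complete_missing_from_matrix.
# Assumes samples are stored in rows and identified by rownames.
fill_missing_samples <- function(mat, all_samples, fill = 0) {
  # Samples expected in the result but absent from the incoming matrix
  missing <- setdiff(all_samples, rownames(mat))
  if (length(missing) > 0) {
    # One filler row per missing sample, with the same columns as mat
    filler <- matrix(fill, nrow = length(missing), ncol = ncol(mat),
                     dimnames = list(missing, colnames(mat)))
    mat <- rbind(mat, filler)
  }
  # Reorder (and restrict) rows so they follow the reference sample order
  mat[all_samples, , drop = FALSE]
}

# Toy mutation matrix: two samples, out of order, with S3 missing entirely
mat <- matrix(c(1, 0, 0, 1), nrow = 2,
              dimnames = list(c("S2", "S1"), c("MYC", "TP53")))
completed <- fill_missing_samples(mat, c("S1", "S2", "S3"))
print(completed)
```

Applying the same reference sample list to every matrix gives all of them the same row order, which is what makes them directly comparable in downstream clustering.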

I have also added handling for the case where the same column is specified in both numericMetadataColumns and metadataColumns when calling prettyOncoplot, so that numeric values are not plotted as factors.
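
One way such an overlap could be resolved is sketched below in base R. This is an assumption about the general approach, not the code in prettyOncoplot: the helper resolve_metadata_columns is hypothetical, and it simply treats any column named in both arguments as numeric.

```r
# Hypothetical helper for illustration; not the prettyOncoplot internals.
# A column listed as both categorical and numeric is kept as numeric only.
resolve_metadata_columns <- function(metadataColumns, numericMetadataColumns) {
  overlap <- intersect(metadataColumns, numericMetadataColumns)
  list(
    categorical = setdiff(metadataColumns, numericMetadataColumns),
    numeric = numericMetadataColumns,
    overlap = overlap
  )
}

# "age" is requested in both arguments
resolved <- resolve_metadata_columns(
  metadataColumns = c("pathology", "sex", "age"),
  numericMetadataColumns = c("age", "purity")
)
print(resolved$categorical)
print(resolved$numeric)
```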

Finally, this PR includes the consistent ggplot theme theme_Morons, used for the plots in the BL manuscript figure.

Required

  • I tested the new code for my use case (please provide a reproducible example of how you tested the new functionality)
> GAMBLR::get_ssm_by_region(region="chr2:217047585-217047838",streamlined = TRUE,
+                           from_indexed_flatfile=TRUE, mode="slms-3")
  Start_Position Tumor_Sample_Barcode
1      217047696                  BL2
> GAMBLR::get_ssm_by_region(region="chr2:217047585-217047838",streamlined = TRUE,
+                           from_indexed_flatfile=TRUE, mode="strelka2")
# A tibble: 3 x 2
  Start_Position Tumor_Sample_Barcode
           <dbl> <chr>               
1      217047640 /this_sample_id_is_redacted_here/
2      217047640 /this_sample_id_is_redacted_here/
3      217047696 BL2                 
> GAMBLR::get_ssm_by_region(region="chr2:217047585-217047838",streamlined = TRUE,
+                           from_indexed_flatfile=FALSE)
  Start_Position Tumor_Sample_Barcode
1      217047696                  BL2
# review_hotspots function
ssm <- get_coding_ssm(from_flatfile=TRUE)
annotated <- annotate_hotspots(ssm, recurrence_min = 25)
annotated %>% dplyr::filter(Hugo_Symbol %in% c("FOXO1", "MYD88", "CREBBP")) %>%
  dplyr::select(Hugo_Symbol, hot_spot) %>% table
           hot_spot
Hugo_Symbol TRUE
     CREBBP   87
     FOXO1    96
     MYD88     0

test <- review_hotspots(annotated, genes_of_interest = c("FOXO1", "MYD88", "CREBBP"), genome_build = "hg19")
test %>% dplyr::filter(Hugo_Symbol %in% c("FOXO1", "MYD88", "CREBBP")) %>%
  dplyr::select(Hugo_Symbol, hot_spot) %>% table
           hot_spot
Hugo_Symbol TRUE
     CREBBP  149
     FOXO1   104
     MYD88    38
# get_coding_ssm_status
test <- GAMBLR::get_coding_ssm_status(include_hotspots = FALSE)
test2 <- GAMBLR::get_coding_ssm_status(include_hotspots = TRUE, recurrence_min = 50)
> sum(test[,"FOXO1"])
[1] 137
> sum(test2[,"FOXO1"])
[1] 37
> sum(test2[,"FOXO1HOTSPOT"])
[1] 100
ggplot(mpg, aes(displ, hwy, colour = class)) +
    geom_point() +
    theme_Morons()
  • I ensured all dplyr functions that commonly conflict with other packages are fully qualified.

This can be checked and addressed by running check_functions.pl and responding to the prompts. Test your code after you do this.

  • I generated the documentation and checked for errors relating to the new function (e.g. devtools::document()) and added NAMESPACE and all other modified files in the root directory and under man.

Checklist for New Functions

Required

  • I documented my function using roxygen2 style.

  • All parameters for the function are described in the documentation and the function has a descriptive title.

Example:

#' Use GISTIC2.0 scores output to reproduce maftools::chromoplot with more flexibility
#'
#' @param scores output file scores.gistic from the run of GISTIC2.0
#' @param genes_to_label optional. Provide a data frame of genes to label (if mutated). The first 3 columns must contain chromosome, start, and end coordinates. Another required column must contain gene names and be named `gene`. (truncated for example)
#' @param cutoff optional. Used to determine which regions to color as aberrant. Must be float in the range [0-1]. (truncated for example)

Checklist for changes to existing code

  • I added/removed arguments to a function and updated documentation for all changed/new arguments

  • I tested the new code for compatibility with existing functionality in the Master branch (please provide a reprex of how you tested the original functionality)

@Kdreval
Collaborator Author

Kdreval commented Oct 15, 2021

This now includes the new functionality for strelka2 flatfiles and is ready for review.

@rdmorin rdmorin merged commit cb14435 into master Oct 22, 2021
@rdmorin rdmorin deleted the kdreval-dev branch October 22, 2021 18:53