Skip to content
New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

GAMBLR OveRhaul #137

Merged
merged 13 commits into from
Nov 17, 2022
Merged

GAMBLR OveRhaul #137

merged 13 commits into from
Nov 17, 2022

Conversation

rdmorin
Copy link
Collaborator

@rdmorin rdmorin commented Oct 25, 2022

Pull Request Checklists

Important: When opening a pull request, keep only the applicable checklist and delete all other sections.

Checklist for all PRs

Required

  • I tested the new code for my use case (please provide a reproducible example of how you tested the new functionality)

  • I ensured all dplyr functions that commonly conflict with other packages are fully qualified.

This can be checked and addressed by running check_functions.pl and responding to the prompts. Test your code after you do this.

  • I generated the documentation and checked for errors relating to the new function (e.g. devtools::document()) and added NAMESPACE and all other modified files in the root directory and under man.

Optional but preferred with PRs

  • I updated and/or successfully knitted a vignette that relies on the modified code (which ones?)

Checklist for New Functions

Required

  • I documented my function using ROxygen style.)

  • All parameters for the function are described in the documentation and the function has a decriptive title.

Example:

#' Use GISTIC2.0 scores output to reproduce maftools::chromoplot with more flexibility
#'
#' @param scores output file scores.gistic from the run of GISTIC2.0
#' @param genes_to_label optional. Provide a data frame of genes to label (if mutated). The first 3 columns must contain chromosome, start, and end coordinates. Another required column must contain gene names and be named `gene`. (truncated for example)
#' @param cutoff optional. Used to determine which regions to color as aberrant. Must be float in the range [0-1]. (truncated for example)
  • My function uses a library that isn't already a dependency of GAMBLR and I made the package aware of this dependency using the function documentation import statment.

Example:

#' @return nothing
#' @export
#' @import tidyverse ggrepel

Checklist for changes to existing code

  • I added/removed arguments to a function and updated documentation for all changed/new arguments

  • I tested the new code for compatability with existing functionality in the Master branch (please provide a reprex of how you tested the original functionality)

@rdmorin
Copy link
Collaborator Author

rdmorin commented Oct 25, 2022

This PR needs someone else to test it. Everything in the recent testing script should now work automatically on a remote computer running GAMBLR. The only thing you should need to do is to adopt the Rprofile file that's in this PR so it gets loaded when you start R.

@rdmorin rdmorin changed the title revert for loop for remote functionality GAMBLR OveRhaul Oct 25, 2022
Copy link
Contributor

@mattssca mattssca left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR has been tested both in Rstudio (on gphost08) as well as remotely in local Rstudio.

No errors were encountered when running a collection (recently added test_functions.R + extended examples that I have added to this script) of commonly used GAMBLR functions in Rstudio on gphost.

As for remote testing, The way I set this up was to sync all necessary files using the provided snakefile and following the instruction detailed in the main README of GAMBLR. The package was tested in the same way as described above. The following warnings got returned after syncing and running check_gamblr_config (to check for missing files):

[1] "MISSING FILES:"
[1] "~/gambl_results/gambl/DESeq2-0.0_salmon-1.0/mrna/vst-matrix-Hugo_Symbol_tidy.tsv" "~/gambl_results/all_the_things/cnv_master-1.0/merges/capture--grch37.seg"         "~/gambl_results/all_the_things/cnv_master-1.0/merges/capture--hg38.seg"          
[1] "DONE!"

The first missing file is a direct result of the changes I made when added support to get_gene_expression (updated paths in config.yml). check_gamblr_config looks for this file in the wrong place and with the wrong extension. The file that got synced is the gzipped version of this file. For the other two missing files, can be considered benign warnings because the functions that supply CNV data via GAMBLR don't work for capture samples (clearly only one projection is specified in the snakefile as well, i.e the snakefile does what it is supposed to do).

There are a few checks in the test_remote.R script that can probably be updated to avoid confusion related to the dimensions of some returned data frames. I would also suggest moving a lot of these testing functions to the recently added test_functions.R script instead. They could be under their own header, "remote testing" or something along those lines. No need to do this at this point, I can add this to a future PR I am working on if wanted.

On to the testing. First, check_remote_configuration seems to be causing an error, preventing multiple functions to run. I am assuming this function was added to ensure remote settings are correctly configured. However, the function in question invokes the stop argument even though Sys.setenv(R_CONFIG_ACTIVE = "remote") is specified. I am thinking this is because the function is looking for the existence of R_CONFIG_ACTIVE and expects the value for this to be "remote". There's a quick workaround to solve this, to be able to test remote functionalities. To bypass the halt caused by this function, I simply set R_CONFIG_ACTIVE = "remote", dropping the Sys.setenv() part. Sys.setenv(R_CONFIG_ACTIVE = "remote") does not seem to produce the desired outcome, at least not in my session.

After this, all examples in both testing scripts run and return expected outputs. With a few exemptions;

Function: get_manta_sv:
Error: '/Users/cmattsson/gambl_results/gambl/gamblr/02-merge/manta/genome/all_manta_merged_grch37.bedpe' does not exist.
Reason: This file is not being synced with the snakefile?

Function: get_cn_segments
Error: Failed to connect: Can't connect to local server through socket '/tmp/mysql.sock' (2)
Reason: Described in issue #136 on GAMBLR, this function runs with from_flatfile = FALSE per default and is currently not working with the same parameter set to TRUE (hot-fix will be included in an upcoming PR I am working on).

Overall, a very nice update with many useful improvements and not too many outstanding items that need to be addressed. However, maybe a second set of eyes would be useful, especially for remote testing. I'd be happy to clarify any comments I have if needed.



R_HISTSIZE=100000
#R_CONFIG_ACTIVE="remote"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The commented-out line 8 seems to work in my case, not line 9.

#' @export
#'
#' @examples
check_remote_configuration = function(auto_connect=FALSE){
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See PR review comment for more thoughts on this. In short, if I specify Sys.setenv(R_CONFIG_ACTIVE= "remote") the function stops at line 48. But if I specify it as such, R_CONFIG_ACTIVE = "remote", the function seems to work and I can proceed with other functions that internally call this, e.g get_gambl_metadata..

@@ -2033,9 +2025,9 @@ get_ssm_by_region = function(chromosome,
min_read_support = 3,
mode = "slms-3",
maf_columns = c("Chromosome", "Start_Position", "End_Position", "Tumor_Sample_Barcode", "t_alt_count"),
maf_column_types = c("c","i","i","c","i"),
ssh_session = NULL,
maf_column_types = "ciici",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice!

@@ -2143,7 +2136,7 @@ get_ssm_by_region = function(chromosome,
}
#stop()
muts = run_command_remote(ssh_session,tabix_command)
muts_region = vroom::vroom(I(muts),col_types = paste(maf_column_types,collapse=""),
muts_region = vroom::vroom(I(muts),col_types = maf_column_types,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Clean!


muts_region = read.table(textConnection(""), col.names = maf_columns,colClasses = maf_types_sep)
}else{
muts_region = vroom::vroom(I(muts), col_types = paste(maf_column_types,collapse=""),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can this also be condensed to muts_region = vroom::vroom(I(muts),col_types = maf_column_types as you did for line 2139 above?

@@ -2310,7 +2310,11 @@ collate_sv_results = function(sample_table,
#' # install_github("morinlab/ggsci")
#'
get_gambl_colours = function(classification = "all",
alpha = 1,as_list=FALSE){
alpha = 1,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you want to add descriptions of these new parameters so they can be properly documented? If not, I can do it in my oveRhaul.

#'
#' @return
#' @export
#'
#' @examples
count_ssm_by_region = function(region,chromosome,start,end,all_mutations_in_these_regions,these_samples_metadata,count_by,seq_type="genome",ssh_session=NULL){
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Global parameter ssh_session = NULL can probably also be dropped.

@@ -19,8 +19,8 @@ fancy_sv_sizedens(
hide_legend = FALSE,
chr_select = paste0("chr", c(1:22)),
plot_title = paste0(this_sample),

plot_subtitle = paste0("SV sizes for Manta calls. Dashed line annotates mean variant size.\\nVAF cut off: ",
plot_subtitle =
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you use some kind of auto-formatting for this? Was not aware that this is the preferred syntax format.

@Kdreval Kdreval merged commit 57f201f into master Nov 17, 2022
# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants