-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? # to your account
GAMBLR OveRhaul #137
GAMBLR OveRhaul #137
Conversation
This PR needs someone else to test it. Everything in the recent testing script should now work automatically on a remote computer running GAMBLR. The only thing you should need to do is to adopt the Rprofile file that's in this PR so it gets loaded when you start R. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This PR has been tested both in Rstudio (on gphost08) as well as remotely in local Rstudio.
No errors were encountered when running a collection (recently added test_functions.R
+ extended examples that I have added to this script) of commonly used GAMBLR functions in Rstudio on gphost.
As for remote testing, The way I set this up was to sync all necessary files using the provided snakefile and following the instruction detailed in the main README
of GAMBLR. The package was tested in the same way as described above. The following warnings got returned after syncing and running check_gamblr_config
(to check for missing files):
[1] "MISSING FILES:"
[1] "~/gambl_results/gambl/DESeq2-0.0_salmon-1.0/mrna/vst-matrix-Hugo_Symbol_tidy.tsv" "~/gambl_results/all_the_things/cnv_master-1.0/merges/capture--grch37.seg" "~/gambl_results/all_the_things/cnv_master-1.0/merges/capture--hg38.seg"
[1] "DONE!"
The first missing file is a direct result of the changes I made when added support to get_gene_expression
(updated paths in config.yml
). check_gamblr_config
looks for this file in the wrong place and with the wrong extension. The file that got synced is the gzipped version of this file. For the other two missing files, can be considered benign warnings because the functions that supply CNV data via GAMBLR don't work for capture samples (clearly only one projection is specified in the snakefile as well, i.e the snakefile does what it is supposed to do).
There are a few checks in the test_remote.R
script that can probably be updated to avoid confusion related to the dimensions of some returned data frames. I would also suggest moving a lot of these testing functions to the recently added test_functions.R
script instead. They could be under their own header, "remote testing" or something along those lines. No need to do this at this point, I can add this to a future PR I am working on if wanted.
On to the testing. First, check_remote_configuration
seems to be causing an error, preventing multiple functions to run. I am assuming this function was added to ensure remote settings are correctly configured. However, the function in question invokes the stop argument even though Sys.setenv(R_CONFIG_ACTIVE = "remote")
is specified. I am thinking this is because the function is looking for the existence of R_CONFIG_ACTIVE
and expects the value for this to be "remote"
. There's a quick workaround to solve this, to be able to test remote functionalities. To bypass the halt caused by this function, I simply set R_CONFIG_ACTIVE = "remote"
, dropping the Sys.setenv()
part. Sys.setenv(R_CONFIG_ACTIVE = "remote")
does not seem to produce the desired outcome, at least not in my session.
After this, all examples in both testing scripts run and return expected outputs. With a few exemptions;
Function: get_manta_sv:
Error: '/Users/cmattsson/gambl_results/gambl/gamblr/02-merge/manta/genome/all_manta_merged_grch37.bedpe' does not exist.
Reason: This file is not being synced with the snakefile?
Function: get_cn_segments
Error: Failed to connect: Can't connect to local server through socket '/tmp/mysql.sock' (2)
Reason: Described in issue #136 on GAMBLR, this function runs with from_flatfile = FALSE
per default and is currently not working with the same parameter set to TRUE
(hot-fix will be included in an upcoming PR I am working on).
Overall, a very nice update with many useful improvements and not too many outstanding items that need to be addressed. However, maybe a second set of eyes would be useful, especially for remote testing. I'd be happy to clarify any comments I have if needed.
|
||
|
||
R_HISTSIZE=100000 | ||
#R_CONFIG_ACTIVE="remote" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The commented-out line 8 seems to work in my case, not line 9.
#' @export | ||
#' | ||
#' @examples | ||
check_remote_configuration = function(auto_connect=FALSE){ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See PR review comment for more thoughts on this. In short, if I specify Sys.setenv(R_CONFIG_ACTIVE= "remote")
the function stops at line 48. But if I specify it as such, R_CONFIG_ACTIVE = "remote"
, the function seems to work and I can proceed with other functions that internally call this, e.g get_gambl_metadata.
.
@@ -2033,9 +2025,9 @@ get_ssm_by_region = function(chromosome, | |||
min_read_support = 3, | |||
mode = "slms-3", | |||
maf_columns = c("Chromosome", "Start_Position", "End_Position", "Tumor_Sample_Barcode", "t_alt_count"), | |||
maf_column_types = c("c","i","i","c","i"), | |||
ssh_session = NULL, | |||
maf_column_types = "ciici", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice!
@@ -2143,7 +2136,7 @@ get_ssm_by_region = function(chromosome, | |||
} | |||
#stop() | |||
muts = run_command_remote(ssh_session,tabix_command) | |||
muts_region = vroom::vroom(I(muts),col_types = paste(maf_column_types,collapse=""), | |||
muts_region = vroom::vroom(I(muts),col_types = maf_column_types, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Clean!
|
||
muts_region = read.table(textConnection(""), col.names = maf_columns,colClasses = maf_types_sep) | ||
}else{ | ||
muts_region = vroom::vroom(I(muts), col_types = paste(maf_column_types,collapse=""), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can this also be condensed to muts_region = vroom::vroom(I(muts),col_types = maf_column_types
as you did for line 2139 above?
@@ -2310,7 +2310,11 @@ collate_sv_results = function(sample_table, | |||
#' # install_github("morinlab/ggsci") | |||
#' | |||
get_gambl_colours = function(classification = "all", | |||
alpha = 1,as_list=FALSE){ | |||
alpha = 1, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you want to add descriptions of these new parameters so they can be properly documented? If not, I can do it in my oveRhaul.
#' | ||
#' @return | ||
#' @export | ||
#' | ||
#' @examples | ||
count_ssm_by_region = function(region,chromosome,start,end,all_mutations_in_these_regions,these_samples_metadata,count_by,seq_type="genome",ssh_session=NULL){ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Global parameter ssh_session = NULL
can probably also be dropped.
@@ -19,8 +19,8 @@ fancy_sv_sizedens( | |||
hide_legend = FALSE, | |||
chr_select = paste0("chr", c(1:22)), | |||
plot_title = paste0(this_sample), | |||
|
|||
plot_subtitle = paste0("SV sizes for Manta calls. Dashed line annotates mean variant size.\\nVAF cut off: ", | |||
plot_subtitle = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did you use some kind of auto-formatting for this? Was not aware that this is the preferred syntax format.
Pull Request Checklists
Important: When opening a pull request, keep only the applicable checklist and delete all other sections.
Checklist for all PRs
Required
I tested the new code for my use case (please provide a reproducible example of how you tested the new functionality)
I ensured all dplyr functions that commonly conflict with other packages are fully qualified.
This can be checked and addressed by running
check_functions.pl
and responding to the prompts. Test your code after you do this.devtools::document()
) and addedNAMESPACE
and all other modified files in the root directory and underman
.Optional but preferred with PRs
Checklist for New Functions
Required
I documented my function using ROxygen style.)
All parameters for the function are described in the documentation and the function has a decriptive title.
Example:
import
statment.Example:
Checklist for changes to existing code
I added/removed arguments to a function and updated documentation for all changed/new arguments
I tested the new code for compatability with existing functionality in the Master branch (please provide a reprex of how you tested the original functionality)