Rmorin dev #21
@@ -176,12 +195,12 @@ get_gambl_metadata = function(seq_type_filter = "genome",
TRUE ~ 50
))
if(with_outcomes){
outcome_table = get_gambl_outcomes() %>% dplyr::select(-sex)
It looks like the outcome_table assignment is dropped here, but it is still used below (line 199) in the left_join. Is there another place the outcomes are obtained from?
I moved this line to the top of the function. It's always called but not always joined. I changed the default to always join it because I see no reason why not to. Thoughts?
Found it now, on line 28. I missed it at first.
I think it makes sense to always join with outcomes by default, so all are returned at once 👍
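For readers following this thread, here is a minimal sketch of what "always join with outcomes by default" could look like. It is not the actual GAMBLR code: the toy tables, the OS_YEARS column, the function name get_metadata_sketch, and the choice of patient_id as the join key are all assumptions made for illustration.

```r
library(dplyr)

# Toy stand-ins for the real tables; every value here is invented.
sample_meta <- data.frame(
  sample_id  = c("S1", "S2", "S3"),
  patient_id = c("P1", "P2", "P3"),
  sex        = c("F", "M", "F")
)
outcome_table <- data.frame(
  patient_id = c("P1", "P2"),
  sex        = c("F", "M"),
  OS_YEARS   = c(4.2, 1.5)  # hypothetical outcome column
)

# Hypothetical, simplified metadata getter: outcomes are joined by default.
get_metadata_sketch <- function(with_outcomes = TRUE) {
  all_meta <- sample_meta
  if (with_outcomes) {
    # Drop the duplicated sex column before joining, as in the diff above.
    outcomes <- outcome_table %>% dplyr::select(-sex)
    # Assumed join key; the real function may join differently.
    all_meta <- dplyr::left_join(all_meta, outcomes, by = "patient_id")
  }
  all_meta
}

meta_default <- get_metadata_sketch()
meta_plain   <- get_metadata_sketch(with_outcomes = FALSE)
```

Because this is a left join, samples without an outcome record are kept and simply get NA in the outcome columns, so joining by default does not drop any rows.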
R/database.R
Outdated
db=config::get("database_name")
con <- DBI::dbConnect(RMariaDB::MariaDB(), dbname = db)
all_outcome = dplyr::tbl(con,"outcome_metadata") %>% as.data.frame()
get_gambl_outcomes = function(patient_ids,time_unit="year",censor_cbioportal=FALSE,complete_missing=FALSE,from_flatfile=FALSE){
The default for from_flatfile here is FALSE, but in get_metadata the default is TRUE. Do you want to set it to TRUE here as well, so the defaults are consistent between functions?
Doesn't really matter because the only call to get_gambl_outcomes in the code is part of get_gambl_metadata and it passes the desired value of this variable along with it. In the long run I'd actually like to move away from using the database for these tables so I'll make the default false.
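The point about the callee's default not mattering can be illustrated with a small sketch. The two functions below are hypothetical stand-ins, not the real get_gambl_outcomes or get_gambl_metadata; they only show that when the caller passes from_flatfile explicitly, the inner default is never consulted.

```r
# Hypothetical stand-in for the outcomes getter: reports which source it would use.
get_outcomes_sketch <- function(from_flatfile = FALSE) {
  if (from_flatfile) "flatfile" else "database"
}

# Hypothetical stand-in for the metadata getter: it forwards its own
# from_flatfile value, so the callee's default never applies on this path.
metadata_source_sketch <- function(from_flatfile = TRUE) {
  get_outcomes_sketch(from_flatfile = from_flatfile)
}

via_wrapper_default  <- metadata_source_sketch()                       # caller default wins
via_wrapper_explicit <- metadata_source_sketch(from_flatfile = FALSE)  # explicit value wins
direct_call          <- get_outcomes_sketch()                          # callee default applies
```

The callee's default only comes into play for a direct call, which is why differing defaults are harmless as long as the wrapper stays the only caller.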
con <- DBI::dbConnect(RMariaDB::MariaDB(), dbname = db)
get_coding_ssm = function(limit_cohort,exclude_cohort,
limit_pathology,limit_samples,basic_columns=TRUE,
from_flatfile=FALSE,groups=c("gambl","icgc_dart")){
Also, I think it might be helpful to harmonize the defaults with other functions.
Strongly disagree. Easier to explain in a conversation. Keeping the metadata in the database has proven to be very problematic due to the ongoing change in the order and number of columns. I'd like to always use the database for all functions except for metadata so the defaults shouldn't be harmonized.
I see, yes, the metadata is subject to constant updates. I understand now that the defaults for flat files should be different here.
biopsy_meta = dplyr::tbl(con,"biopsy_metadata") %>% dplyr::select(-patient_id) %>% dplyr::select(-pathology) %>% dplyr::select(-time_point) %>% dplyr::select(-EBV_status_inf) #drop duplicated columns
biopsy_meta = biopsy_meta %>% dplyr::select(-patient_id) %>% dplyr::select(-pathology) %>% dplyr::select(-time_point) %>% dplyr::select(-EBV_status_inf) #drop duplicated columns
all_meta = dplyr::left_join(sample_meta,biopsy_meta,by="biopsy_id") %>% as.data.frame()
all_meta = all_meta %>% mutate(bcl2_ba=ifelse(bcl2_ba=="POS_BCC","POS",bcl2_ba))
Is it worth cleaning up all the FISH columns here?
all_meta = all_meta %>% mutate_at(vars(starts_with(c("myc", "bcl2", "bcl6")) & ends_with(c("_ba", "_cn"))),
~str_remove(., "_.*") %>% str_replace(., "^NORM$|^NOMR$", "NORMAL"))
This is a good point but I think it should actually be done in GAMBL rather than GAMBLR so the metadata is clean before this package sees it. I suggest discussing how to tackle this with @lkhilton
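To make the proposed FISH clean-up concrete, here is a small sketch of the same transformation applied to invented data. It uses across() in place of mutate_at(), which is a substitution on my part rather than what the reviewer wrote; the column names and values are made up for illustration and do not come from the GAMBL metadata.

```r
library(dplyr)
library(stringr)

# Invented example rows; real FISH columns and values may differ.
toy_meta <- data.frame(
  sample_id = c("S1", "S2", "S3", "S4"),
  myc_ba    = c("POS_BCC", "NEG", "NORM", NA),
  bcl2_ba   = c("POS", "POS_BCC", "NOMR", "NEG"),
  bcl6_cn   = c("NORM", "AMP_X", "NEG", "NORMAL"),
  pathology = c("DLBCL", "FL", "BL", "DLBCL"),
  stringsAsFactors = FALSE
)

clean_meta <- toy_meta %>%
  mutate(across(
    # Same column selection as the suggestion: FISH prefixes AND _ba/_cn suffixes.
    starts_with(c("myc", "bcl2", "bcl6")) & ends_with(c("_ba", "_cn")),
    # Strip everything from the first underscore, then normalize NORM/NOMR spellings.
    ~ str_replace(str_remove(.x, "_.*"), "^NORM$|^NOMR$", "NORMAL")
  ))
```

In this toy table, values such as POS_BCC collapse to POS, the NORM and NOMR spellings both become NORMAL, missing values stay NA, and columns outside the selection (sample_id, pathology) are left untouched.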
Pull Request Checklists
Important: When opening a pull request, keep only the applicable checklist and delete all other sections.
Checklist for all PRs
Required
I tested the new code for my use case (please provide a reproducible example of how you tested the new functionality)
I ensured all dplyr functions that commonly conflict with other packages are fully qualified.
This can be checked and addressed by running check_functions.pl and responding to the prompts. Test your code after you do this.
[...] devtools::document()) and added NAMESPACE and all other modified files in the root directory and under man.
Optional but preferred with PRs
Checklist for New Functions
Required
I documented my function using ROxygen style.
All parameters for the function are described in the documentation and the function has a descriptive title.
[...] import statement.
Checklist for changes to existing code
I added/removed arguments to a function and updated documentation for all changed/new arguments
I tested the new code for compatibility with existing functionality in the Master branch (please provide a reprex of how you tested the original functionality)
This was tested by Brett for the new functionality. The from_flatfile options work for metadata and coding SSMs.