Provide compatibility with HDF5Matrix #99
Thanks for the PR Christophe! Some of my thoughts (haven't had a close look at the code yet):
Thanks for these first thoughts:
I'll work on these points and will get back once I have something to show.
I addressed the points discussed above. I only face an issue with the [...]. I would love to get a code review.
tests/testthat/test_aggregate.R
@@ -1,3 +1,5 @@
library(HDF5Array)
Instead of adding this here, I would suggest requiring the package namespace only in the unit tests where you use this package. See for example: here
Also, I would then maybe manually install HDF5Array in the GitHub action (using BiocManager::install("HDF5Array")).
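A minimal sketch of that suggestion, assuming the tests use testthat (as the tests/testthat/ path implies). Here `testthat::skip_if_not_installed()` plays the role of the namespace check, and the test name and toy matrix are made up for illustration; this is not the test that ended up in the PR.

```r
library(testthat)

## Toy data, for illustration only.
x <- matrix(as.numeric(1:6), nrow = 3)

test_that("an HDF5-backed matrix round-trips to a base matrix", {
    ## Skip (rather than fail) when HDF5Array is not available, so the
    ## package does not need to be attached at the top of the test file.
    skip_if_not_installed("HDF5Array")
    hx <- HDF5Array::writeHDF5Array(x)
    expect_equal(as.matrix(hx), x)
})
```

Calling `HDF5Array::writeHDF5Array()` with the `::` prefix keeps the dependency local to the one test that needs it.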
I'll try to have a look at the code over the weekend. In the meantime you could maybe try to fix the GH error on Windows with my suggestion in the comment above.
Thanks Christophe for the PR!
I only have some small comments/requests, mostly on the documentation and unit tests.
##' try(aggregate_by_matrix(x, adj, colSumsMat, na.rm = TRUE))
##' colSumsMat
##' c("A", "B"))))
##' aggregate_by_matrix(x, adj, colSumsMat, na.rm = FALSE)
I find the documentation a bit confusing, as it states that na.rm is not used. Could you maybe expand/fix the documentation and clearly state where na.rm can be used and where it cannot?
Looks good to me! Thanks!
The PR now enables HDF5 compatibility for aggregation, imputation and normalization. @lgatto would you mind a final review before merging? One important point that I currently leave aside is that the functions for aggregation, imputation and normalization are compatible, but far from optimized. In fact, I convert the [...]
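To illustrate the "compatible but not optimized" trade-off, here is a rough sketch of coercing the input to a base matrix before computing, which is what the PR description reports for aggregation. The helper `aggregate_in_memory()` and its arguments are hypothetical and not part of MsCoreUtils; the HDF5 part only runs if HDF5Array is installed.

```r
## Hypothetical helper, for illustration only: coerce any matrix-like
## input to a base matrix, then sum the rows of each group.
aggregate_in_memory <- function(x, groups, FUN = colSums) {
    ## For an HDF5-backed matrix this loads the whole dataset into
    ## memory; for a base matrix it changes nothing.
    x <- as.matrix(x)
    idx <- split(seq_len(nrow(x)), groups)
    res <- lapply(idx, function(i) FUN(x[i, , drop = FALSE]))
    do.call(rbind, res)
}

x <- matrix(as.numeric(1:6), nrow = 3,
            dimnames = list(NULL, c("S1", "S2")))
groups <- c("A", "A", "B")
res <- aggregate_in_memory(x, groups)

## Same call on an on-disk matrix, when HDF5Array is available.
if (requireNamespace("HDF5Array", quietly = TRUE)) {
    hx <- HDF5Array::writeHDF5Array(x)
    res_hdf5 <- aggregate_in_memory(hx, groups)
}
```

The result is always a base matrix regardless of the input class, at the cost of holding the full data in memory.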
Good for me. Maybe wait for feedback from @lgatto
Fine by me, thank you @cvanderaa!
Pushed to Bioc.
Hello,
This PR provides support for matrices stored in HDF5. This is in response to an ongoing issue in QFeatures: rformassspectrometry/QFeatures#171
Several functions in MsCoreUtils do not accept matrices of any class other than matrix. The functionality I have identified so far as needing adaptation (for QFeatures) is aggregation, imputation and normalization.
Discussion
Please do not merge! In its current state, the PR adapts the code only for aggregation, as an example. I would like to discuss several points:
- [...] MsCoreUtils?
- [...] HDF5Matrix object as input, should we expect an HDF5Matrix object as output? For the moment, I convert to matrix for convenience so that aggregate_by_{matrix|vector}() systematically return a matrix.
Bonus
I found a little bug in aggregate_by_matrix() when missing data is present. I solved the issue and also included an na.rm argument, as suggested in the example. I can make a separate PR if needed.
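The bug itself is not described here. As a generic base-R illustration of why missing values need care in matrix-based aggregation, the sketch below sums rows per group through a matrix product. The function `col_sums_by_matrix()` is hypothetical and is not the MsCoreUtils implementation; treating NA as zero is one possible meaning of na.rm, assumed for the example.

```r
## Quantitative data: 3 features (rows) x 2 samples (columns), one NA.
x <- matrix(c(1, NA, 3, 4, 5, 6), nrow = 3,
            dimnames = list(paste0("pep", 1:3), c("S1", "S2")))

## Adjacency matrix: pep1 and pep2 belong to group A, pep3 to group B.
adj <- matrix(c(1, 1, 0, 0, 0, 1), nrow = 3,
              dimnames = list(rownames(x), c("A", "B")))

## Hypothetical helper: group-wise column sums as t(adj) %*% x.
col_sums_by_matrix <- function(x, adj, na.rm = FALSE) {
    if (na.rm) x[is.na(x)] <- 0  # assumption: a missing value counts as 0
    crossprod(adj, x)
}

res_keep <- col_sums_by_matrix(x, adj, na.rm = FALSE)
res_rm <- col_sums_by_matrix(x, adj, na.rm = TRUE)
```

Because `0 * NA` is `NA` in R, the missing value in pep2 makes the S1 sum missing for group B as well when na.rm = FALSE, even though pep2 does not belong to B. With na.rm = TRUE, every group gets a finite sum.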