Skip to content
New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

Assays: matrix and/or Matrix #157

Closed
lgatto opened this issue Dec 10, 2021 · 6 comments
Closed

Assays: matrix and/or Matrix #157

lgatto opened this issue Dec 10, 2021 · 6 comments
Assignees

Comments

@lgatto
Copy link
Member

lgatto commented Dec 10, 2021

Now that we have (or will have) a fully functional aggregation by matrix using sparse adjacency matrices, we need to consider whether the assay of a QFeature's SummarizedExperiment should be limited to a matrix (as has been so far, at least implicitly) or if we also want to support assays of type Matrix:

> m <- matrix(1:9, ncol = 3, dimnames = list(paste0("pep", 1:3), paste0("prot", 1:3)))
> m
     prot1 prot2 prot3
pep1     1     4     7
pep2     2     5     8
pep3     3     6     9
> adj <- sparseMatrix(i = 1:3, j = 1:3, x = 1)
> adj
3 x 3 sparse Matrix of class "dgCMatrix"
          
[1,] 1 . .
[2,] . 1 .
[3,] . . 1
> MsCoreUtils::aggregate_by_matrix(m, adj, FUN = MsCoreUtils::colSumsMat)
3 x 3 Matrix of class "dgeMatrix"
     prot1 prot2 prot3
[1,]     1     4     7
[2,]     2     5     8
[3,]     3     6     9

Currently, this doesn't happen because the aggregated assay is coerced into a matrix. While there is little reason to use sparse matrices in the assays at this point, and I think coercion to matrices is fine, I would like to keep a record of this and gather your opinions @cvanderaa @sgibb @jorainer.

@cvanderaa
Copy link
Collaborator

I would like to include HDF-5 backends in the discussion, since I believe it is somewhat related and could become useful with large SCP datasets!

@lgatto
Copy link
Member Author

lgatto commented Dec 10, 2021

Indeed, thanks.

I stumbled on this because some functions check whether the assay is indeed a matrix, which fails if it's a Matrix. I suppose that for a HDF5 on-disk backend, it's a DelayedArray?

@cvanderaa
Copy link
Collaborator

Yes, it would need support for DelayedArray class objects.

@cvanderaa cvanderaa self-assigned this Aug 25, 2022
@cvanderaa
Copy link
Collaborator

cvanderaa commented Aug 31, 2022

Regarding implementing Matrix, I face a few hurdles:

  1. How should we run unit test? Ideally, I would like to setup a global parameters for all unit tests and run twice all tests with either sparse or dense matrices, but this is difficult and apparently not recommended: https://github.com/hadley/r-pkgs/blob/6178636460dd25bc5cab8c33a6b0461a72da2b7b/testing-design.Rmd#L380
  2. Many functions fail with Matrix objects because MsCoreUtils expect matrix objects (eg: aggregate_by_vector, impute_matrix, normalize_matrix). Changes are required there first.
  3. Using the Matrix class, only zeros values are considered sparse (as far as I understand). However, as soon as we use zeroIsNA(), the sparsity is lost since NA's are not considered sparse values.

@cvanderaa
Copy link
Collaborator

cvanderaa commented Sep 6, 2022

After an in-person discussion we decide that Matrix class would be of limited interest given point 3. HDF-5 however still is of interest, but deserves a separate issue, cf #171.

@jorainer
Copy link
Member

jorainer commented Sep 8, 2022

thanks for updating the issue with an explanation on why the issue was closed!

# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants