Error creating SOMA from Seurat with empty matrices #92

Closed
adomingues opened this issue Feb 22, 2023 · 8 comments


adomingues commented Feb 22, 2023

Hi all,

First of all, thank you for this implementation of TileDB. I am currently trying it out, but I can already see myself using it a lot.

While testing it, I ran into this issue converting one of my Seurat objects to a Soma (hosted on s3 but also locally):

library("tiledb")
library("tiledbsc")
soco_uri <- file.path("s3://xxx/cell_type_manual/", "rnaAggr")
soco <- SOMACollection$new(uri = soco_uri)
soco$from_seurat(rnaAggr)

(...)
No AssayMatrix found at 's3://xxx/cell_type_manual//PKDContAggr.scType/soma_integrated/X/counts'
Creating new AssayMatrix array with index [var_id,obs_id] at 's3://xxx/cell_type_manual//PKDContAggr.scType/soma_integrated/X/counts'
Ingesting AssayMatrix data into: s3://xxx/cell_type_manual//PKDContAggr.scType/soma_integrated/X/counts
Error in libtiledb_query_set_buffer_var_char(qryptr, colnam, buflist[[k]]) : 
  [TileDB::Query] Error: Cannot set buffer; var_id buffer is null

Admittedly this is a fairly large object:

rnaAggr
An object of class Seurat 
59970 features across 114132 samples within 3 assays 
Active assay: RNA (27970 features, 3000 variable features)
 2 other assays present: SCT, integrated
 3 dimensional reductions calculated: pca, harmony, umap

do you think that's the issue?

Cheers!

adomingues changed the title from "In from_seurat missing gene name" to "Error creating soma into s3" on Feb 22, 2023
@adomingues
Author

Update to the report: the error also occurs when creating the SOMA in a local temporary location.

adomingues changed the title from "Error creating soma into s3" to "Error creating large soma" on Feb 22, 2023
@adomingues
Author

Update: the issue is not the dataset size. I downsampled the data and the error persists.

## local file
soco_uri <- file.path(tempdir(), "rnaAggr")
soco <- SOMACollection$new(uri = soco_uri)
sub <- subset(x = rnaAggr, downsample = 100)

soco$from_seurat(sub)

Creating new AnnotationPairwiseMatrixGroup at '/tmp/RtmpVLfG54/rnaAggr/soma_integrated/varp'
No TileDBGroup currently exists at '/tmp/RtmpVLfG54/rnaAggr/soma_integrated/uns'
Creating new TileDBGroup at '/tmp/RtmpVLfG54/rnaAggr/soma_integrated/uns'
Creating new AnnotationDataframe array with index [obs_id] at '/tmp/RtmpVLfG54/rnaAggr/soma_integrated/obs'
Ingesting AnnotationDataframe data into: /tmp/RtmpVLfG54/rnaAggr/soma_integrated/obs
Creating new AnnotationDataframe array with index [var_id] at '/tmp/RtmpVLfG54/rnaAggr/soma_integrated/var'
Ingesting AnnotationDataframe data into: /tmp/RtmpVLfG54/rnaAggr/soma_integrated/var
No AssayMatrix found at '/tmp/RtmpVLfG54/rnaAggr/soma_integrated/X/counts'
Creating new AssayMatrix array with index [var_id,obs_id] at '/tmp/RtmpVLfG54/rnaAggr/soma_integrated/X/counts'
Ingesting AssayMatrix data into: /tmp/RtmpVLfG54/rnaAggr/soma_integrated/X/counts
Error in libtiledb_query_set_buffer_var_char(qryptr, colnam, buflist[[k]]) : 
  [TileDB::Query] Error: Cannot set buffer; var_id buffer is null

@mojaveazure
Member

Hi there, thanks for letting us know. Could you provide us with the versions of tiledbsc, tiledb-r, and TileDB (the C++ engine) that you're using?

tiledb::tiledb_version(compact = TRUE)
sessionInfo()

Also, could you tell us what the sparsity of the assay matrix is?

@adomingues
Author

Hi,

the data seems to be very sparse:

df <- rnaAggr@assays$RNA@data
sum(df == 0)/prod(dim(df))
[1] 0.9328357

The session info is below. Let me know if you need any more information.

R version 4.2.2 Patched (2022-11-10 r83330)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 20.04.5 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.9.0
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.9.0

locale:
 [1] LC_CTYPE=C.UTF-8       LC_NUMERIC=C           LC_TIME=C.UTF-8        LC_COLLATE=C.UTF-8     LC_MONETARY=C.UTF-8    LC_MESSAGES=C.UTF-8    LC_PAPER=C.UTF-8      
 [8] LC_NAME=C              LC_ADDRESS=C           LC_TELEPHONE=C         LC_MEASUREMENT=C.UTF-8 LC_IDENTIFICATION=C   

attached base packages:
[1] stats     graphics  grDevices datasets  utils     methods   base     

other attached packages:
[1] tiledbsc_0.1.5 tiledb_0.18.0 

loaded via a namespace (and not attached):
  [1] Seurat_4.3.0           Rtsne_0.16             colorspace_2.1-0       deldir_1.0-6           ellipsis_0.3.2         ggridges_0.5.4         rprojroot_2.0.3       
  [8] fs_1.6.1               spatstat.data_3.0-0    leiden_0.4.3           listenv_0.9.0          urltools_1.7.3         ggrepel_0.9.2          bit64_4.0.5           
 [15] fansi_1.0.4            codetools_0.2-18       splines_4.2.2          polyclip_1.10-4        jsonlite_1.8.4         ica_1.0-3              cluster_2.1.4         
 [22] png_0.1-8              uwot_0.1.14            shiny_1.7.4            sctransform_0.3.5      spatstat.sparse_3.0-0  compiler_4.2.2         httr_1.4.4            
 [29] assertthat_0.2.1       SeuratObject_4.1.3     Matrix_1.5-3           fastmap_1.1.0          lazyeval_0.2.2         cli_3.6.0              later_1.3.0           
 [36] htmltools_0.5.4        tools_4.2.2            igraph_1.3.5           gtable_0.3.1           glue_1.6.2             RANN_2.6.1             reshape2_1.4.4        
 [43] dplyr_1.0.10           Rcpp_1.0.10            scattermore_0.8        vctrs_0.5.2            RcppSpdlog_0.0.12      nlme_3.1-161           nanotime_0.3.7        
 [50] spatstat.explore_3.0-6 progressr_0.13.0       lmtest_0.9-40          spatstat.random_3.1-3  stringr_1.5.0          globals_0.16.2         mime_0.12             
 [57] miniUI_0.1.1.1         lifecycle_1.0.3        irlba_2.3.5.1          goftest_1.2-3          future_1.31.0          MASS_7.3-58.2          zoo_1.8-11            
 [64] scales_1.2.1           promises_1.2.0.1       spatstat.utils_3.0-1   parallel_4.2.2         RColorBrewer_1.1-3     spdl_0.0.4             reticulate_1.27       
 [71] bspm_0.4.1             pbapply_1.7-0          gridExtra_2.3          ggplot2_3.4.0          triebeard_0.3.0        stringi_1.7.12         rlang_1.0.6           
 [78] pkgconfig_2.0.3        matrixStats_0.63.0     lattice_0.20-45        tensor_1.5             ROCR_1.0-11            purrr_1.0.1            patchwork_1.1.2       
 [85] htmlwidgets_1.6.1      cowplot_1.1.1          bit_4.0.5              tidyselect_1.2.0       here_1.0.1             parallelly_1.34.0      RcppAnnoy_0.0.20      
 [92] plyr_1.8.8             magrittr_2.0.3         R6_2.5.1               generics_0.1.3         DBI_1.1.3              pillar_1.8.1           RcppCCTZ_0.2.12       
 [99] fitdistrplus_1.1-8     survival_3.5-0         abind_1.4-5            sp_1.6-0               tibble_3.1.8           future.apply_1.10.0    KernSmooth_2.23-20    
[106] utf8_1.2.2             spatstat.geom_3.0-5    plotly_4.10.1          grid_4.2.2             data.table_1.14.6      digest_0.6.31          xtable_1.8-4          
[113] tidyr_1.3.0            httpuv_1.6.8           munsell_0.5.0          viridisLite_0.4.1     

@mojaveazure
Member

mojaveazure commented Feb 22, 2023

It looks like it's failing for the counts matrix of your integrated assay, not the RNA data? Is the sparsity the same for that matrix? Also, is it a sparse dgCMatrix or a dense S3 matrix?

m <- GetAssayData(object = rnaAggr, assay = 'integrated', slot = 'counts')
sum(m == 0) / prod(dim(x = m))
class(x = m)

@adomingues
Author

Spot on @mojaveazure! It seems the counts matrix of the integrated assay is empty (every value is zero), which is what causes the error:

 r$> sum(m == 0) / prod(dim(x = m))
[1] 1

r$> class(x = m)
[1] "dgCMatrix"
attr(,"package")
[1] "Matrix"

Once I removed it (rnaAggr[['integrated']] <- NULL), the SOMA was created successfully in a local file. I wonder if it's in scope to add a warning or a more informative error when this happens? It's also fair game to say that this is an error on the user's side :)
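
A pre-flight check along these lines could flag such matrices before ingestion. This is only a sketch and not part of tiledbsc: `has_no_values()` and `warn_blank_matrices()` are hypothetical helpers, and the input is assumed to be a plain named list of matrices (for example, collected per assay and slot with `GetAssayData()`).

```r
# Hypothetical helper: TRUE when a matrix has no cells, or holds only zeros/NA.
has_no_values <- function(m) {
  length(m) == 0 || !any(m != 0, na.rm = TRUE)
}

# Hypothetical helper: warn about, and return the names of,
# matrices that contain nothing to ingest.
warn_blank_matrices <- function(mats) {
  blank <- names(mats)[vapply(mats, has_no_values, logical(1))]
  if (length(blank) > 0) {
    warning("Matrices with no non-zero values: ", paste(blank, collapse = ", "))
  }
  blank
}

# Toy stand-ins for assay matrices (assumption: real ones would come
# from the Seurat object).
mats <- list(
  RNA_counts = matrix(c(0, 2, 0, 5), nrow = 2),
  integrated_counts = matrix(0, nrow = 2, ncol = 3)
)
blank <- suppressWarnings(warn_blank_matrices(mats))
```

With the toy input, only `integrated_counts` is reported, so the offending assay could be dropped or skipped before calling `from_seurat()`.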


I am struggling with creating the SOMA directly in S3, but I think this error is unrelated to tiledb, so feel free to ignore the error below and close the ticket. Thanks for your help!

r$> soco_uri <- "s3://xxx/rnaAggr"
    soco <- SOMACollection$new(uri = soco_uri)
Error in libtiledb_object_type(ctx@ptr, uri) : 
  [TileDB::S3] Error: Error while listing with prefix 's3://xxx/rnaAggr/__schema/' and delimiter '/'
Exception:  PermanentRedirect
Error message:  Unable to parse ExceptionName: PermanentRedirect Message: The bucket you are attempting to access must be addressed using the specified endpoint. Please send all future requests to this endpoint.

@mojaveazure
Member

So, I'm guessing that the data and scale.data matrices of your integrated assay were populated, but the counts matrix was empty. Seurat requires that the data matrix be present, but allows the counts and scale.data matrices to be empty. An empty matrix is defined as either a 0 x 0 matrix or a 1 x 1 matrix with a value of NA:

# 0 x 0 matrices
new("dgCMatrix")
new("matrix")

# 1 x 1 NA matrices
matrix()
as(Matrix::Matrix(), "dgCMatrix")

We should be skipping empty assay matrices as we write a Seurat object to SOMA; we'll work on getting a bug fix for you.
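
For illustration, the definition above can be written as a small base R predicate. `is_empty_matrix()` is a hypothetical name used only in this sketch; it is not the function used by the package.

```r
# Hypothetical helper: TRUE for a 0 x 0 matrix or a 1 x 1 matrix holding NA.
is_empty_matrix <- function(x) {
  d <- dim(x)
  if (is.null(d)) {
    # Not matrix-like at all, so not an "empty matrix" in this sense.
    return(FALSE)
  }
  all(d == 0) || (all(d == 1) && is.na(x[1, 1]))
}

empty_0x0   <- is_empty_matrix(matrix(numeric(0), nrow = 0, ncol = 0))
empty_1x1   <- is_empty_matrix(matrix())
empty_zeros <- is_empty_matrix(matrix(0, nrow = 2, ncol = 2))
```

Under this definition, a matrix with non-zero dimensions that contains only zeros does not count as empty, so `empty_zeros` is `FALSE` while the other two are `TRUE`.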

mojaveazure changed the title from "Error creating large soma" to "Error creating SOMA from Seurat with empty matrices" on Feb 23, 2023
mojaveazure added a commit to single-cell-data/TileDB-SOMA that referenced this issue Feb 23, 2023
Seurat allows empty matrices for counts and scale.data
Empty matrices are defined as 0x0 or 1x1 with a value of NA
Use SeuratObject::IsMatrixEmpty to determine empty matrices
Skip writing empty matrices

fixes TileDB-Inc/tiledbsc#92
@mojaveazure
Member

This should be fixed on the main-old branch of single-cell-data/TileDB-SOMA, the up-and-coming home for SOMA. You can install it with

remotes::install_github('single-cell-data/tiledb-soma', ref = 'main-old', subdir = 'apis/r')
library(tiledbsoma) # new package name

Please re-open if the bug persists after updating.
