SCTransform error #8185

Closed
jessicaliu09 opened this issue Dec 13, 2023 · 3 comments

@jessicaliu09

Hi,
I ran the following code:
obj[["RNA"]] <- split(obj[["RNA"]], f = obj$batch)
obj <- SCTransform(obj, variable.features.n = 3000, vars.to.regress = c('percent.mt','nCount_RNA'))

It worked for some batches, but then stopped at this batch with error:
Getting residuals for block 1(of 2) for Batch12 dataset
Getting residuals for block 2(of 2) for Batch12 dataset
Error in h(simpleError(msg, call)) :
error in evaluating the argument 'x' in selecting a method for function 'as.matrix': incorrect number of dimensions
Calls: SCTransform ... get_residuals -> as.matrix -> .handleSimpleError -> h
Execution halted

This batch seems to have about as many cells as the other batches:
table(obj$batch)
Batch Number of Cells
B4 3905
B5 4734
B6 1864
B7 4727
B8 3900
B9 3478
B10 5131
B11 4342
B12 5001
B13 4656
B14 4871
B15 4863
B16 5213
B17 2254
B18 5052
B19 6357
B20 4391
B21 5425
B22 5849

packageVersion('SeuratObject')
[1] ‘5.0.1’
packageVersion('Seurat')
[1] ‘5.0.1’

packageVersion('SeuratWrappers')
[1] ‘0.3.2’
packageVersion("sctransform")
[1] ‘0.4.1’

Do you know what causes this?

Thanks.
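
For context, base R raises "incorrect number of dimensions" whenever two-dimensional indexing is applied to an object that no longer has a dim attribute, for example a matrix that collapsed to a plain vector after a one-row subset. Whether that is what happens inside get_residuals for this batch is not established in this thread. The sketch below only reproduces the generic message on a made-up toy matrix; all names in it are hypothetical.

```r
# Toy 2 x 3 matrix standing in for a genes x cells block (made-up data).
m <- matrix(1:6, nrow = 2,
            dimnames = list(c("geneA", "geneB"), c("c1", "c2", "c3")))

# Selecting a single row drops the result to a plain named vector.
one_gene <- m["geneA", ]

# Two-dimensional indexing on that vector fails; capture the message.
err <- tryCatch(one_gene[, 1:2], error = function(e) conditionMessage(e))
print(err)

# drop = FALSE keeps the 1 x 3 matrix shape, so the same indexing works.
kept <- m["geneA", , drop = FALSE]
print(dim(kept))
print(kept[, 1:2])
```

If a similar collapse is suspected in real data, checking dim() on the intermediate object is a quick way to confirm or rule it out.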

@saketkc
Collaborator

saketkc commented Dec 15, 2023

Hi @jessicaliu09, I cannot replicate this with our test datasets, but if you can email me your object at schoudhary@nygenome.org, I can take a look.

@saketkc saketkc self-assigned this Dec 15, 2023
@jessicaliu70

Hi,

I have sent the email.

I have another question about SCTransform.
What's the minimum number of cells for running SCTransform?
Here, each batch is an experiment.
When I run SCTransform on each batch, it gives me many small clusters, while log normalization doesn't. I don't see this for cell types that have more cells in each batch. Is it because the number of cells in each batch is too small?

SCT normalized for celltype1:
[Image: OPCs_dimplot_sct umap unintegrated_3_2]

SCT + integration by harmony for celltype1:
[Image: OPCs_dimplot_sct umap integrated dr_3_2]

SCT + integration by harmony for celltype2:
[Image: Microglia_dimplot_sct umap integrated dr_3_2]
Here, you can see there are many small clusters even after integration.

log normalized for celltype1:
[Image: OPCs_dimplot_umap unintegrated_3_2]

My code for sctransform:
for each main cell type I run the following:
obj[["RNA"]] <- split(obj[["RNA"]], f = obj$batch)
obj <- SCTransform(obj, variable.features.n = 3000, vars.to.regress = c('percent.mt'))
obj <- RunPCA(obj, npcs = 50, verbose = F)
obj <- FindNeighbors(obj, dims = 1:50, reduction = "pca")
obj <- FindClusters(obj, resolution = 0.5, cluster.name = "sct_unintegrated_clusters", method = "igraph", algorithm = 4)
obj <- RunUMAP(obj, dims = 1:50, reduction = "pca", reduction.name = "sct.umap.unintegrated")

obj <- IntegrateLayers(
  object = obj,
  method = HarmonyIntegration,
  normalization.method = "SCT",
  verbose = FALSE,
  k.weight = 30,
  new.reduction = "integrated.dr"
)
obj <- FindNeighbors(obj, dims = 1:50, reduction = "integrated.dr")
obj <- FindClusters(obj, resolution = 0.5, cluster.name = "dr_clusters", method = "igraph", algorithm = 4)
obj <- RunUMAP(obj, dims = 1:50, reduction = "integrated.dr", reduction.name = "sct.umap.integrated.dr")

@saketkc
Collaborator

saketkc commented Dec 19, 2023

Thanks for sharing the object. I have pushed a fix here 64a6495
You can install the develop branch to test:

remotes::install_github("satijalab/seurat", ref = "develop")

For your question about small clusters, you could try restricting the residual range to something smaller, e.g. SCTransform(clip.range = c(-5, 5)). SCT uses 2000 cells for parameter estimation, which is a reasonable assumption for most single-cell datasets, but if your batch size is small, restricting the clip range might help prevent individual cells from generating very high residuals. Hope this helps!
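
To illustrate what a narrower clip range does, here is a minimal base-R sketch on made-up residual values. It assumes clipping simply caps each residual at the given bounds. The numbers and variable names are hypothetical, and this is not the sctransform implementation.

```r
# Hypothetical residuals for one gene across six cells (made-up numbers).
residuals <- c(-8.2, -1.3, 0.4, 2.7, 6.9, 15.0)

# Bounds matching the suggested clip.range = c(-5, 5).
clip_range <- c(-5, 5)

# Cap values below the lower bound, then values above the upper bound.
clipped <- pmin(pmax(residuals, clip_range[1]), clip_range[2])
print(clipped)

# Count how many cells were affected by the clipping.
n_clipped <- sum(residuals != clipped)
print(n_clipped)
```

With a tighter range, a few extreme cells contribute less to downstream PCA, which is one plausible reason the small clusters might shrink or merge.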

@saketkc saketkc closed this as completed Dec 19, 2023