Performance on Xenium/Merfish(Vizgen) - Single cell integration #97

Open
mortunco opened this issue Jul 5, 2023 · 2 comments

mortunco commented Jul 5, 2023

Hello,

First of all, thank you very much for the detailed tutorial. It made my life easier.

I am trying to use Tangram to integrate recent Xenium and single-cell (SC) data.

The Xenium data has 160k cells and 313 genes; the SC data has 8k cells and 20k genes (after filtering).

After filtering low-quality cells and normalising (or without normalising), I trained the model with 90% of the Xenium genes and used the remaining 10% to test model performance.
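A 90/10 gene split like the one described could be sketched as follows. This is only an illustration: `split_genes` and the placeholder gene names are hypothetical, and the actual split used in this issue is not shown in the post.

```python
import random


def split_genes(genes, test_fraction=0.1, seed=0):
    """Shuffle gene names and hold out a fraction of them for testing."""
    genes = sorted(set(genes))      # deterministic starting order
    rng = random.Random(seed)       # fixed seed so the split is reproducible
    rng.shuffle(genes)
    n_test = max(1, round(len(genes) * test_fraction))
    return genes[n_test:], genes[:n_test]


# Hypothetical 313-gene panel standing in for the Xenium gene names.
panel = [f"gene{i}" for i in range(313)]
train_genes, test_genes = split_genes(panel)
print(len(train_genes), len(test_genes))
```

With an AnnData object, the training copy would then presumably be obtained by subsetting on `train_genes`, so that the held-out genes are never seen during mapping.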

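import tangram as tg

# Train: map single cells onto the spatial data.
# xen_adata_copy is the spatial AnnData used for training.
ad_map = tg.map_cells_to_space(
    sc_adata,
    xen_adata_copy,
    mode="cells",
    # density_prior="rna_count_based",  # this is for Visium
    density_prior="uniform",  # this is for MERFISH
    num_epochs=1000,
    device="cuda:0",
)

# Predict: project single-cell gene expression into space.
ad_ge = tg.project_genes(adata_map=ad_map, adata_sc=sc_adata)

# Compare predicted and measured expression on the held-out genes.
pred = ad_ge[:, test_genes].to_df()
truth = xen_adata[:, test_genes].to_df()

# Further comparison and diagnostic plots.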

The plots below show the best-performing and worst-performing genes, ranked by their difference from the truth dataset.

Without depth normalisation:
image

With depth normalisation:
image

I am trying to understand why Tangram is having problems.

  1. Am I doing something wrong?
  2. I am a little suspicious about the section of your paper where it says that the number of cells in the SC data is greater than the number of cells/voxels/spots in the spatial data. In this case, we have 160k cells from Xenium vs. 6-7k cells from the SC data. Can this be a problem?
  3. To calculate model prediction accuracy, we compute the RMSE (test_pred_rmse=np.sqrt(((pred-truth) ** 2).mean())), where pred and truth are the expression matrices with the same cell and gene order. We then compute the same for a baseline, as shown below. Finally, we calculate a score as (1 - test_pred_rmse/test_baseline_rmse).
baseline=truth.mean(axis=0)
test_baseline_rmse=np.sqrt(((pred-baseline.values) ** 2).mean())
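For comparison, here is a minimal sketch of this kind of skill score in which the baseline error is measured against the truth, i.e. the error of predicting each gene's mean expression for every cell. Note that this differs from the snippet above, which subtracts the baseline from `pred`; whether that was intended is an assumption on my part. The function name `rmse_skill` and the toy matrices are hypothetical.

```python
import numpy as np


def rmse_skill(pred, truth):
    """Per-gene skill score: 1 - RMSE(prediction) / RMSE(mean baseline)."""
    pred = np.asarray(pred, dtype=float)
    truth = np.asarray(truth, dtype=float)
    # Error of the model, per gene (rows are cells, columns are genes).
    pred_rmse = np.sqrt(((pred - truth) ** 2).mean(axis=0))
    # Error of always predicting the gene's mean, measured against the truth.
    baseline = truth.mean(axis=0)
    baseline_rmse = np.sqrt(((truth - baseline) ** 2).mean(axis=0))
    return 1.0 - pred_rmse / baseline_rmse


# Toy data: 3 cells x 2 genes.
truth = np.array([[0.0, 1.0], [2.0, 3.0], [4.0, 5.0]])
perfect = rmse_skill(truth, truth)
mean_only = rmse_skill(np.tile(truth.mean(axis=0), (3, 1)), truth)
```

Under this definition a perfect prediction scores 1, predicting the per-gene mean scores 0, and anything worse than the mean goes negative. A gene with constant expression has a baseline RMSE of zero and would need special handling.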

I am open to any guidance,

Best regards,

Tunc.

@HelloWorldLTY

Hi, I believe using count-based data for training is more appropriate, for two reasons: 1. Since there are missing genes in the spatial data, we cannot directly normalize the raw-count spatial data. 2. Tangram does not have specific distribution modeling for the input data.

But I think Xenium has a large number of spots, and it is hard for me to fit Tangram on my GPU node. Did you use the CPU version to train your model? Thanks.
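To give a rough sense of the memory involved: in "cells" mode the learned mapping has one entry per (single cell, spatial cell) pair. The back-of-the-envelope estimate below assumes 32-bit floats and the dataset sizes quoted earlier in this thread; the helper name is hypothetical, and gradients and optimizer state would add to this figure.

```python
def mapping_matrix_gib(n_cells, n_spots, bytes_per_value=4):
    """Size in GiB of a dense n_cells x n_spots matrix (float32 by default)."""
    return n_cells * n_spots * bytes_per_value / 2**30


# 8k single cells mapped onto 160k Xenium cells, as reported in this issue.
dense = mapping_matrix_gib(8_000, 160_000)
print(f"{dense:.2f} GiB")
```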

@Hejin0701
Collaborator

Hi @mortunco, one restriction of Tangram is that the cell type compositions of the scRNA-seq and spatial data need to be similar. For Q2, how does the cell type composition compare between the SC and spatial data? Another question: what does the gene prediction behavior look like overall? Could you plot cos_sim vs. sparsity of the genes, as in the tutorial?
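As a rough illustration of the two quantities in that plot, the sketch below computes a per-gene cosine similarity between predicted and measured expression, and a per-gene sparsity taken here as the fraction of cells with zero measured expression. This is a plain NumPy sketch with a hypothetical function name, not the tutorial's own helper, and the tutorial's exact definition of sparsity may differ.

```python
import numpy as np


def gene_scores(pred, truth):
    """Per-gene cosine similarity and sparsity (rows: cells, columns: genes)."""
    pred = np.asarray(pred, dtype=float)
    truth = np.asarray(truth, dtype=float)
    dot = (pred * truth).sum(axis=0)
    norms = np.linalg.norm(pred, axis=0) * np.linalg.norm(truth, axis=0)
    # Genes with an all-zero column get a similarity of 0 instead of NaN.
    cos_sim = np.divide(dot, norms, out=np.zeros_like(dot), where=norms > 0)
    # Fraction of cells in which the gene is not detected.
    sparsity = (truth == 0).mean(axis=0)
    return cos_sim, sparsity


# Toy data: 3 cells x 2 genes.
truth = np.array([[1.0, 0.0], [2.0, 0.0], [3.0, 4.0]])
pred = np.array([[2.0, 1.0], [4.0, 0.0], [6.0, 0.0]])
cos_sim, sparsity = gene_scores(pred, truth)
```

Plotting `cos_sim` against `sparsity` for the held-out genes would show whether the poorly predicted genes are mostly the sparse ones.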
