Long run time #4

DKaukonen · 2024-03-19T11:57:41Z

Hi,

This is a nice package and the documentation is helpful. There is one issue I am having. I am running the following command:

consensus_cluster(data, k_max=15, n_reps=100, p_sample=0.8, p_feature=0.8)

My data comes from 8 samples totaling 36 260 cells and 3664 genes. It is scaled data. When I run that code, it says it will take an estimated 5 days to run. I have 256GB of memory and 64 cores. Is there a way to run this command in parallel? I need to check up to 90 clusters, so at 5 days per batch of 15 the full run would take a very long time.

Also, is it normal to take 5 days to check the first 15 clusters?

Thank you,
-Damien


AndiMunteanu · 2024-10-07T20:51:31Z

Hello, Damien!

Thank you for your question, and sorry for the late reply!
Unfortunately, I have limited availability to further improve the performance of the PAC component, and I do not expect improvements to this part of the package in the near future.

However, as suggested here, you can try downsampling your dataset using methods such as geosketch, infer the appropriate number of clusters on the subsample, and then use this information to cluster your entire dataset.

Hopefully this helps.

Andi
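
For readers who want to try the downsampling route described above, here is a minimal sketch of the first step only: picking a subset of cells. It uses uniform random sampling from the Python standard library as a simple stand-in, which is not geosketch and does not preserve rare cell populations the way a geometric sketch is designed to. The helper name `subsample_indices` and the subsample size of 5000 are assumptions made for illustration; neither comes from the package.

```python
import random


def subsample_indices(n_cells, n_keep, seed=0):
    """Return a sorted list of n_keep distinct row indices, drawn uniformly at random.

    Hypothetical helper for illustration; uniform sampling is a stand-in
    for a geometry-aware method such as geosketch.
    """
    if not 0 < n_keep <= n_cells:
        raise ValueError("n_keep must be between 1 and n_cells")
    rng = random.Random(seed)  # fixed seed so the subsample is reproducible
    return sorted(rng.sample(range(n_cells), n_keep))


# 36260 cells as in the question; 5000 is an arbitrary illustrative size.
idx = subsample_indices(36260, 5000)
print(len(idx), idx[:3])
```

The resulting indices would then be used to select rows of the data before estimating the number of clusters on the subsample, and the chosen number of clusters applied when clustering the full dataset.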
