Parallel option for find.clusters and dapc? #339

Open
ac-harris opened this issue Oct 5, 2022 · 1 comment

Comments

@ac-harris

Hi there,

I have a genlight object (SNPbin data) with ~500 individuals genotyped at ~670,000 SNPs. I selected the optimum number of PCs to retain for DAPC using the cross-validation function xvalDapc run in parallel, which took about 3 days on our server. However, as far as I can tell, the find.clusters and dapc functions have no parallel option. We have been running find.clusters on this dataset with the optimum number of PCs from cross-validation for ~2 weeks with no end in sight. Is there a way to parallelize find.clusters and dapc? Are there plans to add this functionality to the functions themselves?

I understand that we could randomly subset markers and run DAPC, but in an ideal world, I'd like to be able to compare patterns and inferences across the full dataset and a subset dataset. The code we ran for find.clusters is below.

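```r
library(adegenet)

# find clusters
# `up` (the SNP data) and `xval_iter` (the cross-validation result) are created earlier in the script
B <- xval_iter[[6]] # number of PCs achieving lowest RMSE from cross-validation (51)
print("starting find.clusters")
set.seed(1500)
# with n.clust unset and choose.n.clust = TRUE (the default), find.clusters asks the user to choose the number of clusters
grp <- find.clusters(up, n.pca = B, max.n.clust = 32)
save.image("clust.Rda")
print("find.clusters complete.")
```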

Thank you!
Audrey
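One possible workaround, sketched here under stated assumptions and not an adegenet feature: with the k-means method, find.clusters tries every number of clusters K from 1 to max.n.clust on the retained PCs, and those fits are independent of each other, so they could be spread across cores by hand. The helper `kmeans_bic_parallel` below is hypothetical. It takes a matrix of PCA scores (for example, the first 51 PC columns from a PCA computed once on the SNP data), runs `stats::kmeans` for each K through `parallel::mclapply`, and returns a BIC-style score `n * log(WSS / n) + K * log(n)`. That penalty is a common choice for k-means and is assumed, not verified, to match the statistic find.clusters reports; `iter.max = 100` is likewise an arbitrary choice. Separately, the adegenet documentation describes a `glPca` argument on the genlight methods of find.clusters and dapc that reuses a precomputed PCA, and `glPca()` itself has `parallel` and `n.cores` arguments. Check `?find.clusters` and `?glPca` for your installed version.

```r
library(parallel)

# Hypothetical helper (not part of adegenet): fit k-means for K = 1..max_k
# on a matrix of PCA scores, one K per worker, and return a BIC-style score.
kmeans_bic_parallel <- function(scores, max_k = 32, n_start = 10,
                                n_cores = 1L, seed = 1500) {
  scores <- as.matrix(scores)
  n <- nrow(scores)
  fit_one <- function(k) {
    # Seed per K so results do not depend on how work is split across cores
    set.seed(seed + k)
    # stats::kmeans accepts centers = 1, so K = 1 needs no special case
    fit <- kmeans(scores, centers = k, nstart = n_start, iter.max = 100)
    wss <- fit$tot.withinss
    # Assumed BIC form for k-means: n * log(WSS / n) + K * log(n)
    n * log(wss / n) + k * log(n)
  }
  # mclapply forks worker processes; on Windows only mc.cores = 1 is supported
  bic <- unlist(mclapply(seq_len(max_k), fit_one, mc.cores = n_cores))
  data.frame(K = seq_len(max_k), BIC = bic)
}

# Usage on simulated scores (two well-separated groups, 2 PCs)
set.seed(1)
scores <- rbind(matrix(rnorm(200, mean = 0), ncol = 2),
                matrix(rnorm(200, mean = 6), ncol = 2))
res <- kmeans_bic_parallel(scores, max_k = 6, n_cores = 2)
print(res)
```

Because each K is seeded independently, running with `n_cores = 1` or `n_cores = 2` should produce the same table. The number of clusters to keep would still be chosen by inspecting the BIC curve.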

@gvp681

gvp681 commented Dec 15, 2022

Hi,

Was this issue resolved? I am having a similar problem with the program and would like to figure out how to optimize the processing time.

Thanks!

Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants