ts-PCA performance is slow compared to scikit-allel #1743
Comments
This may be related to #647, since as written the code is maintaining a (num nodes) x (num samples) matrix as it iterates over the trees.
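To give a rough sense of why a (num nodes) x (num samples) matrix can be costly, here is a back-of-envelope sketch. It assumes the matrix is dense and stored as 8-byte floats, which the thread does not state, and the node and sample counts are purely illustrative; `dense_matrix_bytes` is a hypothetical helper, not part of tskit.

```python
def dense_matrix_bytes(num_nodes, num_samples, bytes_per_entry=8):
    """Memory needed for a dense (num_nodes x num_samples) matrix.

    bytes_per_entry=8 assumes float64 storage (an assumption, not
    something confirmed in this thread).
    """
    return num_nodes * num_samples * bytes_per_entry


# Illustrative sizes only: 100,000 nodes and 10,000 samples.
size_in_bytes = dense_matrix_bytes(100_000, 10_000)
print(f"{size_in_bytes / 1e9:.1f} GB")
```

Since any update that touches a full row or column of such a matrix scales with its dimensions, both memory and per-tree work grow quickly as the number of nodes and samples increases.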
I just tried this out, but realised we need to get #1246 merged first.
Ah, apologies, I should have said: I was running this off the branch in #1246!
No worries @brieuclehmann! I just did some profiling, and the majority of the time is spent in [...]. What are our options here for doing less work per node, @petrelharp? Would it be worth trying to cast this as a function of IBD segments, so we can see if that approach is at least potentially faster?
Closed in #3008.
Building on #898 and using the 'matrix multiplication' in WIP #1246 (i.e. genetic_relatedness_weighted), we're trying to implement PCA for tskit. This appears to be working 🎉 but is rather slow compared to scikit-allel. See the following code for a small reprex, where scikit-allel is approximately 20 times faster than our current tskit implementation.
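For readers unfamiliar with the 'matrix multiplication' idea, here is a minimal sketch of how PCA can be computed when the relatedness matrix is only available through products with a block of weight vectors. It is not the reprex referred to above, nor the code in #1246 or #3008. It uses plain NumPy on a simulated genotype matrix, and `relatedness_matvec` is a hypothetical stand-in for a call such as `genetic_relatedness_weighted`. The two-population toy data, the sizes, and the iteration count are all assumptions chosen for illustration.

```python
import numpy as np


def top_pcs_from_matvec(matvec, num_samples, num_components,
                        num_iterations=50, seed=1):
    """Subspace (block power) iteration that only ever calls matvec(W) = K @ W."""
    rng = np.random.default_rng(seed)
    # Start from a random orthonormal basis of the right shape.
    Q, _ = np.linalg.qr(rng.normal(size=(num_samples, num_components)))
    for _ in range(num_iterations):
        # Multiply by K, then re-orthonormalise to keep the basis stable.
        Q, _ = np.linalg.qr(matvec(Q))
    # Rayleigh-Ritz step: solve the small projected eigenproblem.
    evals, evecs = np.linalg.eigh(Q.T @ matvec(Q))
    order = np.argsort(evals)[::-1]
    return evals[order], Q @ evecs[:, order]


# Toy data (an assumption): two populations of 20 samples each, with
# independent allele frequencies at each of 500 sites.
rng = np.random.default_rng(42)
num_sites, num_samples = 500, 40
freq_a = rng.uniform(size=(num_sites, 1))
freq_b = rng.uniform(size=(num_sites, 1))
freqs = np.hstack([np.repeat(freq_a, 20, axis=1),
                   np.repeat(freq_b, 20, axis=1)])
G = (rng.uniform(size=(num_sites, num_samples)) < freqs).astype(float)
G -= G.mean(axis=1, keepdims=True)  # centre each site


def relatedness_matvec(W):
    # Hypothetical stand-in: computes K @ W for K = G.T @ G
    # without ever forming the (samples x samples) matrix K.
    return G.T @ (G @ W)


evals, pcs = top_pcs_from_matvec(relatedness_matvec, num_samples, 2)
print(pcs.shape)  # one row per sample, one column per component
```

In this kind of scheme the cost is dominated by the repeated matrix-vector products, so the speed of that single operation largely determines the overall PCA running time.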