word2vec doesn't scale linearly with multi-cpu configuration? #3376
Comments
You have 24 cores; hyperthreading is not as efficient as real physical cores, so peak performance at around 24 workers is probably expected.
@piskvorky Thank you for the fast reply :) Maybe to avoid confusion, instead of cpu_count from multiprocessing it would be better to use …
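The commenter's exact suggestion is truncated above, but the physical-vs-logical distinction at issue can be sketched from the machine's own lscpu figures. A minimal illustration (the function name is hypothetical; the socket/core/thread figures are taken from the hardware report later in this issue):

```python
def core_counts(sockets, cores_per_socket, threads_per_core):
    """Return (physical, logical) core counts from lscpu-style figures."""
    physical = sockets * cores_per_socket
    logical = physical * threads_per_core
    return physical, logical

# Figures reported for the Xeon E5-2650 v4 machine in this issue:
# 2 sockets x 12 cores/socket x 2 hyperthreads = 24 physical, 48 logical.
physical, logical = core_counts(sockets=2, cores_per_socket=12, threads_per_core=2)
# multiprocessing.cpu_count() reports the *logical* figure (48 here),
# which is why benchmarking up to that number overshoots the real cores.
print(physical, logical)
```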
Note that only the …
That said, despite the … of the …
So, the optimal … If you think any examples/docs in the latest Gensim should be updated to give better guidance, please point out the areas where info could be better, & suggest improvements – ideally as a PR for easiest review/merge. (In using …
@gojomo Thanks for the detailed reply.
I've noticed one more thing that could be even more important (it is on proprietary data, so to be sure I need to replicate it on synthetic data). When the number of tokens per line in the corpus is low (in my case 20), peak performance occurs even earlier, around 14 cores, and adding more cores beyond that starts to slow training significantly, so you get a "parabolic" performance curve.
Your observation makes intuitive sense to me: the code around reading/demarking one text might have more chances of cross-thread/cross-core contention than the bulk calculations done once one text is chosen & all-in-cache. So the idea that shorter texts wouldn't achieve the same per-word throughput rates isn't surprising. That effect is even larger, I think, in the non-…
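The peak-then-decline pattern described in this exchange can be illustrated with a toy throughput model. This is purely illustrative, not gensim's actual scaling behavior: the serial fraction and contention cost are made-up parameters standing in for single-threaded work (corpus reading, job dispatch) and growing cross-worker overhead:

```python
def toy_speedup(workers, serial_frac=0.05, contention_cost=0.004):
    """Amdahl-style speedup minus a contention penalty that grows with workers.

    Illustrative only: serial_frac models the single-threaded portion of
    training, contention_cost models cross-thread/cross-core overhead.
    """
    amdahl = 1.0 / (serial_frac + (1.0 - serial_frac) / workers)
    return amdahl - contention_cost * workers * workers

speedups = {n: toy_speedup(n) for n in range(1, 49)}
best = max(speedups, key=speedups.get)
# With these made-up parameters the curve peaks well below 48 workers and
# declines afterwards -- the same "parabolic" shape reported above.
```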
@gojomo, @piskvorky …
FAST_VERSION is essentially to be interpreted as … Maybe its user-facing interface should have been a True/False bool. Gensim's *2vec models use Cython, yes. Historically there was also a pure-Python mode using numpy only, but that has been removed (too slow). So …
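A hedged sketch of checking this at runtime: the guarded import below simply reports None when gensim isn't installed, and otherwise maps FAST_VERSION onto the True/False reading suggested above (values >= 0 indicating the compiled routines; the function name is mine):

```python
def cython_routines_available():
    """True/False for gensim's compiled *2vec routines; None if gensim is absent."""
    try:
        from gensim.models.word2vec import FAST_VERSION
    except ImportError:
        return None  # gensim not installed in this environment
    # FAST_VERSION >= 0 means the optimized Cython code paths are active;
    # -1 historically signalled the (now removed) pure-Python fallback.
    return FAST_VERSION >= 0

print(cython_routines_available())
```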
Problem description
I've tried to use the script from:
https://github.com/RaRe-Technologies/gensim/releases/3.6.0
with a varying number of cores (num_cores) and obtained the following times: 8 -> 26 sec, 16 -> 17 sec, 24 -> 14.4 sec, 32 -> 15.9 sec, 48 -> 16 sec. So it doesn't scale linearly with the number of cores, and the peak seems to be at 24 cores.
My machine reports 48 cores by cpu_count(); by lscpu: CPUs: 48, Threads per core: 2, Cores per socket: 12, Sockets: 2, NUMA nodes: 2, Model name: Intel Xeon E5-2650 v4 2.2 GHz. Note, the same behaviour occurs for Doc2Vec and FastText.
Is it possible that only one socket is used, or am I missing something?
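The reported timings can be turned into speedup and parallel-efficiency figures; the numbers below are copied from the measurements above, and the choice of the 8-worker run as baseline is just for illustration:

```python
# Wall-clock seconds reported in this issue, keyed by worker count.
times = {8: 26.0, 16: 17.0, 24: 14.4, 32: 15.9, 48: 16.0}

baseline_workers = 8
for workers, secs in sorted(times.items()):
    speedup = times[baseline_workers] / secs
    # Efficiency relative to the 8-worker run: 1.0 would mean linear scaling.
    efficiency = speedup / (workers / baseline_workers)
    print(f"{workers:2d} workers: speedup {speedup:.2f}x, efficiency {efficiency:.2f}")

fastest = min(times, key=times.get)  # 24 workers is the sweet spot in these runs
```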
Steps/code/corpus to reproduce
Versions
Linux-4.15...generic_x86_64_with_debian_buster_sid
64
numpy: 1.21.4
scipy: 1.7.3
gensim: 4.2.0
FAST_VERSION: the same behavior with 0 and 1