Distributed lda options #782
@@ -15,14 +15,21 @@
from __future__ import with_statement
import os, sys, logging, threading, time
import argparse
This is py2.7 only. @tmylk I don't think we can drop support for py2.6 yet... is this import safe?
If it's triggered only on importing lda_dispatcher.py, it's probably fine... but we don't want py2.7+ imports in "core" gensim (at import gensim).
I checked: this is triggered only on importing lda_dispatcher.py or lda_worker.py.
There is a backport of argparse in setup.py for Python < 2.7 (proof).
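For readers unfamiliar with the backport approach mentioned above, here is a minimal sketch of how a setup.py could request the `argparse` backport only on interpreters older than 2.7. The helper name `extra_requirements` is hypothetical and is not taken from gensim's actual setup.py, which may handle this differently.

```python
import sys


def extra_requirements(version_info=None):
    """Return extra install requirements for the given interpreter version.

    Hypothetical helper: argparse joined the standard library in Python 2.7,
    so older interpreters need the PyPI backport of the same name.
    """
    if version_info is None:
        version_info = sys.version_info
    major_minor = tuple(version_info[:2])
    return ["argparse"] if major_minor < (2, 7) else []


# The resulting list could be appended to install_requires in setup().
print(extra_requirements((2, 6, 9)))
print(extra_requirements())
```

Because the check runs at install time, `import argparse` in lda_dispatcher.py and lda_worker.py would then succeed on both old and new interpreters.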
Awesome! This is a great update, and nicely done too. If you don't mind me asking, how do you use this distributed LDA @menshikh-iv? What is your use case/goal?
@piskvorky, I have two use cases:
I need to train LDA on a large corpus of web page content and vectorize all the pages. LDA training takes a very long time. I could use several dedicated servers for training, but they are not on the same local network, so I modified distributed LDA for my case.
Thanks, interesting! Is this a personal project, academic research or a commercial project? (We keep a list of gensim adopters.)
@piskvorky personal research for now
@menshikh-iv Thanks for the PR! Could you add a short notebook-style tutorial for this feature and a note in the changelog?
@tmylk, unfortunately a notebook-style tutorial would not be useful here, because I can't demonstrate this feature in a notebook. Maybe I could update this page in the documentation with small examples (like this message)? About the changelog: should I add an entry under 0.3.12 in CHANGELOG.md? And should I create a new PR for these changes?
Hi @menshikh-iv, 0.3.12 is the right version to use. A new small PR would be good. Updating this page with instructions would be great.
Documentation changed from
Update distributed LDA support. Workers and the dispatcher can now run in different network segments (not reachable by network broadcast). The broadcast variant is still supported.
If you want to use broadcast, read the tutorial at https://radimrehurek.com/gensim/dist_lsi.html on the official site.
If you want to use the new feature, pass some extra arguments when you run the code, for example:
export PYRO_SERIALIZERS_ACCEPTED=pickle
export PYRO_SERIALIZER=pickle
python -m Pyro4.naming --host 0.0.0.0 --port <NS_PORT> -x
python -m gensim.models.lda_worker --host <NS_HOSTNAME> --port <NS_PORT> --no-broadcast -v
python -m gensim.models.lda_dispatcher --host <NS_HOSTNAME> --port <NS_PORT> --no-broadcast -v
lda = LdaModel(..., ns_conf={"host": NS_HOST, "port": NS_PORT, "broadcast": False})
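To make the relationship between the command-line flags and the `ns_conf` dictionary more concrete, here is a rough sketch of an argparse parser that accepts the same flags as the commands above and turns them into a dictionary with the `host`, `port` and `broadcast` keys. This is an illustration only: `build_parser`, `to_ns_conf`, the `--verbose` long option and the example host and port are assumptions, not the code from this PR.

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring the flags used in the commands above."""
    parser = argparse.ArgumentParser(description="Sketch of name-server options")
    parser.add_argument("--host", default=None, help="name server hostname")
    parser.add_argument("--port", type=int, default=None, help="name server port")
    # --no-broadcast flips the default (broadcast lookup enabled) to False.
    parser.add_argument("--no-broadcast", dest="broadcast", action="store_false",
                        help="locate the name server by host/port, not broadcast")
    parser.add_argument("-v", "--verbose", action="store_true", help="verbose logging")
    return parser


def to_ns_conf(args):
    """Build a dict shaped like the ns_conf argument shown above."""
    return {"host": args.host, "port": args.port, "broadcast": args.broadcast}


# Example values are made up for illustration.
args = build_parser().parse_args(
    ["--host", "ns.example.org", "--port", "9100", "--no-broadcast", "-v"]
)
ns_conf = to_ns_conf(args)
print(ns_conf)
```

With no flags given, `broadcast` stays `True`, which matches the idea that the broadcast variant remains the default behaviour.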