Distributed lda options #782

menshikh-iv · 2016-07-10T05:53:27Z

Update distributed LDA support. Workers and the dispatcher can now run in different network segments (ones not reachable by network broadcast). The broadcast variant is still supported.

If you want to use broadcast, read the tutorial at https://radimrehurek.com/gensim/dist_lsi.html on the official site.

If you want to use the new feature, add some arguments when you run the code, for example:

Execute on all machines
export PYRO_SERIALIZERS_ACCEPTED=pickle
export PYRO_SERIALIZER=pickle
On NS server
python -m Pyro4.naming --host 0.0.0.0 --port <NS_PORT> -x
On workers
python -m gensim.models.lda_worker --host <NS_HOSTNAME> --port <NS_PORT> --no-broadcast -v
On dispatcher
python -m gensim.models.lda_dispatcher --host <NS_HOSTNAME> --port <NS_PORT> --no-broadcast -v
Create LdaModel
lda = LdaModel(..., ns_conf={"host": NS_HOST, "port": NS_PORT, "broadcast": False})
Train it!
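
As a rough sketch of how the command-line flags above map onto the ns_conf dict: the parser below is hypothetical and is not the code in lda_worker.py or lda_dispatcher.py. Only the flag names and the ns_conf keys are taken from this PR; the function names, the host name and the port number are placeholders chosen for illustration.

```python
import argparse


def build_parser():
    """Hypothetical parser that accepts the same flags as the commands above."""
    parser = argparse.ArgumentParser(description="Locate a Pyro4 name server")
    parser.add_argument("--host", default=None, help="name server hostname")
    parser.add_argument("--port", type=int, default=None, help="name server port")
    # store_false means broadcast defaults to True and the flag turns it off
    parser.add_argument("--no-broadcast", dest="broadcast", action="store_false",
                        help="do not use network broadcast to find the name server")
    parser.add_argument("-v", "--verbose", action="store_true", help="verbose logging")
    return parser


def make_ns_conf(args):
    """Build a dict shaped like the ns_conf argument shown in the LdaModel call."""
    return {"host": args.host, "port": args.port, "broadcast": args.broadcast}


# Placeholder values; substitute your own <NS_HOSTNAME> and <NS_PORT>.
args = build_parser().parse_args(
    ["--host", "ns.example.org", "--port", "9090", "--no-broadcast", "-v"]
)
ns_conf = make_ns_conf(args)
print(ns_conf)
```

The resulting dict has the same keys as the ns_conf in the LdaModel call above, so the workers, the dispatcher and the model all point at the same name server.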

piskvorky · 2016-07-12T09:45:28Z

gensim/models/lda_dispatcher.py

@@ -15,14 +15,21 @@


 from __future__ import with_statement
-import os, sys, logging, threading, time
+import argparse


This is py2.7 only. @tmylk I don't think we can drop support for py2.6 yet... is this import safe?

If it's triggered only on importing lda_dispatcher.py, it's probably fine... but we don't want py2.7+ imports in "core" gensim (at import gensim).

I checked: this is triggered only when importing lda_dispatcher.py or lda_worker.py.
A backport of argparse is listed in setup.py for Python < 2.7 (proof).
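
For illustration, a version-conditional dependency of this kind is often written along the following lines. This is a minimal sketch and not the actual gensim setup.py; the function name and the empty base list are assumptions made for the example.

```python
import sys


def build_install_requires(version_info=sys.version_info):
    """Return a dependency list, adding the argparse backport on Python < 2.7."""
    requires = []  # placeholder: the real dependency list is not shown in this PR
    if tuple(version_info[:2]) < (2, 7):
        # argparse joined the standard library in Python 2.7,
        # so older interpreters need the PyPI backport
        requires.append("argparse")
    return requires


print(build_install_requires())
```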

piskvorky · 2016-07-12T09:47:54Z

Awesome! This is a great update, and nicely done too.

If you don't mind me asking, how do you use this distributed LDA @menshikh-iv? What is your use case/goal?

menshikh-iv · 2016-07-12T10:50:22Z

@piskvorky, I have two use cases:

Content classification
Similarity search

I need to train LDA on a large corpus of web page content and vectorize all the web pages. The LDA training process takes a very long time. I could use several dedicated servers for training, but they are not on the same local network, so I modified distributed LDA for my case.

piskvorky · 2016-07-12T11:20:09Z

Thanks, interesting! Is this a personal project, academic research or a commercial project? (We keep a list of gensim adopters.)

menshikh-iv · 2016-07-12T12:59:30Z

@piskvorky personal research for now

tmylk · 2016-07-13T16:23:31Z

@menshikh-iv Thanks for the PR! Could you add a short notebook-style tutorial for this feature and a note in the changelog?

menshikh-iv · 2016-07-13T18:12:46Z

@tmylk, unfortunately a notebook-style tutorial is not useful for this feature, because I can't demonstrate it in a notebook. Maybe I could update this page in the documentation with small examples (like this message)?

About the changelog: should I add an entry under 0.3.12 in CHANGELOG.md?

And should I create a new PR for these changes?

tmylk · 2016-07-14T15:16:37Z

Hi @menshikh-iv, 0.3.12 is the right version to use. A new small PR would be good.

Updating this page with instructions would be great:
https://github.com/RaRe-Technologies/gensim/blob/develop/docs/src/distributed.rst

manojpandey · 2016-09-19T15:19:55Z

Documentation changed from rst to markdown here: #859

menshikh-iv added 5 commits July 9, 2016 14:31

Add parameters to NS location & rm autorun NS when not located

fc30e8b

Add arguments for locate ns from lda_worker

5588332

Add arguments for locate ns from lda_dispatcher

51804f7

Lookup dispatcher from ldamodel

32aa335

Expose methods for network call

719bee5

piskvorky reviewed Jul 12, 2016

piskvorky assigned tmylk Jul 12, 2016

tmylk merged commit 6a289fe into piskvorky:develop Jul 13, 2016

menshikh-iv deleted the distributed-lda-options branch February 19, 2018 04:40
