doc2vec/word2vec/fasttext models do not appear to improve if similarities checked mid-training epochs #2260
Comments
It turns out
has to be called before each call |
If that's the case, then that's definitely a bug! Are you saying you have to call |
Well, yes and no. |
This is an excellent idea imo. I implemented it in this way. |
Is this closed? I would like to work on it, @menshikh-iv |
@naba7 see status on top |
Sorry for my recent absence. I pushed new changes to the branch of the PR, but it is still closed. I hope it is reopened in the next days, so we can finish working on it. |
@timbicker done, see #2273 |
Also an issue for FastText: #2260 |
I believe this issue is moot given changes that eliminated so much normed-vector caching in Gensim-4.0. |
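To illustrate the kind of normed-vector caching mentioned above, here is a minimal pure-Python sketch. It does not use Gensim's API: the class CachedSimilarityIndex and its methods are hypothetical stand-ins, written only to show how a lazily built cache of unit-normalised vectors can make similarity queries return stale results while the raw vectors keep changing.

```python
import math


class CachedSimilarityIndex:
    """Hypothetical vector store that caches unit-normalised vectors (not Gensim code)."""

    def __init__(self, vectors):
        self.vectors = vectors  # raw vectors, mutated by "training"
        self._normed = None     # lazily built cache of unit vectors

    @staticmethod
    def _unit(vector):
        norm = math.sqrt(sum(x * x for x in vector))
        return [x / norm for x in vector]

    def clear_cache(self):
        # Forces the next query to re-normalise the current raw vectors.
        self._normed = None

    def most_similar(self, index):
        # The cache is built once and reused, even if self.vectors changed since.
        if self._normed is None:
            self._normed = [self._unit(v) for v in self.vectors]
        query = self._normed[index]
        best, best_sim = None, -2.0
        for i, candidate in enumerate(self._normed):
            if i == index:
                continue
            sim = sum(a * b for a, b in zip(query, candidate))
            if sim > best_sim:
                best, best_sim = i, sim
        return best


store = CachedSimilarityIndex([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]])
before = store.most_similar(0)

# Simulated training step: vector 1 moves away from vector 0, vector 2 moves close.
store.vectors[1] = [0.0, 1.0]
store.vectors[2] = [1.0, 0.1]

stale = store.most_similar(0)  # answered from the old cache
store.clear_cache()
fresh = store.most_similar(0)  # answered from re-normalised vectors

print(before, stale, fresh)
```

In this sketch the second query still answers from the vectors as they were before the update, and only the query after clear_cache() reflects the new values. Whether this matches the exact mechanism in a given Gensim release is an assumption to check against that release's source.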
Description
I am training a doc2vec model on a large corpus. I need to monitor the model during training so that I can report more detailed statistics to my supervisor.
The problem can be reproduced with the example below, where I only slightly modified the Doc2Vec tutorial on the Lee dataset. The model does not improve its recommendations for the most_similar method.
Steps/Code/Corpus to Reproduce
Expected Results
I expect to see the recommendations or the distances improve as training progresses.
Actual Results
Console output with four workers:
It surprises me that only the first sample in the training_corpus receives some updates. I don't understand it.
So I debugged the model, and there were no further improvements:
I also tried it with only one worker:
What is happening here, and how can I see during training how my doc2vec model improves? It is also not possible to see the training_error for doc2vec (#999).
Further experimenting reveals that docvecs.vectors_docs is of course updated between calls of batch_end, but most_similar always returns the same suggestion.
Versions
Darwin-17.5.0-x86_64-i386-64bit
Python 3.7.0 (default, Jun 29 2018, 20:13:13)
[Clang 9.1.0 (clang-902.0.39.2)]
NumPy 1.15.0
SciPy 1.1.0
gensim 3.5.0
FAST_VERSION 0