I am using the Word2Vec module of the Gensim library to train a word embedding. The dataset is 400k sentences with 100k unique words (it's not English).
I'm using this code to monitor and calculate the loss:
import gensim
from gensim.models.callbacks import CallbackAny2Vec

# W2V_SIZE, W2V_WINDOW, W2V_MIN_COUNT, W2V_EPOCH and tokenized_corpus are defined earlier (not shown)

class MonitorCallback(CallbackAny2Vec):
    def __init__(self, test_words):
        self._test_words = test_words

    def on_epoch_end(self, model):
        print("Model loss:", model.get_latest_training_loss())  # print loss
        for word in self._test_words:  # show how the nearest neighbours of each test word change
            print(model.wv.most_similar(word))

monitor = MonitorCallback(["MyWord"])  # monitor with demo words
w2v_model = gensim.models.word2vec.Word2Vec(size=W2V_SIZE, window=W2V_WINDOW, min_count=W2V_MIN_COUNT, callbacks=[monitor])
w2v_model.build_vocab(tokenized_corpus)
words = w2v_model.wv.vocab.keys()
vocab_size = len(words)
print("Vocab size", vocab_size)
print("[*] Training...")
w2v_model.train(tokenized_corpus, total_examples=len(tokenized_corpus), epochs=W2V_EPOCH)
The problem is that from epoch 1 onward the loss is 0, and the vectors of the monitored words don't change at all:
[*] Training...
Model loss: 0.0
Model loss: 0.0
Model loss: 0.0
Model loss: 0.0
So what is the problem here? Is this normal? The tokenized corpus is a list of lists, something like tokenized_corpus[0] = ["word1", "word2", ...].
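The format described above (a list of token lists) is what Word2Vec expects; if an item were a plain string, it would be iterated character by character instead of word by word. A quick sanity check for that, as a minimal pure-Python sketch (no gensim needed) with a hypothetical helper `find_bad_sentences`:

```python
def find_bad_sentences(corpus):
    """Return the indices of items that are not a list (or tuple) of strings.

    Word2Vec expects every sentence to be a sequence of tokens; a plain
    string would be treated as a sequence of single characters.
    """
    bad = []
    for i, sentence in enumerate(corpus):
        if not isinstance(sentence, (list, tuple)):
            bad.append(i)  # e.g. an untokenized string
        elif not all(isinstance(token, str) for token in sentence):
            bad.append(i)  # a token that is not a string
    return bad


corpus = [["word1", "word2"], "word1 word2", ["word3", 4]]
print(find_bad_sentences(corpus))  # [1, 2]
```

An empty result for the real `tokenized_corpus` would confirm the format is not the cause.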
I googled, and it seems that some older versions of gensim had problems calculating the loss, but those reports are almost a year old, so shouldn't this be fixed by now?
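Separately from the zero value, in gensim 3.x (and, as far as I know, 4.x) the number returned by `get_latest_training_loss()` is a running total that is reset only when `train()` starts, so a per-epoch loss has to be recovered by differencing consecutive readings. A minimal sketch with a hypothetical helper `per_epoch_losses`; the readings in the example are made-up numbers, not output from the question:

```python
def per_epoch_losses(cumulative):
    """Turn running-total loss readings (one per epoch) into per-epoch losses."""
    losses = []
    previous = 0.0
    for total in cumulative:
        losses.append(total - previous)  # loss added during this epoch only
        previous = total
    return losses


# Made-up cumulative readings, as a callback might collect them in on_epoch_end.
print(per_epoch_losses([10.0, 18.0, 24.0]))  # [10.0, 8.0, 6.0]
```

Keeping the raw readings and differencing afterwards leaves the callback itself trivial.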
I also tried the code from the answer to this question, but the loss is still 0:
https://stackoverflow.com/questions/52038651/loss-does-not-decrease-during-training-word2vec-gensim
Unless/until you're sure your concern is a bug, questions are better handled via Stack Overflow (where I also answered your question) or the project discussion list, to reserve this issue-tracker for bugs & feature requests.