n_similarity() in word2vec and doc2vec raises ValueError if an empty list is passed #761

Closed
wants to merge 19 commits into from
Changes from 6 commits
1 change: 1 addition & 0 deletions CHANGELOG.md
@@ -6,6 +6,7 @@ Changes
- In hdpmodel and dtmmodel
- NOT BACKWARDS COMPATIBLE!
* Added random_state parameter to LdaState initializer and check_random_state() (@droudy, #113)
* `n_similarity()` raises `ValueError` if an empty list is passed to it in word2vec, doc2vec (@droudy, #761)
Owner

@piskvorky piskvorky Jun 30, 2016

@tmylk now that CHANGELOG is markdown (rather than plaintext), let's add links as true links: people's profiles, links to pull requests, links to supporting research articles...


0.13.1, 2016-06-22

2 changes: 2 additions & 0 deletions gensim/models/doc2vec.py
@@ -492,6 +492,8 @@ def n_similarity(self, ds1, ds2):
        index or string tag. (TODO: Accept vectors of out-of-training-set docs, as if from inference.)

        """
        if not ds1 or not ds2:
            raise ValueError("Can't compute similarity with an empty list")
        v1 = [self[doc] for doc in ds1]
        v2 = [self[doc] for doc in ds2]
        return dot(matutils.unitvec(array(v1).mean(axis=0)), matutils.unitvec(array(v2).mean(axis=0)))
2 changes: 2 additions & 0 deletions gensim/models/word2vec.py
@@ -1511,6 +1511,8 @@ def n_similarity(self, ws1, ws2):
        True

        """
        if not ws1 or not ws2:
            raise ValueError("Can't compute similarity with an empty list")
        v1 = [self[word] for word in ws1]
        v2 = [self[word] for word in ws2]
        return dot(matutils.unitvec(array(v1).mean(axis=0)), matutils.unitvec(array(v2).mean(axis=0)))
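The guard added in both hunks can be sketched outside gensim as follows. This is a minimal pure-Python stand-in, not the library's implementation: the `embeddings` dict, the `unit_mean` helper, and the three-argument `n_similarity` signature are all assumptions made for illustration. It shows the same order of operations as the patched methods (check for empty input, average each list of vectors, normalize, take the dot product).

```python
import math


def unit_mean(vectors):
    """Average equal-length vectors, then scale the mean to unit length."""
    dim = len(vectors[0])
    mean = [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]
    norm = math.sqrt(sum(x * x for x in mean))
    # Assumption: a zero vector is returned unchanged rather than divided by 0.
    return [x / norm for x in mean] if norm > 0 else mean


def n_similarity(embeddings, ws1, ws2):
    """Cosine similarity between the mean vectors of two word lists.

    Hypothetical standalone version; `embeddings` maps word -> list of floats.
    """
    # Same guard as in the patch: fail early with a clear message.
    if not ws1 or not ws2:
        raise ValueError("Can't compute similarity with an empty list")
    v1 = unit_mean([embeddings[w] for w in ws1])
    v2 = unit_mean([embeddings[w] for w in ws2])
    return sum(a * b for a, b in zip(v1, v2))


# Toy two-dimensional embeddings, invented for this example.
embeddings = {"graph": [1.0, 0.0], "trees": [0.0, 1.0]}
```

In this stand-in, dropping the guard would let an empty list reach `vectors[0]` and fail with an unrelated `IndexError`; raising `ValueError` up front keeps the error tied to the caller's mistake.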
7 changes: 7 additions & 0 deletions gensim/test/test_doc2vec.py
@@ -124,6 +124,13 @@ def test_empty_errors(self):
        # input not empty, but rather completely filtered out
        self.assertRaises(RuntimeError, doc2vec.Doc2Vec, list_corpus, min_count=10000)

    def test_n_similarity(self):
        corpus = DocsLeeCorpus()
        model = doc2vec.Doc2Vec(size=100, min_count=2, iter=20)
        model.build_vocab(corpus)
        model.train(corpus)
        self.assertRaises(ValueError, model.n_similarity, ['graph', 'trees'], [])

    def test_similarity_unseen_docs(self):
        """Test similarity of out of training sentences"""
        rome_str = ['rome', 'italy']
1 change: 1 addition & 0 deletions gensim/test/test_word2vec.py
@@ -349,6 +349,7 @@ def testSimilarities(self):

        self.assertTrue(model.n_similarity(['graph', 'trees'], ['trees', 'graph']))
        self.assertTrue(model.n_similarity(['graph'], ['trees']) == model.similarity('graph', 'trees'))
        self.assertRaises(ValueError, model.n_similarity, ['graph', 'trees'], [])

    def testSimilarBy(self):
        """Test word2vec similar_by_word and similar_by_vector."""