FastText wrapper returns inconsistent dtypes #1637

Closed
mcobzarenco opened this issue Oct 19, 2017 · 1 comment
Labels
bug Issue described a bug difficulty easy Easy issue: required small fix

Comments

@mcobzarenco
Contributor

mcobzarenco commented Oct 19, 2017

Description

`gensim.models.wrappers.FastText` returns word vectors with inconsistent dtypes: `float32` for in-vocabulary words, but `float64` for out-of-vocabulary words.

Steps/Code/Corpus to Reproduce

```python
from numpy import dtype
from gensim.models.wrappers import FastText

embeds = FastText.load_fasttext_format(...)
```

For an existing word:

```python
embeds['the'].dtype == dtype('float32')
```

For an "imputed" word (one missing from the vocabulary, whose embedding is computed by summing the embeddings of its character n-grams):

```python
embeds['ttttt'].dtype == dtype('float64')
```

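The problem is in `models/wrappers/fasttext.py::FastTextKeyedVectors.word_vec`. For a missing word, the accumulator is initialised as a 64-bit float zero vector, and the 32-bit n-gram embeddings are then added to it in place. In-place addition keeps the accumulator's dtype, so the returned vector is 64-bit even though every embedding that went into it is 32-bit.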
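A minimal NumPy sketch of the mechanism described above, assuming the n-gram embeddings are stored as a float32 matrix. The matrix, the n-gram indices, and the two function names are hypothetical stand-ins, not gensim's actual code; the second function shows one possible fix, allocating the accumulator with the same dtype as the embeddings.

```python
import numpy as np

# Hypothetical stand-in for the model's float32 n-gram embedding matrix.
ngram_weights = np.arange(40, dtype=np.float32).reshape(10, 4)
# Hypothetical row indices of the n-grams found for an OOV word.
ngram_indices = [1, 3, 7]


def oov_vector_buggy(weights, indices):
    # np.zeros defaults to float64; in-place += of float32 rows keeps float64.
    vec = np.zeros(weights.shape[1])
    for i in indices:
        vec += weights[i]
    return vec


def oov_vector_fixed(weights, indices):
    # Allocate the accumulator with the embeddings' dtype (float32 here).
    vec = np.zeros(weights.shape[1], dtype=weights.dtype)
    for i in indices:
        vec += weights[i]
    return vec


print(oov_vector_buggy(ngram_weights, ngram_indices).dtype)  # float64
print(oov_vector_fixed(ngram_weights, ngram_indices).dtype)  # float32
```

Summing with `weights[indices].sum(axis=0)` would also keep the input dtype, since NumPy sums a float32 array in float32 by default.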

Versions

Linux-4.4.0-97-generic-x86_64-with-Ubuntu-16.04-xenial
Python 3.5.2 (default, Nov 17 2016, 17:05:23)
[GCC 5.4.0 20160609]
NumPy 1.13.3
SciPy 0.19.1
gensim 3.0.1
FAST_VERSION 1

@piskvorky
Owner

Nice catch @mcobzarenco ! Thanks.

@menshikh-iv menshikh-iv added bug Issue described a bug difficulty easy Easy issue: required small fix labels Oct 19, 2017
horpto pushed a commit to horpto/gensim that referenced this issue Oct 28, 2017