WikiCorpus scans corpus to determine vocabulary when an empty dictionary is provided #2052

Closed
ouromoros opened this issue May 16, 2018 · 3 comments

Comments

@ouromoros

I have the following code:

wiki = WikiCorpus(inp, dictionary={}, lemmatize=False)

When I use it, it scans through the whole corpus to determine the vocabulary, even though a dictionary is provided.

This clearly contradicts the documentation:

dictionary (Dictionary, optional) – Dictionary, if not provided, this scans the corpus once, to determine its vocabulary (this needs really long time).

@piskvorky
Owner

piskvorky commented May 16, 2018

@ouromoros please report the versions, as per our issue template.

@steremma this seems to be a bug introduced here: https://github.com/RaRe-Technologies/gensim/pull/1821/files#diff-eece52d95c280dabe57c803c95d6bb96L335 . That commit changed logic that worked and that is still documented.
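For readers following along: the thread does not quote the offending line, so the snippet below is only a sketch of one plausible way this kind of regression arises in Python. An empty `{}` is falsy, so a truthiness check treats it the same as a missing argument, while an `is None` check does not. `SketchCorpus`, its `check` parameter, and `build_vocab` are hypothetical stand-ins, not gensim's `WikiCorpus` code.

```python
class SketchCorpus:
    """Hypothetical stand-in for a corpus class; NOT gensim's WikiCorpus."""

    def __init__(self, texts, dictionary=None, check="truthiness"):
        self.texts = texts
        self.scans = 0  # how many times the full corpus was scanned
        if check == "truthiness":
            # Assumed buggy pattern: {} is falsy, so an empty dictionary triggers a scan.
            needs_scan = not dictionary
        else:
            # Pattern matching the documented behaviour: only a missing argument triggers a scan.
            needs_scan = dictionary is None
        self.dictionary = self.build_vocab() if needs_scan else dictionary

    def build_vocab(self):
        """Simulate the expensive full pass over the corpus."""
        self.scans += 1
        return {token for doc in self.texts for token in doc}


texts = [["human", "interface"], ["graph", "trees"]]
buggy = SketchCorpus(texts, dictionary={}, check="truthiness")
fixed = SketchCorpus(texts, dictionary={}, check="identity")
print(buggy.scans, fixed.scans)
```

With the truthiness check the empty dictionary is silently replaced by a scanned vocabulary; with the identity check it is kept as passed in.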

@steremma
Contributor

@piskvorky I have already submitted a fix at #2042. I would also prefer to complement it with a test that catches the mistake, but I am very low on time at the moment.
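A regression test for this behaviour could look roughly like the sketch below. It is not the test from #2042: `CountingCorpus` and `_scan_vocabulary` are hypothetical names, and the class only imitates the documented contract (scan once when no dictionary is given, never when one is given, even an empty one) so that the example runs without gensim or a Wikipedia dump.

```python
import unittest


class CountingCorpus:
    """Hypothetical stand-in that counts vocabulary scans; NOT gensim's WikiCorpus."""

    def __init__(self, texts, dictionary=None):
        self.texts = texts
        self.scan_count = 0
        if dictionary is None:
            self.dictionary = self._scan_vocabulary()
        else:
            self.dictionary = dictionary

    def _scan_vocabulary(self):
        # Simulates the expensive full pass over the corpus.
        self.scan_count += 1
        tokens = sorted({token for doc in self.texts for token in doc})
        return {token: idx for idx, token in enumerate(tokens)}


class TestEmptyDictionary(unittest.TestCase):
    TEXTS = [["human", "interface"], ["graph", "trees"]]

    def test_empty_dictionary_skips_scan(self):
        corpus = CountingCorpus(self.TEXTS, dictionary={})
        self.assertEqual(corpus.scan_count, 0)
        self.assertEqual(corpus.dictionary, {})

    def test_missing_dictionary_scans_once(self):
        corpus = CountingCorpus(self.TEXTS)
        self.assertEqual(corpus.scan_count, 1)
        self.assertEqual(len(corpus.dictionary), 4)
```

Against the real class, the same idea would presumably need a small fixture corpus and some way to observe whether the scan ran, for example by patching the text iterator.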

@ouromoros
Author

@piskvorky Sorry for not including the version, but you're right, that commit is the cause. @steremma's code should fix it, so it seems this issue can be closed soon.


3 participants