Skip to content
New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

Phrases documentation for threshold argument is misleading #2111

Closed
umangv opened this issue Jun 28, 2018 · 2 comments · Fixed by #2242
Closed

Phrases documentation for threshold argument is misleading #2111

umangv opened this issue Jun 28, 2018 · 2 comments · Fixed by #2242
Labels
bug Issue described a bug difficulty easy Easy issue: required small fix documentation Current issue related to documentation good first issue Issue for new contributors (not required gensim understanding + very simple) Hacktoberfest Issues marked for hacktoberfest

Comments

@umangv
Copy link
Contributor

umangv commented Jun 28, 2018

https://github.com/RaRe-Technologies/gensim/blob/37e49971efa74310b300468a5b3cf531319c6536/gensim/models/phrases.py#L252-L255

It feels to me like this should have said the opposite: "Heavily depends on concrete scoring-function" rather than "Hardly depends on concrete socring-function".

For example, if you choose npmi instead of the default, the threshold has to between -1 and 1, which makes the default (10.0) make no sense. The current documentation suggests that 10.0 will be OK.

@gojomo
Copy link
Collaborator

gojomo commented Jul 4, 2018

I suspect the intended meaning was, as you surmise, "heavily depends"... and the docs around the 'npmi' option should make that acceptable range clear.

@piskvorky piskvorky added bug Issue described a bug documentation Current issue related to documentation labels Jul 4, 2018
@menshikh-iv menshikh-iv added difficulty easy Easy issue: required small fix good first issue Issue for new contributors (not required gensim understanding + very simple) Hacktoberfest Issues marked for hacktoberfest labels Sep 28, 2018
@jenishah
Copy link
Contributor

Hardly has already been replaced by Heavily. Would it be a good idea to add the range of the score under returns in the documentation of the scoring function?

# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
bug Issue described a bug difficulty easy Easy issue: required small fix documentation Current issue related to documentation good first issue Issue for new contributors (not required gensim understanding + very simple) Hacktoberfest Issues marked for hacktoberfest
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants