Skip to content
New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

Doc2Vec docstring updated to reflect actual default value of sample parameter. Fix #1302. #1307

Merged
merged 1 commit into from
May 7, 2017
Merged

Conversation

datapythonista
Copy link
Contributor

No description provided.

@tmylk tmylk changed the title Doc2Vec docstring updated to reflect actual default value of sample parameter (1302) Doc2Vec docstring updated to reflect actual default value of sample parameter. Fix #1302. May 7, 2017
@tmylk tmylk merged commit 7b012a7 into piskvorky:develop May 7, 2017
@@ -580,7 +580,7 @@ def __init__(self, documents=None, dm_mean=None,
need about 1GB of RAM. Set to `None` for no limit (default).

`sample` = threshold for configuring which higher-frequency words are randomly downsampled;
default is 0 (off), useful value is 1e-5.
default is 1e-3, useful value is 1e-5.
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the fix.

This doc reads very strange now: default value is X, useful value is Y. Why not use the useful value? 🤔
@gojomo

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the 'useful value' remark dates back to something in word2vec.c code, before it had a (modest) non-zero default. I suppose we could stop at just mentioning the default, or hint that more aggressive values are also worth trying with a comment like: "default is 1e-3; values to 1e-5 (or lower) may also be useful". (On large datasets, I've seen ok results with 1e-06 and 1e-07, which each offer additional training speedups.)

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed. @menshikh-iv @datapythonista can you do this tiny fix?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@piskvorky Yes, of course.

@menshikh-iv menshikh-iv mentioned this pull request Jun 3, 2017
# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants