
Allow name_only option gensim downloader api #2143

Merged: 35 commits, Aug 3, 2018


@aneesh-joshi (Contributor) commented Jul 31, 2018

Currently, to get the exact names of the models or corpora, a user has to either:

  1. run `gensim.downloader.info()` and look through the huge JSON dump to find the exact names
  2. go to the gensim-data website and check

When using gensim-data, I often forget the exact key.
"Was it 'glove-wiki-gigaword' or 'glove-gigaword-wiki'?"

It would be very helpful if a user could, in the terminal or otherwise, type
`gensim.downloader.info(name_only=True)` or
`python -m gensim.downloader --info_name_only`
and get the following output:

{
    "corpora": [
        "semeval-2016-2017-task3-subtaskBC",
        "semeval-2016-2017-task3-subtaskA-unannotated",
        "patent-2017",
        "quora-duplicate-questions",
        "wiki-english-20171001",
        "text8",
        "fake-news",
        "20-newsgroups",
        "__testing_matrix-synopsis",
        "__testing_multipart-matrix-synopsis"
    ],
    "models": [
        "fasttext-wiki-news-subwords-300",
        "conceptnet-numberbatch-17-06-300",
        "word2vec-ruscorpora-300",
        "word2vec-google-news-300",
        "glove-wiki-gigaword-50",
        "glove-wiki-gigaword-100",
        "glove-wiki-gigaword-200",
        "glove-wiki-gigaword-300",
        "glove-twitter-25",
        "glove-twitter-50",
        "glove-twitter-100",
        "glove-twitter-200",
        "__testing_word2vec-matrix-synopsis"
    ]
}
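A minimal sketch of the filtering that `name_only=True` implies, run against a hypothetical, trimmed catalogue so it works offline. It assumes the full `info()` dump has top-level `"corpora"` and `"models"` sections, each mapping a dataset name to its metadata. `info_sketch` and `CATALOGUE` are illustrative names, not gensim's API, and letting `name` take precedence over `name_only` is a choice made here, not something this PR specifies.

```python
import json

# Hypothetical, trimmed stand-in for the dict returned by gensim.downloader.info():
# top-level "corpora" and "models" sections map each dataset name to its metadata.
# The "..." values are placeholders, not real metadata.
CATALOGUE = {
    "corpora": {
        "text8": {"description": "..."},
        "fake-news": {"description": "..."},
    },
    "models": {
        "glove-wiki-gigaword-50": {"description": "..."},
        "glove-twitter-25": {"description": "..."},
    },
}


def info_sketch(information, name=None, name_only=False):
    """Hypothetical mirror of an info(name=None, name_only=False) call."""
    if name is not None:
        # Look the name up in whichever section contains it.
        for entries in information.values():
            if name in entries:
                return entries[name]
        raise ValueError("unknown dataset name: %r" % name)
    if name_only:
        # Keep the sections, drop the metadata: one list of names per section.
        return {section: list(entries) for section, entries in information.items()}
    # No filter at all: return the whole catalogue.
    return information


print(json.dumps(info_sketch(CATALOGUE, name_only=True), indent=4))
```

With `name_only=True` the sketch prints the same shape as the output above: one list of names per section, with all metadata stripped.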

Note:
On the current develop branch, downloader.py already fails its doctests, before any of my changes.

@@ -29,6 +29,7 @@
Also, this API available via CLI::

python -m gensim.downloader --info <dataname> # same as api.info(dataname)
python -m gensim.downloader --info_name_only # same as api.info(name_only=True)
Review comment (Contributor):

I think it's better to make this a parameter of the `--info` flag (instead of a new `--info_*` flag), like `--info name`.
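One way to read this suggestion with the standard `argparse` module: a single `--info` option whose value is optional (`nargs="?"`), so `--info <dataname>`, `--info name` and a bare `--info` all go through one flag. The `build_parser` helper and the `"full"` sentinel stored for a bare `--info` are assumptions of this sketch, not taken from the PR.

```python
import argparse


def build_parser():
    """Hypothetical parser in which --info takes an optional value."""
    parser = argparse.ArgumentParser(prog="python -m gensim.downloader")
    parser.add_argument(
        "--info",
        metavar="data_name",
        nargs="?",      # the value after --info may be omitted...
        const="full",   # ...in which case this (assumed) sentinel is stored
        default=None,   # --info not given at all -> None
        help="dataset name, 'name' to list names only, or nothing for everything",
    )
    return parser


parser = build_parser()
print(parser.parse_args(["--info"]).info)           # full
print(parser.parse_args(["--info", "name"]).info)   # name
print(parser.parse_args(["--info", "text8"]).info)  # text8
print(parser.parse_args([]).info)                   # None
```

Keeping one flag means the help text documents every `--info` variant in one place, instead of spreading them across `--info_*` options.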

@aneesh-joshi (Contributor, Author):

Changes made, @menshikh-iv.

@@ -29,6 +29,7 @@
Also, this API available via CLI::

python -m gensim.downloader --info <dataname> # same as api.info(dataname)
python -m gensim.downloader --info name_only # same as api.info(name_only=True)
Review comment (Contributor):

`--info name`, please :) (but keep the `name_only` parameter in the function the CLI calls)
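A sketch of how the CLI value could be dispatched so that `--info name` maps onto `info(name_only=True)` while the Python keyword stays `name_only`. `run_info` and `fake_info` are hypothetical; `fake_info` only echoes its arguments, standing in for `gensim.downloader.info()` so nothing is downloaded, and the `"full"` value for a bare `--info` is an assumed sentinel.

```python
import json


def fake_info(name=None, name_only=False):
    """Hypothetical stand-in for gensim.downloader.info(): echoes its arguments."""
    return {"name": name, "name_only": name_only}


def run_info(cli_value, info=fake_info):
    """Map the value given to --info onto a call of the Python function."""
    if cli_value == "name":
        # The CLI spells it `--info name`; the function keeps `name_only`.
        return info(name_only=True)
    if cli_value == "full":
        # Assumed sentinel for a bare `--info`: return everything.
        return info()
    # Anything else is treated as a dataset name.
    return info(name=cli_value)


print(json.dumps(run_info("name"), indent=4))
```

One consequence of this design: a dataset literally called `name` (or `full`) could not be looked up through `--info`, since those values are reserved for the CLI.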

@menshikh-iv (Contributor):

@aneesh-joshi thanks!

menshikh-iv merged commit 4520adf into piskvorky:develop on Aug 3, 2018
aneesh-joshi deleted the name_only_develop branch on August 3, 2018, 06:30