Merge pull request #3284 from RaRe-Technologies/fixdocs
[MRG] Documentation fixes + added CITATION.cff
piskvorky authored Jan 25, 2022
2 parents 8b8203d + 903ae65 commit 7e898f4
Showing 2 changed files with 45 additions and 14 deletions.
31 changes: 31 additions & 0 deletions CITATION.cff
@@ -0,0 +1,31 @@
+cff-version: 1.2.0
+message: "If you use this software, please cite it as below."
+authors:
+  - family-names: "Řehůřek"
+    given-names: "Radim"
+title: "Gensim: Topic modelling for humans"
+version: 4.1.0
+url: "https://github.com/RaRe-Technologies/gensim"
+preferred-citation:
+  type: conference-paper
+  authors:
+    - family-names: "Řehůřek"
+      given-names: "Radim"
+    - family-names: "Sojka"
+      given-names: "Petr"
+  publisher:
+    name: "University of Malta"
+  date-published: "2010-05-22"
+  year: 2010
+  month: 5
+  start: 45 # First page number
+  end: 50 # Last page number
+  pages: 5
+  title: "Software Framework for Topic Modelling with Large Corpora"
+  languages: ["eng"]
+  url: "http://is.muni.cz/publication/884893/en"
+  conference:
+    name: "Proceedings of LREC 2010 workshop New Challenges for NLP Frameworks"
+    city: Valetta
+    country: MT
+    location: "University of Malta, Valletta, Malta"
28 changes: 14 additions & 14 deletions gensim/interfaces.py
@@ -29,63 +29,63 @@ class CorpusABC(utils.SaveLoad):
     .. sourcecode:: pycon
-        >>> from gensim.corpora import MmCorpus # this is inheritor of CorpusABC class
+        >>> from gensim.corpora import MmCorpus # inherits from the CorpusABC class
         >>> from gensim.test.utils import datapath
         >>>
         >>> corpus = MmCorpus(datapath("testcorpus.mm"))
         >>> for doc in corpus:
         ...     pass # do something with the doc...
-    A document represented in bag-of-word (BoW) format, i.e. list of (attr_id, attr_value),
+    A document represented in the bag-of-word (BoW) format, i.e. list of (attr_id, attr_value),
     like ``[(1, 0.2), (4, 0.6), ...]``.
     .. sourcecode:: pycon
-        >>> from gensim.corpora import MmCorpus # this is inheritor of CorpusABC class
+        >>> from gensim.corpora import MmCorpus # inherits from the CorpusABC class
         >>> from gensim.test.utils import datapath
         >>>
         >>> corpus = MmCorpus(datapath("testcorpus.mm"))
         >>> doc = next(iter(corpus))
         >>> print(doc)
         [(0, 1.0), (1, 1.0), (2, 1.0)]
-    Remember, that save/load methods save only corpus class (not corpus as data itself),
-    for save/load functionality, please use this pattern :
+    Remember that the save/load methods only pickle the corpus object, not
+    the (streamed) corpus data itself!
+    To save the corpus data, please use this pattern :
     .. sourcecode:: pycon
-        >>> from gensim.corpora import MmCorpus # this is inheritor of CorpusABC class
+        >>> from gensim.corpora import MmCorpus # MmCorpus inherits from CorpusABC
         >>> from gensim.test.utils import datapath, get_tmpfile
         >>>
         >>> corpus = MmCorpus(datapath("testcorpus.mm"))
         >>> tmp_path = get_tmpfile("temp_corpus.mm")
         >>>
-        >>> MmCorpus.serialize(tmp_path, corpus) # serialize corpus to disk in MmCorpus format
-        >>> # MmCorpus.save_corpus(tmp_path, corpus) # this variant also possible, but if serialize availbe - call it.
+        >>> MmCorpus.serialize(tmp_path, corpus) # serialize corpus to disk in the MmCorpus format
         >>> loaded_corpus = MmCorpus(tmp_path) # load corpus through constructor
         >>> for (doc_1, doc_2) in zip(corpus, loaded_corpus):
-        ...     assert doc_1 == doc_2 # check that corpuses exactly same
+        ...     assert doc_1 == doc_2 # no change between the original and loaded corpus
     See Also
     --------
     :mod:`gensim.corpora`
-        Corpuses in different formats
+        Corpora in different formats.
     """
     def __iter__(self):
         """Iterate all over corpus."""
         raise NotImplementedError('cannot instantiate abstract base class')

     def save(self, *args, **kwargs):
-        """Saves corpus in-memory state.
+        """Saves the in-memory state of the corpus (pickles the object).
         Warnings
         --------
-        This save only the "state" of a corpus class, not the corpus data!
+        This saves only the "internal state" of the corpus object, not the corpus data!
-        For saving data use the `serialize` method of the output format you'd like to use
-        (e.g. :meth:`gensim.corpora.mmcorpus.MmCorpus.serialize`).
+        To save the corpus data, use the `serialize` method of your desired output format
+        instead, e.g. :meth:`gensim.corpora.mmcorpus.MmCorpus.serialize`.
         """
         import warnings
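
The distinction the revised docstrings draw, between pickling a corpus object's state and serializing the corpus data, can be illustrated with a small standard-library sketch. `StreamedCorpus`, its one-document-per-line text format, and the file names below are hypothetical stand-ins invented for this illustration; they are not gensim classes or formats, and the sketch pickles the object's attributes rather than calling gensim's `save`.

```python
import os
import pickle
import tempfile


class StreamedCorpus:
    """Hypothetical stand-in for a streamed corpus; NOT a gensim class."""

    def __init__(self, path):
        self.path = path  # the only in-memory state: where the data lives

    def __iter__(self):
        # Stream one bag-of-words document per line, e.g. "0:1.0 1:1.0".
        with open(self.path) as handle:
            for line in handle:
                pairs = (item.split(":") for item in line.split())
                yield [(int(i), float(v)) for i, v in pairs]

    @staticmethod
    def serialize(path, corpus):
        # Write the document data itself so it can be re-read later.
        with open(path, "w") as handle:
            for doc in corpus:
                handle.write(" ".join(f"{i}:{v}" for i, v in doc) + "\n")


tmp_dir = tempfile.mkdtemp()
source_path = os.path.join(tmp_dir, "source.txt")
with open(source_path, "w") as handle:
    handle.write("0:1.0 1:1.0 2:1.0\n3:2.0 4:0.5\n")

corpus = StreamedCorpus(source_path)

# Pickling the object's attributes captures only a path string, no documents.
pickled_state = pickle.loads(pickle.dumps(vars(corpus)))

# Serializing copies the documents themselves into a new file.
copy_path = os.path.join(tmp_dir, "copy.txt")
StreamedCorpus.serialize(copy_path, corpus)

# Even after the original data file is gone, the serialized copy is complete.
os.remove(source_path)
copied_docs = list(StreamedCorpus(copy_path))
```

After the source file is deleted, the pickled state still points at the missing path, while the serialized copy can be iterated in full. That is the failure mode the new warning text describes.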