Text Processing

This repository contains the following text processing tools:

Word2Vec (word2vec.py): Create word embeddings with neural networks using the skip-gram and CBOW (continuous bag-of-words) models. More information at https://radimrehurek.com/gensim/models/word2vec.html
Latent Dirichlet Allocation (LDA) (lda.py): Statistical topic model that represents each text as a mixture of N topics, where each topic is a probability distribution over words. It assigns to each text (and to each word) the probability that it was generated by each of the N topics. More information at https://radimrehurek.com/gensim/models/ldamodel.html
Sentiment Analysis (sentiment/sentiment.py): Calculate the sentiment of a text via SentiStrength (http://sentistrength.wlv.ac.uk/)
Preprocessing (text_processing.py): Preprocess the text in the following stages:
1. Convert all characters to lowercase
2. Tokenize the sentence into words
3. Remove stop words
4. Stem the words
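The four preprocessing stages can be sketched as follows. This is a minimal illustration, not the code in text_processing.py: it assumes a regular expression for tokenization and NLTK's `PorterStemmer` for stemming, and the stop-word list is a hypothetical placeholder (a real pipeline would use a full list).

```python
import re

from nltk.stem import PorterStemmer  # assumption: NLTK's Porter stemmer

# Hypothetical, deliberately tiny stop-word list for illustration only.
STOP_WORDS = {"a", "an", "and", "are", "in", "is", "of", "on", "the", "to"}

_stemmer = PorterStemmer()


def preprocess(text):
    """Apply the four stages: lowercase, tokenize, remove stop words, stem."""
    # 1. Convert all characters to lowercase.
    text = text.lower()
    # 2. Tokenize into words (runs of letters, digits and underscores).
    tokens = re.findall(r"\w+", text)
    # 3. Remove stop words.
    tokens = [t for t in tokens if t not in STOP_WORDS]
    # 4. Stem each remaining word.
    return [_stemmer.stem(t) for t in tokens]


print(preprocess("The cats are running in the garden."))  # ['cat', 'run', 'garden']
```

The stages run in this order so that stop-word matching sees lowercase tokens and stemming is not wasted on words that are about to be discarded.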
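SentiStrength reports two scores for a text: a positive strength from 1 (not positive) to 5 (extremely positive) and a negative strength from -1 (not negative) to -5 (extremely negative). The toy scorer below is not SentiStrength and does not call it; it is a hypothetical, lexicon-only sketch that shows the shape of that dual-scale output, taking the strongest positive and strongest negative word score without any of SentiStrength's additional rules. The lexicon is invented for illustration.

```python
import re

# Hypothetical toy lexicon: word -> strength (positive 2..5, negative -2..-5).
LEXICON = {
    "good": 2, "great": 3, "love": 3, "excellent": 4,
    "bad": -2, "sad": -3, "awful": -4, "hate": -4,
}


def dual_sentiment(text):
    """Return (positive, negative) on SentiStrength-style scales.

    positive is in 1..5 (1 = not positive); negative is in -5..-1
    (-1 = not negative). Words missing from the lexicon score 0.
    """
    tokens = re.findall(r"\w+", text.lower())
    scores = [LEXICON.get(t, 0) for t in tokens]
    # Strongest positive word, or 1 if there is none.
    positive = max([s for s in scores if s > 0], default=1)
    # Strongest negative word, or -1 if there is none.
    negative = min([s for s in scores if s < 0], default=-1)
    return positive, negative


print(dual_sentiment("I love the food but the service was awful"))  # (3, -4)
print(dual_sentiment("The train left at noon"))  # (1, -1)
```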

To use these tools, a gensim dictionary must first be created from preprocessed words and saved (https://radimrehurek.com/gensim/corpora/dictionary.html).
