Some text processing tools:
- Word2Vec (word2vec.py): Create word embeddings with a shallow neural network, using the skip-gram or CBOW (continuous bag-of-words) model. More information at https://radimrehurek.com/gensim/models/word2vec.html
- Latent Dirichlet Allocation (LDA) (lda.py): Statistical model that describes each text as a mixture of N topics, assigning to each word or text the probability that it was generated by each of the N topics. More information at https://radimrehurek.com/gensim/models/ldamodel.html
- Sentiment Analysis (sentiment/sentiment.py): Calculate the sentiment of a text via SentiStrength (http://sentistrength.wlv.ac.uk/)
- Preprocessing (text_processing.py): Preprocess the text in the following stages:
  - Transform all characters to lowercase
  - Tokenize the sentence into words
  - Remove stop words
  - Stem words
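
The four preprocessing stages can be sketched roughly as follows. This is only an illustration and not the code in text_processing.py: the stop-word list, the `toy_stem` suffix stripper, and the `preprocess` function name are all assumptions made for the example. A real pipeline would normally use a full stop-word list and a proper stemmer such as Porter.

```python
import re

# Tiny illustrative stop-word list (assumption); real lists are much longer.
STOP_WORDS = {"a", "an", "and", "are", "is", "of", "the", "to", "in", "on"}

# Suffixes removed by the toy stemmer below, checked in this order.
SUFFIXES = ("ing", "ed", "es", "s")


def toy_stem(word):
    """Crude suffix stripper standing in for a real stemmer (assumption)."""
    for suffix in SUFFIXES:
        # Only strip when at least three characters remain.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def preprocess(text):
    """Lowercase, tokenize, remove stop words, then stem."""
    lowered = text.lower()                              # stage 1: lowercase
    tokens = re.findall(r"[a-z0-9]+", lowered)          # stage 2: tokenize
    kept = [t for t in tokens if t not in STOP_WORDS]   # stage 3: stop words
    return [toy_stem(t) for t in kept]                  # stage 4: stem


if __name__ == "__main__":
    print(preprocess("The cats are running on the roads"))
```

The output is a list of stemmed tokens per text, which is the shape of input a gensim dictionary expects.
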
To use these text processing tools, a gensim dictionary must first be created and saved (https://radimrehurek.com/gensim/corpora/dictionary.html). The dictionary must be built from preprocessed words.