tonbadal/text_processing

Text Processing

Some text processing tools:

  • Word2Vec (word2vec.py): Trains word embeddings with a shallow neural network, using either the skip-gram or the CBOW (continuous bag-of-words) model. More information at https://radimrehurek.com/gensim/models/word2vec.html
  • Latent Dirichlet Allocation (LDA) (lda.py): Statistical model that describes each word or text as a mixture of N topics, by assigning to each word/text the probability that it was generated from each of the N topics. More information at https://radimrehurek.com/gensim/models/ldamodel.html
  • Sentiment Analysis (sentiment/sentiment.py): Calculate the sentiment of a text via SentiStrength (http://sentistrength.wlv.ac.uk/)
  • Preprocessing (text_processing.py): Preprocesses the text in the following stages:
    1. Transform all characters to lowercase
    2. Tokenize the sentence into words
    3. Remove stop words
    4. Stem words
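
The four preprocessing stages can be sketched as follows. This is a minimal illustration and not the code in text_processing.py: the stop-word list, the regular-expression tokenizer, and the `naive_stem` helper are all assumptions made for the example. In particular, `naive_stem` is a crude suffix stripper standing in for a real stemmer such as Porter's.

```python
import re

# Tiny illustrative stop-word list (assumption: the repository's real list is not shown).
STOP_WORDS = {"a", "an", "and", "the", "is", "are", "of", "to", "in", "on"}


def naive_stem(word):
    """Strip one common English suffix; a stand-in for a real stemmer."""
    for suffix in ("ing", "ed", "es", "s"):
        # Only strip when at least three characters would remain.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def preprocess(text):
    """Lowercase, tokenize, drop stop words, then stem (hypothetical helper)."""
    lowered = text.lower()                              # 1. lowercase
    tokens = re.findall(r"[a-z0-9]+", lowered)          # 2. tokenize into words
    kept = [t for t in tokens if t not in STOP_WORDS]   # 3. remove stop words
    return [naive_stem(t) for t in kept]                # 4. stem


tokens = preprocess("The cats are jumping on the beds")
print(tokens)
```

The order of the stages matters: stop words are removed before stemming so that the stop-word list can hold plain, unstemmed words.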

Dictionary

These text processing tools require a gensim dictionary (https://radimrehurek.com/gensim/corpora/dictionary.html) that has been created and saved beforehand. The dictionary must be built from preprocessed words, that is, words that have already gone through the preprocessing stages listed above.
