A Gensim Latent Dirichlet Allocation (LDA) implementation that models topics and extracts keywords from two datasets: ABC News articles and AAN conference papers.

kevinxyc1/Topic-Modeling-using-LDA

Topic-Modeling-using-LDA

Two datasets are modeled with LDA:

  1. ABC News articles: Extracts the topic and keywords from each of over 10,000 news articles (CSV file). Two input representations are used with the model: bag of words and TF-IDF.
  • Files: temp.py and abc.csv
  2. AAN dataset: Extracts four keywords from each of over 20,000 papers from the AAN (American Academy of Neurology) conference (JSON object). Uses Gensim LDA and outputs each paper's title together with its extracted keywords.
  • Files: jsontest.py and aan.json
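
To illustrate the difference between the bag-of-words and TF-IDF inputs mentioned for the ABC News dataset, here is a minimal plain-Python sketch. It is not taken from temp.py: the toy headlines, the helper names (`tokenize`, `bag_of_words`, `tfidf`), and the weighting formula (term frequency times log2(N/df), then L2 normalization) are all assumptions made for illustration. In Gensim, the corresponding steps would normally go through `gensim.corpora.Dictionary.doc2bow` and `gensim.models.TfidfModel`, and either output can be passed to `gensim.models.LdaModel` as the training corpus.

```python
import math
from collections import Counter

# Hypothetical toy headlines standing in for rows of abc.csv.
docs = [
    "rain floods town centre",
    "rain delays cricket match",
    "council approves town budget",
]


def tokenize(text):
    """Lowercase and split on whitespace (a real pipeline would also
    remove stopwords and usually stem or lemmatize)."""
    return text.lower().split()


def bag_of_words(tokens):
    """Bag of words: map each token to its raw count in one document."""
    return Counter(tokens)


def tfidf(bows):
    """TF-IDF: re-weight counts by tf * log2(N / df), then L2-normalize
    each document so long and short documents are comparable."""
    n_docs = len(bows)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for bow in bows for term in bow)
    weighted = []
    for bow in bows:
        scores = {t: c * math.log2(n_docs / df[t]) for t, c in bow.items()}
        norm = math.sqrt(sum(v * v for v in scores.values()))
        if norm:
            scores = {t: v / norm for t, v in scores.items()}
        weighted.append(scores)
    return weighted


bows = [bag_of_words(tokenize(d)) for d in docs]
weights = tfidf(bows)

# Under bag of words every term in the first headline counts equally;
# under TF-IDF the terms unique to that headline get more weight.
print(dict(bows[0]))
print(weights[0])
```

With bag of words, a term that appears in many articles contributes as much as a rare one; TF-IDF down-weights such common terms, which often changes which words dominate the topics LDA learns.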
