NLP-text-corpora-build

Hi! This project uses two corpora:

  1. Part of the COCA corpus (Corpus of Contemporary American English), an English-language corpus

  2. A coronavirus corpus

The corpus has 11,946,296 words.

I performed these analyses:

  1. Word frequency analysis
  2. Part-of-speech tagging
  3. Chunking and chinking
  4. Word feature extraction
  5. N-grams
  6. Named entity recognition
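Word frequency analysis and n-gram extraction, two of the analyses listed above, can be sketched with the Python standard library alone. This is a minimal illustration, not the scripts from this repository: the function names `tokenize`, `word_frequencies`, and `ngrams`, the regex-based tokenizer, and the sample sentence are all assumptions made for the example.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens.

    The regex is a simplifying assumption; a real pipeline may
    use a dedicated tokenizer instead.
    """
    return re.findall(r"[a-z0-9']+", text.lower())


def word_frequencies(tokens):
    """Count how often each token occurs."""
    return Counter(tokens)


def ngrams(tokens, n):
    """Return every contiguous run of n tokens as a tuple."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


# Hypothetical sample text, not taken from either corpus.
sample = "The virus spread quickly. The virus spread widely."
tokens = tokenize(sample)
freqs = word_frequencies(tokens)
bigrams = Counter(ngrams(tokens, 2))

print(freqs.most_common(3))
print(bigrams.most_common(2))
```

Both results are `Counter` objects, so `most_common(k)` gives the top `k` words or n-grams directly. Passing `n=3` to `ngrams` yields trigrams in the same way.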

The outputs are in the outputs folder, the code is in the code folder, and the corpora are in the new corpus folder.

The directory structure has these subfolders:

* new corpus - contains all the .txt files of the corpus
* code       - contains all the .py files
* outputs    - contains all the outputs of the .py files
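Given this layout, a total word count for the corpus could be computed by walking the `.txt` files in the corpus folder. The sketch below is only an assumption about how such a count might be done: `count_words` is a hypothetical helper, and it treats every whitespace-separated token as a word, so its result may differ from the figure reported above if the original count used a different tokenizer.

```python
from pathlib import Path


def count_words(corpus_dir):
    """Sum whitespace-separated tokens over every .txt file in corpus_dir.

    Files are read line by line so that large corpus files do not
    have to fit in memory at once.
    """
    total = 0
    for path in sorted(Path(corpus_dir).glob("*.txt")):
        # errors="ignore" skips undecodable bytes; this is an assumption
        # about the corpus encoding, not something the README states.
        with path.open(encoding="utf-8", errors="ignore") as handle:
            for line in handle:
                total += len(line.split())
    return total
```

From the repository root, `count_words("new corpus")` would return the count for the corpus folder.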