# Text Analysis and Machine Learning
This is a simple study project on text analysis and machine learning. Its initial task is to generate a graphical representation of the related entity words in a given unstructured text. Along the way I first learned how individual words are extracted, and why removing suffixes matters for recovering meaning from the stemmed root word. Then I came across the word2vec algorithm, found it very useful, and decided to learn it first.

The concepts I have learned so far:

Task 1 --
- Word extraction
- Stemming
- Word to Vector (Initial Part)
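
The first two steps of Task 1 can be sketched roughly as follows. This is only an illustration in Python (not the project's C++ code): `extract_words`, `naive_stem`, and the `SUFFIXES` list are hypothetical names, and the suffix stripping is far cruder than a real stemmer such as Porter's.

```python
import re

# Hypothetical suffix list, for illustration only; real stemmers use many more rules.
SUFFIXES = ("ing", "ed", "es", "s")

def extract_words(text):
    """Lowercase the text and return its alphabetic tokens in order."""
    return re.findall(r"[a-z]+", text.lower())

def naive_stem(word, min_stem=3):
    """Strip the first matching suffix, keeping at least `min_stem` letters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem:
            return word[: -len(suffix)]
    return word

words = extract_words("The cats were jumping over walls.")
stems = [naive_stem(w) for w in words]
print(stems)
```

The `min_stem` guard keeps very short words such as "is" from being cut down to a single letter.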

Task 2 --
- Implement the SVD algorithm
- Add the key part of the project: the Singular Value Decomposition (SVD) of the word co-occurrence matrix
- Remove stopwords, using the NLTK stopword list, because leaving them in seemed to create spurious relations between words
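
A minimal Python sketch of the Task 2 ideas, under several assumptions: the tiny `STOPWORDS` set only stands in for the NLTK list, the window size is arbitrary, the token list is assumed to be non-empty after filtering, and `top_singular_triple` uses power iteration to approximate only the largest singular value and its vectors rather than a full SVD. All function names are hypothetical.

```python
import math

# Tiny stand-in stopword set; the project itself uses the NLTK list.
STOPWORDS = {"the", "a", "is", "of", "and", "on"}

def cooccurrence(tokens, window=2):
    """Return (vocab, matrix); matrix[i][j] counts how often vocab[j] appears
    within `window` positions of vocab[i] after stopwords are removed."""
    tokens = [t for t in tokens if t not in STOPWORDS]
    vocab = sorted(set(tokens))
    index = {w: i for i, w in enumerate(vocab)}
    matrix = [[0.0] * len(vocab) for _ in vocab]
    for pos, word in enumerate(tokens):
        lo, hi = max(0, pos - window), min(len(tokens), pos + window + 1)
        for other in range(lo, hi):
            if other != pos:
                matrix[index[word]][index[tokens[other]]] += 1.0
    return vocab, matrix

def top_singular_triple(matrix, iterations=100):
    """Power iteration on A^T A: approximates the largest singular value
    of a non-empty matrix and its left/right singular vectors."""
    rows, cols = len(matrix), len(matrix[0])
    v = [1.0 / math.sqrt(cols)] * cols
    for _ in range(iterations):
        u = [sum(matrix[i][j] * v[j] for j in range(cols)) for i in range(rows)]  # A v
        w = [sum(matrix[i][j] * u[i] for i in range(rows)) for j in range(cols)]  # A^T (A v)
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    u = [sum(matrix[i][j] * v[j] for j in range(cols)) for i in range(rows)]
    sigma = math.sqrt(sum(x * x for x in u))
    if sigma > 0.0:
        u = [x / sigma for x in u]
    return sigma, u, v

vocab, matrix = cooccurrence("the cat sat on the mat".split(), window=1)
sigma, u, v = top_singular_triple(matrix)
```

Filtering stopwords before counting means "cat" and "sat" become neighbours even though "the" and "on" sat between the content words in the raw text.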

Next tasks:
- Create the CBOW and Skip-gram algorithms
- Implement Word to Vector using SVD with the CBOW and Skip-gram algorithms
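
As a rough illustration of how the two planned algorithms differ, the Python sketch below only builds their training examples; it does not train any model. The function names and the window size are assumptions made for this example.

```python
def skipgram_pairs(tokens, window=2):
    """Return (center, context) pairs: Skip-gram predicts each context word
    from the center word."""
    pairs = []
    for pos, center in enumerate(tokens):
        lo, hi = max(0, pos - window), min(len(tokens), pos + window + 1)
        for other in range(lo, hi):
            if other != pos:
                pairs.append((center, tokens[other]))
    return pairs

def cbow_examples(tokens, window=2):
    """Return (context_words, center) examples: CBOW predicts the center word
    from all of its surrounding context words at once."""
    examples = []
    for pos, center in enumerate(tokens):
        lo, hi = max(0, pos - window), min(len(tokens), pos + window + 1)
        context = [tokens[i] for i in range(lo, hi) if i != pos]
        if context:  # skip a lone token that has no neighbours
            examples.append((context, center))
    return examples

tokens = ["word", "vectors", "capture", "meaning"]
print(skipgram_pairs(tokens, window=1))
print(cbow_examples(tokens, window=1))
```

Both functions scan the same sliding window; the only difference is which side of the example is treated as the input and which as the prediction target.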

Future vision:
- The main goal of this study is to generate a graphical representation of the relations between real-world entities in unstructured text.
- For the graphical view, use D3.js, converting the output of my C++ API to JSON.
- Create a site, a portal, where a user uploads a text file and the graphical representation is generated on the website.
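
One possible shape for the JSON handed to D3.js is sketched below in Python. The `nodes`/`links` layout with `id`, `source`, `target`, and `value` keys follows a convention often seen in D3 force-directed graph examples; it is an assumption here, not the actual output format of the C++ API, and `graph_to_json` is a hypothetical helper.

```python
import json

def graph_to_json(edges):
    """Convert (source, target, weight) edges into a nodes/links JSON string."""
    names = sorted({name for s, t, _ in edges for name in (s, t)})
    graph = {
        "nodes": [{"id": name} for name in names],
        "links": [{"source": s, "target": t, "value": w} for s, t, w in edges],
    }
    return json.dumps(graph, indent=2)

# Example edges with made-up weights, e.g. co-occurrence counts.
print(graph_to_json([("cat", "mat", 2), ("cat", "sat", 1)]))
```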