- Download of the pretrained GloVe vectors is needed.
. place the extracted txt files under data/glove.6B/ - GloVe vector data files (download)
wget http://nlp.stanford.edu/data/glove.6B.zip
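Once the zip is extracted, the txt files can be loaded with a few lines of plain Python. This is a minimal sketch, assuming the standard GloVe txt layout (one `<word> <float> <float> ...` per line); the file name `glove.6B.50d.txt` shown in the usage comment is one of the files inside the zip.

```python
def load_glove(path):
    """Parse a GloVe txt file into a {word: [float, ...]} dict."""
    vectors = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            parts = line.rstrip().split(" ")
            # first token is the word, the rest are the vector components
            vectors[parts[0]] = [float(x) for x in parts[1:]]
    return vectors

# vectors = load_glove("data/glove.6B/glove.6B.50d.txt")
```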
- step1 : make it work on Python 3.5 and TensorFlow 1.1 (done)
- step2 : test Korean with the Mecab tokenizer (done)
- step3 : develop the whole process on the hoyai project as a REST web service
. API client Jupyter code here
. REST server GitHub link
. start the service with Docker
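A client call to the service could look like the sketch below, using only the standard library. The endpoint path and the `{"sentence": ...}` payload schema are assumptions for illustration, not the project's documented API; check the API client notebook linked above for the real request shape.

```python
import json
from urllib import request

def build_ner_request(sentence, url="http://localhost:8000/api/ner/"):
    """Build a JSON POST request for a hypothetical NER endpoint."""
    payload = json.dumps({"sentence": sentence}).encode("utf-8")
    return request.Request(url, data=payload,
                           headers={"Content-Type": "application/json"})

# req = build_ner_request("김승우 6시30분 약속")
# with request.urlopen(req) as resp:   # requires the Docker service to be running
#     print(json.loads(resp.read().decode("utf-8")))
```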
Check the original project's blog post
- Simple WebCrawler : gathers data from Korean Wikipedia for w2v training
- w2v, fasttext embedding models : provide custom training for the models
- Korean char divide func : divides Korean characters into smaller pieces for training
- Bi-LSTM CRF for NER : implemented with TensorFlow
- Attention seq2seq for NER : work in progress
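The "Korean char divide" idea can be sketched with Unicode arithmetic: every precomposed Hangul syllable (U+AC00..U+D7A3) decomposes into an initial, a medial, and an optional final jamo. This illustrates the technique only; the project's own function may differ in detail.

```python
# Jamo tables in Unicode order: 19 initials, 21 medials, 27 finals (index 0 = none).
CHO = list("ㄱㄲㄴㄷㄸㄹㅁㅂㅃㅅㅆㅇㅈㅉㅊㅋㅌㅍㅎ")
JUNG = list("ㅏㅐㅑㅒㅓㅔㅕㅖㅗㅘㅙㅚㅛㅜㅝㅞㅟㅠㅡㅢㅣ")
JONG = [""] + list("ㄱㄲㄳㄴㄵㄶㄷㄹㄺㄻㄼㄽㄾㄿㅀㅁㅂㅄㅅㅆㅇㅈㅊㅋㅌㅍㅎ")

def divide_char(ch):
    """Return the jamo of one Hangul syllable, or [ch] if it is not one."""
    code = ord(ch) - 0xAC00
    if not 0 <= code <= 11171:          # outside the precomposed syllable block
        return [ch]
    cho, rest = divmod(code, 588)       # 588 = 21 medials * 28 finals
    jung, jong = divmod(rest, 28)
    jamo = [CHO[cho], JUNG[jung]]
    if JONG[jong]:                      # final consonant is optional
        jamo.append(JONG[jong])
    return jamo
```

For example, `divide_char("한")` yields `['ㅎ', 'ㅏ', 'ㄴ']`, which gives the embedding model sub-character units to train on.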
- Train Data Example: link
. you have to preprocess the data before using it
- Word Level Test
['한화증권']['OG']
['김승우']['PS']
['김수상']['PS']
['6시30분']['DT']
- Sentence Level Test
[['6시30분']['한화건설']['김승우']['약속']]
[['DT']['OG']['PS']['OO']]
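The samples above can be turned into (token, tag) pairs with a small parser. This is a sketch that assumes each bracketed singleton holds exactly one quoted string, as in the examples; the project's real preprocessing step may differ.

```python
import re

def parse_line(line):
    """Extract the quoted items from a line like [['a']['b']]."""
    return re.findall(r"'([^']*)'", line)

def pair_tokens_tags(token_line, tag_line):
    """Zip a token line with its matching tag line."""
    return list(zip(parse_line(token_line), parse_line(tag_line)))
```

For the sentence-level sample above, `pair_tokens_tags` yields `[('6시30분', 'DT'), ('한화건설', 'OG'), ('김승우', 'PS'), ('약속', 'OO')]`, which is the shape a BIO-style tagger's input pipeline typically wants.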