A simple Python implementation of two popular word embedding algorithms: word2vec and GloVe.
- The project is only for educational purposes.
- The word2vec code is built following the instructions of CS224n assignment #1.
- The GloVe implementation follows Hans' blog.
- The dataset included in this project is SST (Stanford Sentiment Treebank).
- SST contains sentiment analysis labels, which can be used to evaluate the pros and cons of each embedding model.
- Install the dependencies (Python 2.7)
pip install -r requirement.txt
- Download the dataset
sh get_datasets.sh
- Train word2vec
python train.py -m word2vec --save-every=True --vector-path=./model/word2vec -s 10 --learning-rate=0.3 -w 5 --iterations=40000
- Train GloVe
python train.py -m glove -s 50 --learning-rate=0.05 --iterations=200 --save-every=True --vector-path=./model/glove
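
Once both models are trained, the SST sentiment labels can be used to compare them. The sketch below shows one possible approach, not necessarily the one used in this project: average the word vectors of each sentence into a feature vector, then classify with a nearest-centroid rule (chosen here only to keep the example dependency-free). The names `toy_vectors`, `sentence_features`, and `nearest_centroid_accuracy` are hypothetical, and the toy data stands in for real vectors and SST sentences.

```python
from __future__ import division  # true division on Python 2.7 as well

# Hypothetical toy embeddings; real vectors would come from the trained
# word2vec or GloVe model (the loading code is not shown here).
toy_vectors = {
    "good": [0.9, 0.1],
    "great": [0.8, 0.2],
    "bad": [-0.7, 0.3],
    "awful": [-0.9, 0.1],
    "movie": [0.0, 0.5],
}


def sentence_features(tokens, vectors):
    """Average the vectors of known tokens; unknown tokens are skipped."""
    known = [vectors[t] for t in tokens if t in vectors]
    if not known:
        dim = len(next(iter(vectors.values())))
        return [0.0] * dim
    dim = len(known[0])
    return [sum(v[i] for v in known) / len(known) for i in range(dim)]


def centroid(rows):
    """Component-wise mean of a list of equal-length vectors."""
    return [sum(r[i] for r in rows) / len(rows) for i in range(len(rows[0]))]


def nearest_centroid_accuracy(train, test, vectors):
    """Fit one centroid per label on train, then report accuracy on test."""
    by_label = {}
    for tokens, label in train:
        by_label.setdefault(label, []).append(sentence_features(tokens, vectors))
    centroids = dict((label, centroid(rows)) for label, rows in by_label.items())

    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    correct = 0
    for tokens, label in test:
        feats = sentence_features(tokens, vectors)
        predicted = min(centroids, key=lambda lab: sq_dist(feats, centroids[lab]))
        correct += int(predicted == label)
    return correct / len(test)


# Toy stand-ins for labeled SST sentences (1 = positive, 0 = negative).
train = [(["good", "movie"], 1), (["bad", "movie"], 0)]
test = [(["great", "movie"], 1), (["awful", "movie"], 0)]
accuracy = nearest_centroid_accuracy(train, test, toy_vectors)
print(accuracy)
```

Running the same evaluation once with the word2vec vectors and once with the GloVe vectors gives a simple side-by-side comparison of the two models on the same labeled data.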