Deeplearning4j - OnDevice (Mobile) Word2Vec Training

Word2Vec Porting On Android Using DeepLearning4j

Model training takes place on mobile and creates a word2vec model.

Once training is done - word2vec.wordnearestsum returns nearest trained wordvector.

CHOOSE HIGH END DEVICE FOR RUNNING THE CODE SUCCESSFULLY- [ Tested Device - Samsung Galaxy S7 ]

File Folders

assets/

vector_data1.txt

vector_data2.txt

vector_data3.txt

vector_data4.txt

raw/

stopwords.txt

extended_stopwords.txt

layout/

activity_main.xml

main/java/

MainActivity.java

WordVectorReader.java

WordVectorSaver.java

WordVectorTraining.java

Storage Permission

Manually Enable Permission from App Settings For the First Time or Handle it from code.

Delete W2V_DATAPATH/data for training the model from the beginning.

Manually delete trained dataset for including new training data or go for uptraining process.

Running word2vec algorithm on mobile getting ANR in Base64 library due to java and android compatibility

WordVectorReader and WordVectorSaver, overrides the funtion of library in order to solve the dependencies

Tried to solve encodeBase64/decodeBase64 compatibility issue between android and java: auth0/java-jwt#131

Data Set Source : http://www.gutenberg.org/

If you see below issue try with high end devices

Issues mentioned occured only on low end devices.

ISSUE DISCUSSION

https://gitter.im/deeplearning4j/deeplearning4j/archives/2016/09/06

ISSUE_1

java.lang.IllegalStateException: You can't fit() model with empty Vocabulary or WeightLookupTable at org.deeplearning4j.models.sequencevectors.SequenceVectors.fit(SequenceVectors.java:238) at com.example.vijay.ondevice_word2vector.WordVectorTraining.trainW2V(WordVectorTraining.java:174) at com.example.vijay.ondevice_word2vector.MainActivity$1.run(MainActivity.java:50) at java.lang.Thread.run(Thread.java:764)

ISSUE_2

2019-02-02 21:17:14.890 30303-30303/? A/DEBUG: pid: 30179, tid: 30284, name: VectorCalculati

com.example.vijay.ondevice_word2vector <<< 2019-02-02 21:17:14.890 30303-30303/? A/DEBUG: signal 11 (SIGSEGV), code 1 (SEGV_MAPERR), fault addr 0x0 2019-02-02 21:17:14.890 30303-30303/? A/DEBUG: Cause: null pointer dereference 2019-02-02 21:17:14.890 30303-30303/? A/DEBUG: r0 00000000 r1 00000000 r2 00000001 r3 00000000 2019-02-02 21:17:14.890 30303-30303/? A/DEBUG: r4 00000000 r5 00000064 r6 00000000 r7 dc0212b8 2019-02-02 21:17:14.890 30303-30303/? A/DEBUG: r8 00000000 r9 00000000 sl 00000064 fp b15f9450 2019-02-02 21:17:14.890 30303-30303/? A/DEBUG: ip 00000001 sp b15f93d0 lr b21f1800 pc b7f5ca24 cpsr 200e0010

IDEA BEHIND TRAINING

(a) The whole idea is to train word2vec on device (mobile), mentioned issue above may not appear for high end mobile devices. (b) Code successfully run for training configuration in code and vector_data1.txt" (c) With the current vector_data1.txt, desired result is not obtained. Increasing the data causing ANR as mentioned in Issue. (d) Tune the parameters, try with various mobile devices, clear the caches, free RAM usage during training etc. for obtaining the desired result. (e) Doing training ondevice can help in securing user privacy.

Below code will work in high end device

Change or modify /asset/vector_data[*].txt for domain specific word2vec

Tune hyper parameters for best result

            word2Vec[i] = new Word2Vec.Builder()
                    .minWordFrequency(10)  /*observation  : This has major role to play for mentioned ISSUE*/
                    .iterations(1)
                    .layerSize(100)
                    .seed(42)
                    .windowSize(5)
                    .epochs(1)  /*observation  : It will take more time in training*/
                    .batchSize(10) /*Commented*/
                    .stopWords(stopwords) /*observation  : These worked for high end mobile devices*/
                    .stopWords(extendedStopwords) /*observation  : These worked for high end mobile devices*/
                    .iterate(iterator)
                    .tokenizerFactory(tokenizerFactory)
                    .lookupTable(table) /*Commented*/
                    .vocabCache(cache) /*Commented*/
                    .elementsLearningAlgorithm(new SkipGram<VocabWord>())
                    .build();

Read Existing Word2Vec Model : glove.6B.50d ( 163MB)

device : galaxy s7 and above hardware configuration (4GB RAM)

DATA_PATH = Read Exisiting word to vec model - keep "glove.6B.50d" file @W2V_DATAPATH

Reading existing model can be done on (2GB RAM) devices but training cannot be done. Once model can be trained on high end and then model can be transferred to low end for just reading operation. Totally depends on use cases.

OnDevice Sentiment Analysis :

https://github.com/vijay033/OnDevice_Sentiment-Analysis

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Deeplearning4j - OnDevice (Mobile) Word2Vec Training

Word2Vec Porting On Android Using DeepLearning4j

Model training takes place on mobile and creates a word2vec model.

Once training is done - word2vec.wordnearestsum returns nearest trained wordvector.

CHOOSE HIGH END DEVICE FOR RUNNING THE CODE SUCCESSFULLY- [ Tested Device - Samsung Galaxy S7 ]

File Folders

assets/

raw/

layout/

main/java/

Storage Permission

Delete W2V_DATAPATH/data for training the model from the beginning.

Running word2vec algorithm on mobile getting ANR in Base64 library due to java and android compatibility

WordVectorReader and WordVectorSaver, overrides the funtion of library in order to solve the dependencies

Tried to solve encodeBase64/decodeBase64 compatibility issue between android and java: auth0/java-jwt#131

Data Set Source : http://www.gutenberg.org/

If you see below issue try with high end devices

Issues mentioned occured only on low end devices.

ISSUE DISCUSSION

ISSUE_1

ISSUE_2

IDEA BEHIND TRAINING

Below code will work in high end device

Change or modify /asset/vector_data[*].txt for domain specific word2vec

Tune hyper parameters for best result

Read Existing Word2Vec Model : glove.6B.50d ( 163MB)

device : galaxy s7 and above hardware configuration (4GB RAM)

DATA_PATH = Read Exisiting word to vec model - keep "glove.6B.50d" file @W2V_DATAPATH

Reading existing model can be done on (2GB RAM) devices but training cannot be done. Once model can be trained on high end and then model can be transferred to low end for just reading operation. Totally depends on use cases.

OnDevice Sentiment Analysis :

About

Releases

Packages

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 45 Commits
MainActivity.java		MainActivity.java
README.md		README.md
Screenshot_20190203-131421_OnDevice_Word2Vector.jpg		Screenshot_20190203-131421_OnDevice_Word2Vector.jpg
WordVectorReader.java		WordVectorReader.java
WordVectorSaver.java		WordVectorSaver.java
WordVectorTraining.java		WordVectorTraining.java
activity_main.xml		activity_main.xml
build.gradle		build.gradle
extended_stopwords.txt		extended_stopwords.txt
stopwords.txt		stopwords.txt
vector_data1.txt		vector_data1.txt
vector_data2.txt		vector_data2.txt
vector_data3.txt		vector_data3.txt
vector_data4.txt		vector_data4.txt
vector_data5.txt		vector_data5.txt
vector_data6.txt		vector_data6.txt

vijay033/Deeplearning4j

Folders and files

Latest commit

History

Repository files navigation

Deeplearning4j - OnDevice (Mobile) Word2Vec Training

Word2Vec Porting On Android Using DeepLearning4j

Model training takes place on mobile and creates a word2vec model.

Once training is done - word2vec.wordnearestsum returns nearest trained wordvector.

CHOOSE HIGH END DEVICE FOR RUNNING THE CODE SUCCESSFULLY- [ Tested Device - Samsung Galaxy S7 ]

File Folders

assets/

raw/

layout/

main/java/

Storage Permission

Delete W2V_DATAPATH/data for training the model from the beginning.

Running word2vec algorithm on mobile getting ANR in Base64 library due to java and android compatibility

WordVectorReader and WordVectorSaver, overrides the funtion of library in order to solve the dependencies

Tried to solve encodeBase64/decodeBase64 compatibility issue between android and java: auth0/java-jwt#131

Data Set Source : http://www.gutenberg.org/

If you see below issue try with high end devices

Issues mentioned occured only on low end devices.

ISSUE DISCUSSION

ISSUE_1

ISSUE_2

IDEA BEHIND TRAINING

Below code will work in high end device

Change or modify /asset/vector_data[*].txt for domain specific word2vec

Tune hyper parameters for best result

Read Existing Word2Vec Model : glove.6B.50d ( 163MB)

device : galaxy s7 and above hardware configuration (4GB RAM)

DATA_PATH = Read Exisiting word to vec model - keep "glove.6B.50d" file @W2V_DATAPATH

Reading existing model can be done on (2GB RAM) devices but training cannot be done. Once model can be trained on high end and then model can be transferred to low end for just reading operation. Totally depends on use cases.

OnDevice Sentiment Analysis :

About

Topics

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages