This project is part of an assignment for the Object Recognition and Computer Vision course that I followed at ENS Paris-Saclay (Master MVA). The goal was to implement the models of the following article https://arxiv.org/pdf/1505.00468.pdf and to discuss the results obtained, both quantitatively and qualitatively.
If you want to run this code on your own computer, you need a CUDA-compatible GPU (e.g. an NVIDIA GTX card) and enough RAM (8 GB is sufficient). Since I implemented the code on Windows, the following installation instructions are for Windows users.
- Install CUDA 8.0 for your operating system. (At the time of writing, TensorFlow is not compatible with higher CUDA versions on Windows.)
- Launch the installer and install CUDA.
- Download cuDNN 5.1 for Windows. (At the time of writing, TensorFlow only supports cuDNN 5.1 on Windows.)
- Unzip the archive. You should get a folder containing 3 other folders:
  - bin
  - include
  - lib
- Go to `C:\` and create a folder named `Cuda`, then copy and paste the folders `bin`, `include` and `lib` inside your `Cuda` folder.
- Add `C:\Cuda` to your `Path` environment variable. To do so:
  - Right click on Windows -> System -> Advanced system settings (on the left) -> Environment Variables
  - Click on the `Path` variable under `System Variables`, then click `Edit...` and add `;C:\Cuda` at the end of the `Path` variable. (On Windows 10 you just have to add `C:\Cuda` on a new line.)
- Download and install Anaconda with Python 3.6 (x64). Once Anaconda is installed, open an Anaconda prompt and type `pip install --ignore-installed --upgrade tensorflow-gpu` (the sanity-check snippet after this list verifies the installation).
- Install spaCy (e.g. `pip install spacy`), then open an Anaconda prompt with admin rights and type `python -m spacy download en_vectors_web_lg`.
- Download and install Graphviz, then add Graphviz to your `Path` environment variable: Computer > Properties > Advanced system settings > Environment Variables, and add `;C:\Program Files (x86)\Graphviz2.38\bin` (your path can be different).
- Reinstall pip (there is a known bug where pip breaks after installing TensorFlow) by typing `conda install pip` in an Anaconda prompt.
- Install Keras by typing either `conda install keras` or `pip install keras` in an Anaconda prompt.
- Install pydot-ng (this will allow you to visualize the architecture of your neural network) by typing `pip install pydot-ng` in an Anaconda prompt.
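Once everything is installed, you can run the following sanity check from a Python session in your Anaconda prompt. This is a minimal sketch of my own (not part of the repository): it assumes TensorFlow 1.x with GPU support, spaCy 2.x with the `en_vectors_web_lg` vectors, and Keras 2.x with pydot-ng and Graphviz on the `Path`; the output file name `sanity_check.png` is just an example.

```python
# Quick sanity check for the setup above (assumes TF 1.x, spaCy 2.x
# with en_vectors_web_lg, Keras 2.x, Graphviz and pydot-ng).
import spacy
from tensorflow.python.client import device_lib
from keras.models import Sequential
from keras.layers import Dense
from keras.utils import plot_model

# 1. TensorFlow should list at least one GPU device.
print([d.name for d in device_lib.list_local_devices()])

# 2. The spaCy vectors should load and yield 300-d word vectors.
nlp = spacy.load('en_vectors_web_lg')
print(nlp('cat').vector.shape)  # (300,)

# 3. plot_model exercises Graphviz + pydot-ng by writing a PNG of a
#    toy model's architecture to disk.
model = Sequential([Dense(10, input_shape=(4096,), activation='softmax')])
plot_model(model, to_file='sanity_check.png')
```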
## Folders
- Annotations: unzip the content of this file into the Annotations folder (the Annotations folder should contain 6 JSON files)
- COCO
  - annotations: unzip the content of file 1, file 2 and file 3 into this folder. (The folder should contain only JSON files.)
  - images: contains both validation and training images from the COCO dataset. The training images are available here and the validation images are available here. (This folder should contain only JPEG images.)
  - images_test: contains the testing images from the COCO dataset. These images can be downloaded here. (This folder should contain only JPEG images.)
- histories:
  - BOWQ_I: contains training and validation accuracy/loss at each epoch. This file is created when executing `BOWQ_I.py`.
  - LSTM_Q: contains training and validation accuracy/loss at each epoch. This file is created when executing `LSTM_Q.py`.
  - LSTMQ_I: contains training and validation accuracy/loss at each epoch. This file is created when executing `LSTMQ_I.py`.
  - plots.ipynb: Jupyter notebook that plots the training and validation accuracy/loss for all models (BOWQ_I, LSTM_Q and LSTMQ_I)
- models: contains `vgg16.py`, a Python script that recovers the fc7 layer of the VGG16 neural network (the fc7 layer corresponds to the features computed on the images; these features are vectors of size 4096). A minimal sketch of this extraction is given after this list.
- our_images: put your own images here if you want to test a model on them (Note: the code to test on your own images is already provided for each model in `online_modelname.ipynb`, where modelname is either `LSTM_Q`, `LSTMQ_I` or `BOWQ_I`)
- preprocess_datas: the files contained in this folder are created by the scripts `create_dict.ipynb`, `features_extractor.ipynb` and `create_all_answers.ipynb` (Note: you should execute these scripts to recreate the files of this directory, as they are too heavy for me to upload to GitHub)
- Questions: this folder contains JSON files that encode the questions and answers. These files can be downloaded from here (Note: you can also download the v2 version available here, but in my experiments I focused on the v1 version)
- weights: this folder contains the following directories:
  - BOWQ_I: contains the weights of the neural network saved at each epoch. These weights are generated while executing the Python script `BOWQ_I.py`
  - LSTM_Q: contains the weights of the neural network saved at each epoch. These weights are generated while executing the Python script `LSTM_Q.py`
  - LSTMQ_I: contains the weights of the neural network saved at each epoch. These weights are generated while executing the Python script `LSTMQ_I.py`
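For reference, here is a minimal sketch of the fc7 extraction that `vgg16.py` performs, written against the stock Keras VGG16 (where the layer named `fc2` plays the role of fc7 and outputs 4096-d vectors). The image path is hypothetical, and the actual script may differ in details.

```python
# Sketch: recover the fc7 (4096-d) features of VGG16 with Keras.
import numpy as np
from keras.applications.vgg16 import VGG16, preprocess_input
from keras.preprocessing import image
from keras.models import Model

base = VGG16(weights='imagenet')  # full network, classifier head included
# In Keras' VGG16 the fully connected layers are named 'fc1' and 'fc2';
# 'fc2' corresponds to the fc7 layer of the paper.
fc7 = Model(inputs=base.input, outputs=base.get_layer('fc2').output)

# Extract a 4096-d feature vector for one (hypothetical) image.
img = image.load_img('COCO/images/example.jpg', target_size=(224, 224))
x = preprocess_input(np.expand_dims(image.img_to_array(img), axis=0))
features = fc7.predict(x)
print(features.shape)  # (1, 4096)
```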
## Scripts
- baselines.ipynb: this script implements the baselines of the article (random, q-prior, per q-type prior, nearest neighbors)
- BOWQ_I.py: this script trains the Bag-of-Words + image features model. Just type `python BOWQ_I.py` in your shell to execute the code (Note: you can change the model, the number of epochs and so on by editing the Python file)
- LSTM_Q.py: this script trains the LSTM question-only model. Just type `python LSTM_Q.py` in your shell to execute the code (Note: you can change the model, the number of epochs and so on by editing the Python file)
- LSTMQ_I.py: this script trains the LSTM question + image features model. Just type `python LSTMQ_I.py` in your shell to execute the code (Note: you can change the model, the number of epochs and so on by editing the Python file)
- `create_all_answers.ipynb`, `create_dict.ipynb` and `features_extractor.ipynb` are scripts that you should execute before any other script, as they preprocess the data and create the files the other scripts rely on.
- `features_processor.py` and `utils.py` contain functions that are used by the other scripts. All the functions are documented with docstrings so you can understand the purpose of each one.
- `online_modelname.ipynb` are Jupyter notebooks that allow you to test a model on either the testing images or your own images. These files also provide the final accuracy of each model computed on the validation set.
- `cb.py`: this script is used in `BOWQ_I.py`, `LSTM_Q.py` and `LSTMQ_I.py` to save the training and validation accuracy/loss at each epoch. It is a Keras callback (a minimal sketch is given after this list).
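To illustrate, a history-saving callback like `cb.py` can be sketched as below. This is my own minimal sketch (the class name, metric keys and pickle format are assumptions), not the actual implementation.

```python
# Sketch of a Keras callback that saves training/validation
# accuracy and loss to disk at the end of every epoch.
import pickle
from keras.callbacks import Callback

class HistorySaver(Callback):
    def __init__(self, path):
        super(HistorySaver, self).__init__()
        self.path = path
        self.history = {'loss': [], 'acc': [], 'val_loss': [], 'val_acc': []}

    def on_epoch_end(self, epoch, logs=None):
        logs = logs or {}
        for key in self.history:
            self.history[key].append(logs.get(key))
        # Overwrite the file each epoch so an interrupted run keeps
        # everything recorded so far.
        with open(self.path, 'wb') as f:
            pickle.dump(self.history, f)

# Usage (hypothetical): model.fit(..., callbacks=[HistorySaver('histories/LSTMQ_I')])
```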
The code can be improved in many ways. I chose to preprocess the data with functions rather than with fixed files, as this makes it easier to change the settings of the neural network. For example, to use the K=2000 top answers instead of the K=1000 top answers (see the paper for the meaning of K), one can simply call `topKFrequentAnswer(data_q, data_a, data_qval, data_aval, K=2000)` when preprocessing the data.
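To make the idea concrete, here is a hypothetical sketch of what such a top-K answer filter does; the real `topKFrequentAnswer` also takes the validation split as input and may differ in its details.

```python
# Sketch: keep only the question/answer pairs whose answer is among
# the K most frequent answers of the training set.
from collections import Counter

def top_k_frequent_answers(questions, answers, k=1000):
    counts = Counter(answers)
    vocab = set(a for a, _ in counts.most_common(k))
    kept = [(q, a) for q, a in zip(questions, answers) if a in vocab]
    return kept, vocab
```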