Dual Attention Network

This repository contains the code (using TensorFlow) and models for the following CVPR 2017 paper, covering the image-to-text and text-to-image retrieval tasks:

Hyeonseob Nam, Jung-Woo Ha, and Jeonghee Kim.
"Dual Attention Networks for Multimodal Reasoning and Matching."
In Proc. CVPR 2017.

Thanks to instructions from the author (Hyeonseob Nam), I was able to reproduce the numbers reported in the paper on Flickr30k (R@K: recall at K; MR: median rank):

| Method | Image-to-Text R@1 | R@5 | R@10 | MR | Text-to-Image R@1 | R@5 | R@10 | MR |
|---|---|---|---|---|---|---|---|---|
| DAN Paper | 55.0 | 81.8 | 89.0 | 1 | 39.4 | 69.2 | 79.1 | 2 |
| This Implementation | 54.4 | 82.4 | 89.9 | 1.0 | 39.8 | 71.4 | 80.9 | 2 |
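
For reference, R@K and MR can be computed from an image-by-sentence similarity matrix as in the minimal NumPy sketch below. This is not the evaluation code of this repository; the five-captions-per-image grouping is an assumption that follows the usual Flickr30k layout.

```python
import numpy as np

def i2t_metrics(sim, captions_per_image=5):
    """Image-to-text R@1/5/10 and median rank from a similarity matrix.

    sim: (num_images, num_captions) array where captions
    i*captions_per_image .. (i+1)*captions_per_image - 1 belong to image i.
    """
    num_images = sim.shape[0]
    ranks = np.zeros(num_images)
    for i in range(num_images):
        order = np.argsort(-sim[i])  # caption indices, most similar first
        gt = np.arange(i * captions_per_image, (i + 1) * captions_per_image)
        # rank (0-based) of the best-ranked ground-truth caption for this image
        ranks[i] = np.where(np.in1d(order, gt))[0].min()
    r_at_k = {k: 100.0 * np.mean(ranks < k) for k in (1, 5, 10)}
    median_rank = np.median(ranks) + 1  # report 1-based median rank
    return r_at_k, median_rank
```

Text-to-image retrieval is scored the same way, using the rank of the single ground-truth image for each caption.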

Dependencies

  • Python 2.7; TensorFlow >= 1.4.0; tqdm and nltk (for preprocessing)
  • Flickr30k Images and Text
  • Dataset splits from here; these are the same splits used by m-RNN.
  • Pretrained ResNet-152 model from Tensorpack

Training

  1. Extract ResNet features
$ python resnet-extractor/extract.py flickr30k_images/ ImageNet-ResNet152.npz resnet-152 --batch_size 20 --resize 448 --depth 152
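
To sanity-check the extracted features before preprocessing, a quick script like the one below can help. It assumes the extractor writes one NumPy file per image into resnet-152/ with spatial shape 14x14x2048 (the shape later passed as --feat_dim 14,14,2048); the exact file naming and format may differ.

```python
import os
import numpy as np

feat_dir = "resnet-152"
# Pick an arbitrary feature file; per-image .npy/.npz files are an assumption here.
fname = sorted(os.listdir(feat_dir))[0]
feat = np.load(os.path.join(feat_dir, fname))
if isinstance(feat, np.lib.npyio.NpzFile):  # .npz archives store named arrays
    feat = feat[feat.files[0]]
print(fname, feat.shape)  # expect something like (14, 14, 2048)
```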
  2. Preprocess
$ python prepro_flickr30k.py splits/ results_20130124.token prepro --noword2vec --noimgfeat
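
For reference, each line of results_20130124.token pairs an image filename and caption index with the caption text, separated by a tab. The snippet below is only a rough illustration of how such a file can be tokenized with nltk (the punkt tokenizer data must be downloaded first); it is not the actual logic of prepro_flickr30k.py.

```python
from collections import defaultdict

import nltk  # run nltk.download("punkt") once if the tokenizer data is missing

captions = defaultdict(list)  # image filename -> list of token lists
with open("results_20130124.token") as f:
    for line in f:
        img_and_idx, caption = line.rstrip("\n").split("\t")
        img = img_and_idx.split("#")[0]  # "1000092795.jpg#0" -> "1000092795.jpg"
        captions[img].append(nltk.word_tokenize(caption.lower()))

print(len(captions), "images,", sum(len(c) for c in captions.values()), "captions")
```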
  3. Training

I use a slightly different training schedule from the paper: batch size 256, learning rate 0.1, and dropout keep probability 0.5 for the first 60 epochs, then learning rate 0.05 and keep probability 0.8 for the remaining epochs, with Adadelta as the optimizer. Training takes up to 9 GB of GPU memory and about 50 hours with SSDs.

(There are other options (--use_char, --concat, etc.) that I haven't tried with hard negative mining yet. A rough sketch of the margin-based ranking loss with hard negatives is shown after the training command below.)

$ python main.py prepro models dan --no_wordvec --word_emb_size 512 --num_hops 2 --word_count_thres 1 --sent_size_thres 200 --word_size_thres 20 --hidden_size 512 --keep_prob 0.5 --margin 100 --num_epochs 60 --save_period 1000 --batch_size 256 --clip_gradient_norm 0.1 --init_lr 0.1 --wd 0.0005 --featpath resnet-152/ --feat_dim 14,14,2048 --hn_num 32 --is_train
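
As mentioned above, here is a simplified sketch of a bidirectional max-margin ranking loss with in-batch hard negatives, in the spirit of the --margin and --hn_num options. It is written against the TensorFlow 1.x API this repository targets and is not the exact loss implemented in main.py.

```python
import tensorflow as tf

def ranking_loss(img_emb, sent_emb, margin=100.0, hn_num=32):
    """Bidirectional max-margin ranking loss with in-batch hard negatives.

    img_emb, sent_emb: [batch, dim] embeddings where row i of each tensor
    forms a matching image-sentence pair (batch is assumed >= hn_num).
    """
    scores = tf.matmul(img_emb, sent_emb, transpose_b=True)  # [batch, batch] similarities
    pos = tf.expand_dims(tf.diag_part(scores), 1)            # matching-pair scores, [batch, 1]

    # Hinge loss against every in-batch negative, in both directions.
    cost_s = tf.nn.relu(margin + scores - pos)                # image -> wrong sentences
    cost_i = tf.nn.relu(margin + tf.transpose(scores) - pos)  # sentence -> wrong images

    # Zero out the diagonal so positives are not counted as negatives.
    mask = 1.0 - tf.eye(tf.shape(scores)[0])
    cost_s *= mask
    cost_i *= mask

    # Keep only the hn_num hardest negatives per query.
    hard_s, _ = tf.nn.top_k(cost_s, k=hn_num)
    hard_i, _ = tf.nn.top_k(cost_i, k=hn_num)
    return tf.reduce_mean(hard_s) + tf.reduce_mean(hard_i)
```

With --batch_size 256 and --hn_num 32, each image (and each sentence) in the batch contributes its 32 hardest in-batch negatives to the loss.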
  4. Testing with the trained model. You can download my model and put it in models/00/dan/best/ to run it directly. Also put shared.p in models/00/dan/.
$ python main.py prepro models dan --no_wordvec --word_emb_size 512 --num_hops 2 --word_count_thres 1 --sent_size_thres 200 --word_size_thres 20 --hidden_size 512 --keep_prob 0.5 --margin 100 --num_epochs 60 --save_period 1000 --batch_size 256 --clip_gradient_norm 0.1 --init_lr 0.1 --wd 0.0005 --featpath resnet-152/ --feat_dim 14,14,2048 --hn_num 32 --is_test --load_best
