PureT

Implementation of End-to-End Transformer Based Model for Image Captioning [PDF/AAAI] [PDF/Arxiv] [AAAI 2022]

Implementation of PureT using pre-extracted features: 232525/PureT_F
A more verbose implementation of the image captioning task, with miscellaneous extra code: 232525/ImageCaptioning_Verbose

For an introduction in Chinese, see README_CN.md.

Requirements (Our Main Environment)

Python 3.7.4
PyTorch 1.5.1
TorchVision 0.6.0
coco-caption
numpy
tqdm

Preparation

1. coco-caption preparation

Following the coco-caption README.md, you first need to download the Stanford CoreNLP 3.6.0 code and models for use by SPICE. To do this, run:

cd coco_caption
bash get_stanford_models.sh

2. Data preparation

The files needed for training and evaluation are stored in the mscoco folder, which is organized as follows:

mscoco/
|--feature/
    |--coco2014/
       |--train2014/
       |--val2014/
       |--test2014/
       |--annotations/
|--misc/
|--sent/
|--txt/

Here, the mscoco/feature/coco2014 folder contains the raw images and annotation files of the MSCOCO 2014 dataset. You can download the other files from GoogleDrive or Baidu Netdisk (extraction code: hryh).
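Before training, it can help to confirm that the folder layout listed above is in place. The following is a minimal sketch using only the Python standard library; the helper name `missing_dirs` and the `EXPECTED_DIRS` list are illustrative and not part of this repository, and the script assumes it is run from the repository root.

```python
from pathlib import Path

# Sub-folders taken from the layout listed above, relative to the mscoco/ root.
EXPECTED_DIRS = [
    "feature/coco2014/train2014",
    "feature/coco2014/val2014",
    "feature/coco2014/test2014",
    "feature/coco2014/annotations",
    "misc",
    "sent",
    "txt",
]


def missing_dirs(root):
    """Return the expected sub-folders that do not exist under `root`."""
    root = Path(root)
    return [d for d in EXPECTED_DIRS if not (root / d).is_dir()]


if __name__ == "__main__":
    # Hypothetical helper, not part of the repository.
    missing = missing_dirs("mscoco")
    if missing:
        print("Missing folders:", ", ".join(missing))
    else:
        print("mscoco/ layout looks complete.")
```

The check only looks at folder names; it does not verify that the images, annotations, or downloaded files inside them are complete.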

NOTE: To speed up training, you can also extract MSCOCO 2014 image features with Swin-Transformer (or another backbone) and save them as ***.npz files in mscoco/feature; refer to coco_dataset.py and data_loader.py for how features are read and prepared. In this case, you need to modify pure_transformer.py (delete the backbone module). For smart and excellent people like you, this should be easy work.
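As a rough illustration of the pre-extracted feature route, the sketch below writes and reads one feature file per image with NumPy. The array key `feat`, the `<image_id>.npz` file naming, and the dummy feature shape are all assumptions made for this example; check coco_dataset.py for the key and naming the data loader actually expects.

```python
import os
import tempfile

import numpy as np


def save_feature(feature_dir, image_id, feat):
    """Store one image's feature array as <image_id>.npz.

    The key name 'feat' is an assumption for this sketch.
    """
    path = os.path.join(feature_dir, f"{image_id}.npz")
    np.savez_compressed(path, feat=feat)
    return path


def load_feature(feature_dir, image_id):
    """Load the feature array written by save_feature as float32."""
    path = os.path.join(feature_dir, f"{image_id}.npz")
    with np.load(path) as data:
        return data["feat"].astype(np.float32)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # Placeholder array standing in for real backbone output.
        dummy = np.random.rand(144, 1536).astype(np.float32)
        save_feature(tmp, 42, dummy)
        print(load_feature(tmp, 42).shape)
```

Storing one compressed file per image keeps loading independent per sample, which fits a dataset class that indexes by image id.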

Training

Note: our repository is mainly based on JDAI-CV/image-captioning, and we directly reused their config.yml files, so the configs contain many parameters that our model does not use (cleanup pending).

1. Training under XE loss

Download the pre-trained backbone model (Swin-Transformer) from GoogleDrive or Baidu Netdisk (extraction code: hryh) and save it in the root directory.

Before training, you may need to check and modify the parameters in the config.yml and train.sh files. Then run the script:

# for XE training
bash experiments_PureT/PureT_XE/train.sh

2. Training using SCST (self-critical sequence training)

Copy the model pre-trained under XE loss into the experiments_PureT/PureT_SCST/snapshot/ folder and modify the config.yml and train.sh files. Then run the script:

# for SCST training
bash experiments_PureT/PureT_SCST/train.sh
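The checkpoint copy step for SCST can also be scripted. This is a minimal sketch; the helper `copy_xe_checkpoint` is hypothetical and not part of the repository, and the checkpoint file name is left to the caller because it depends on your XE training run.

```python
import shutil
from pathlib import Path


def copy_xe_checkpoint(xe_checkpoint,
                       scst_snapshot_dir="experiments_PureT/PureT_SCST/snapshot"):
    """Copy an XE-trained checkpoint into the SCST snapshot folder.

    Hypothetical helper: creates the snapshot folder if it does not exist
    and keeps the original file name.
    """
    src = Path(xe_checkpoint)
    if not src.is_file():
        raise FileNotFoundError(f"XE checkpoint not found: {src}")
    dst_dir = Path(scst_snapshot_dir)
    dst_dir.mkdir(parents=True, exist_ok=True)
    return Path(shutil.copy2(src, dst_dir / src.name))
```

Call it with the path to whichever checkpoint file XE training wrote. You still need to update config.yml and train.sh so that SCST training resumes from that checkpoint.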

Evaluation

You can download the pre-trained model from GoogleDrive or Baidu Netdisk (extraction code: hryh).

CUDA_VISIBLE_DEVICES=0 python main_test.py --folder experiments_PureT/PureT_SCST/ --resume 27

BLEU-1	BLEU-2	BLEU-3	BLEU-4	METEOR	ROUGE-L	CIDEr	SPICE
82.1	67.3	52.0	40.9	30.2	60.1	138.2	24.2

Reference

If you find this repo useful, please consider citing (no obligation at all):

@inproceedings{wangyiyu2022PureT,
  author       = {Yiyu Wang and
                  Jungang Xu and
                  Yingfei Sun},
  title        = {End-to-End Transformer Based Model for Image Captioning},
  booktitle    = {Proceedings of the AAAI Conference on Artificial Intelligence},
  pages        = {2585--2594},
  publisher    = {{AAAI} Press},
  year         = {2022},
  url          = {https://ojs.aaai.org/index.php/AAAI/article/view/20160}, 
  doi          = {10.1609/aaai.v36i3.20160},
}

Acknowledgements

This repository is based on JDAI-CV/image-captioning, ruotianluo/self-critical.pytorch and microsoft/Swin-Transformer.
