Exploration into Translation-Equivariant Image Quantization

Woncheol Shin¹, Gyubok Lee¹, Jiyoung Lee¹, Eunyi Lyou³, Joonseok Lee²,³, Edward Choi¹ | Paper

¹KAIST, ²Google Research, ³Seoul National University

Abstract

This exploratory study finds that current image quantization (vector quantization) does not satisfy translation equivariance in the quantized space because of aliasing. Instead of focusing on anti-aliasing, we propose a simple yet effective way to achieve translation-equivariant image quantization: enforcing orthogonality among the codebook embeddings. To explore the advantages of translation-equivariant image quantization, we conduct three proof-of-concept experiments with a carefully controlled dataset: (1) text-to-image generation, where the quantized image indices are the target to predict, (2) image-to-text generation, where the quantized image indices are given as a condition, and (3) training on a smaller training set to analyze sample efficiency. From these strictly controlled experiments, we empirically verify that the translation-equivariant image quantizer improves not only sample efficiency but also accuracy over VQGAN, by up to +11.9% in text-to-image generation and +3.9% in image-to-text generation.
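To make the idea of "enforcing orthogonality among the codebook embeddings" concrete, here is a minimal sketch of one possible orthogonality penalty. It is an illustration, not the code in this repository: the function names `l2_normalize` and `orthogonality_penalty` are hypothetical, and the specific form (squared Frobenius distance between the Gram matrix of L2-normalized embeddings and the identity, averaged over all entries) is an assumption about how such a regularizer could be written. It uses plain Python lists so it runs without any dependencies.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit L2 norm (hypothetical helper)."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def orthogonality_penalty(codebook):
    """Mean squared deviation of the normalized Gram matrix from identity.

    codebook: list of K embedding vectors, each a list of floats.
    Returns 0.0 when all embeddings are mutually orthogonal.
    The exact form of this penalty is an assumption for illustration.
    """
    k = len(codebook)
    unit = [l2_normalize(e) for e in codebook]
    total = 0.0
    for i in range(k):
        for j in range(k):
            # Cosine similarity between embeddings i and j.
            gram_ij = sum(a * b for a, b in zip(unit[i], unit[j]))
            target = 1.0 if i == j else 0.0
            total += (gram_ij - target) ** 2
    return total / (k * k)
```

For example, `orthogonality_penalty([[1.0, 0.0], [0.0, 1.0]])` returns `0.0`, while two identical embeddings `[[1.0, 0.0], [1.0, 0.0]]` give `0.5`. In a training setup, a term like this would typically be added to the quantizer loss with a weighting coefficient.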

Requirements

conda env create -f environment.yaml
conda activate bidalle
pip install horovod==0.22.1

If the horovod installation fails, please refer to the Horovod installation documentation.

Download Dataset

bash download_mnist64x64_stage2.sh

Download Image Classifier

bash download_classifier_ckpt.sh

Training Bi-directional Image-Text Generator (Stage 2)

In run_train_dalle.sh, set --vqgan_model_path and --vqgan_config_path to the checkpoint and config file of your model pretrained with TE-VQGAN. For example:

--vqgan_model_path /home/TE-VQGAN/logs/2022-04-01T07-37-39_mnist64x64_vqgan/checkpoints/last.ckpt \
--vqgan_config_path /home/TE-VQGAN/logs/2022-04-01T07-37-39_mnist64x64_vqgan/configs/2022-04-01T07-37-39-project.yaml

And then run the script:

bash run_train_dalle.sh
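Because a wrong checkpoint or config path usually only surfaces once training has started, it can help to check both paths beforehand. The following is a small sketch of such a check; `check_vqgan_paths` is a hypothetical helper that is not part of this repository, and the expected file extensions (.ckpt and .yaml/.yml) are assumptions based on the example paths above.

```python
from pathlib import Path


def check_vqgan_paths(model_path, config_path):
    """Return a list of problems with the two paths; empty if both look usable.

    Hypothetical pre-flight check for the values passed to
    --vqgan_model_path and --vqgan_config_path.
    """
    problems = []
    checks = (
        ("--vqgan_model_path", model_path, (".ckpt",)),
        ("--vqgan_config_path", config_path, (".yaml", ".yml")),
    )
    for flag, raw, suffixes in checks:
        path = Path(raw)
        if not path.is_file():
            problems.append(f"{flag}: file not found: {path}")
        elif path.suffix not in suffixes:
            # Assumed extensions, taken from the example paths in this README.
            problems.append(f"{flag}: unexpected extension {path.suffix!r}: {path}")
    return problems
```

Calling `check_vqgan_paths(...)` with the two values from run_train_dalle.sh and printing the returned list shows which flag, if any, points to a missing or mismatched file.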

Citation

@inproceedings{shin2023exploration,
  title={Exploration into translation-equivariant image quantization},
  author={Shin, Woncheol and Lee, Gyubok and Lee, Jiyoung and Lyou, Eunyi and Lee, Joonseok and Choi, Edward},
  booktitle={ICASSP 2023-2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
  pages={1--5},
  year={2023},
  organization={IEEE}
}

Acknowledgments

The implementation of 'TE-VQGAN' and 'Bi-directional Image-Text Generator' is based on VQGAN and DALLE-pytorch.
