Exploration into Translation-Equivariant Image Quantization

Woncheol Shin¹, Gyubok Lee¹, Jiyoung Lee¹, Eunyi Lyou³, Joonseok Lee²,³, Edward Choi¹ | Paper

¹KAIST, ²Google Research, ³Seoul National University

Abstract

This is an exploratory study that finds that current image quantization (vector quantization) does not satisfy translation equivariance in the quantized space due to aliasing. Instead of focusing on anti-aliasing, we propose a simple yet effective way to achieve translation-equivariant image quantization by enforcing orthogonality among the codebook embeddings. To explore the advantages of translation-equivariant image quantization, we conduct three proof-of-concept experiments with a carefully controlled dataset: (1) text-to-image generation, where the quantized image indices are the target to predict; (2) image-to-text generation, where the quantized image indices are given as a condition; (3) using a smaller training set to analyze sample efficiency. From the strictly controlled experiments, we empirically verify that the translation-equivariant image quantizer improves not only sample efficiency but also accuracy over VQGAN, by up to +11.9% in text-to-image generation and +3.9% in image-to-text generation.
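The orthogonality idea above can be sketched as a regularizer that pushes the codebook's Gram matrix toward the identity. This is a minimal numpy illustration of the concept, not the repository's exact loss; the function name and the normalization choice are assumptions:

```python
import numpy as np

def orthogonality_loss(codebook):
    # codebook: (K, D) array of K embedding vectors.
    # Normalize each embedding to unit length, then penalize the
    # Gram matrix's deviation from the identity, so distinct codes
    # are driven toward mutual orthogonality (hypothetical sketch).
    e = codebook / np.linalg.norm(codebook, axis=1, keepdims=True)
    gram = e @ e.T                       # (K, K) cosine similarities
    k = codebook.shape[0]
    return np.sum((gram - np.eye(k)) ** 2) / (k * k)

# A perfectly orthogonal codebook incurs zero loss.
print(orthogonality_loss(np.eye(4)))  # → 0.0
```

In training this term would be added to the usual VQGAN objective; see the paper for the exact formulation and weighting.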

Requirements

conda env create -f environment.yaml
conda activate te

Download Dataset

bash download_mnist64x64.sh

Training TE-VQGAN (Stage 1)

python main.py --base configs/mnist64x64_vqgan.yaml -t True --gpus 0,1 --max_epochs 40 --seed 23
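Translation equivariance of a trained quantizer can be sanity-checked by comparing the index map of a shifted input against the shifted index map of the original input. Below is a minimal numpy sketch of such a check; the function names and the `encode` callable are hypothetical stand-ins, not part of this repository:

```python
import numpy as np

def nearest_code_indices(features, codebook):
    # features: (H, W, D) feature map; codebook: (K, D) embeddings.
    # Assign each spatial position to its nearest codebook entry.
    dists = ((features[..., None, :] - codebook) ** 2).sum(axis=-1)
    return dists.argmin(axis=-1)  # (H, W) index map

def is_translation_equivariant(encode, codebook, image, dy, dx):
    # Equivariance: quantizing a shifted image should give the same
    # result as shifting the quantized index map of the original.
    shift_then_quantize = nearest_code_indices(
        encode(np.roll(image, (dy, dx), axis=(0, 1))), codebook)
    quantize_then_shift = np.roll(
        nearest_code_indices(encode(image), codebook), (dy, dx), axis=(0, 1))
    return np.array_equal(shift_then_quantize, quantize_then_shift)
```

With a pointwise `encode` this holds trivially; the aliasing discussed in the paper arises from strided convolutional encoders, where the check can fail for shifts smaller than the downsampling factor.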

To use TensorBoard, run:

tensorboard --logdir logs --port [your_number] --bind_all

And then open your browser and go to http://localhost:[your_number]/.

Training Bi-directional Image-Text Generator (Stage 2)

Please refer to Bi-directional DALL-E.

Citation

@inproceedings{shin2023exploration,
  title={Exploration into translation-equivariant image quantization},
  author={Shin, Woncheol and Lee, Gyubok and Lee, Jiyoung and Lyou, Eunyi and Lee, Joonseok and Choi, Edward},
  booktitle={ICASSP 2023-2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
  pages={1--5},
  year={2023},
  organization={IEEE}
}

Acknowledgments

The implementation of 'TE-VQGAN' and 'Bi-directional Image-Text Generator' is based on VQGAN and DALLE-pytorch.
