Clarification of the BPD results on ImageNet32/ImageNet64 #7

zhengkw18 · 2023-11-29T06:46:12Z

Congratulations on your good work! I think DenseFlow is the SOTA among normalizing flows, but I would like to make some clarifications regarding its comparison with other methods (such as diffusion models).

I was comparing DenseFlow against VDM on ImageNet64x64.

DenseFlow: 3.35 BPD, 130M, 1 V100 ~2 weeks
VDM: 3.4 BPD, ?M, 128 TPUv3 for ?weeks?

It looks like DenseFlow gets a better BPD with ~100x less compute.

I think the reason DenseFlow achieves such a good BPD on ImageNet32/ImageNet64 at a distinctly lower computational cost is that the wrong version of downsampled ImageNet was used. I have recently uploaded the code of our ICML 2023 paper, Improved Techniques for Maximum Likelihood Estimation for Diffusion ODEs (https://github.com/thu-ml/i-DODE), whose README highlights this issue as follows:

There are two different versions of ImageNet32 dataset. For fair comparisons, we use both versions of ImageNet32, one is downloaded from https://image-net.org/data/downsample/Imagenet32_train.zip, following Flow Matching [3], and the other is downloaded from http://image-net.org/small/train_32x32.tar (old version, no longer available), following ScoreSDE and VDM. The former dataset applies anti-aliasing and is easier for maximum likelihood training.
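To illustrate why anti-aliasing matters here, the following is a minimal sketch on synthetic data. It is not the preprocessing pipeline of either ImageNet32 release: the box filter stands in for anti-aliasing as an assumption, and the helper names (`downsample_subsample`, `downsample_box`, `neighbor_roughness`) are hypothetical. It only shows that averaging before subsampling removes high-frequency content, which leaves smoother images.

```python
import numpy as np

def downsample_subsample(img: np.ndarray, factor: int) -> np.ndarray:
    """Downsample by keeping every `factor`-th pixel (no anti-aliasing)."""
    return img[::factor, ::factor]

def downsample_box(img: np.ndarray, factor: int) -> np.ndarray:
    """Downsample by averaging each factor x factor block (a simple anti-aliasing filter)."""
    h, w = img.shape
    return img.reshape(h // factor, factor, w // factor, factor).mean(axis=(1, 3))

def neighbor_roughness(img: np.ndarray) -> float:
    """Mean absolute difference between horizontally adjacent pixels."""
    return float(np.mean(np.abs(np.diff(img, axis=1))))

# Synthetic 256x256 "image": a smooth gradient plus fine-grained noise.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 256)
hi_res = np.outer(x, x) + 0.2 * rng.standard_normal((256, 256))

aliased = downsample_subsample(hi_res, 8)   # 32x32, noise kept at full strength
smoothed = downsample_box(hi_res, 8)        # 32x32, noise averaged out

print("roughness without anti-aliasing:", neighbor_roughness(aliased))
print("roughness with box averaging:   ", neighbor_roughness(smoothed))
```

The averaged image has much smaller differences between neighbouring pixels, which fits the README's remark that the anti-aliased dataset is easier for maximum likelihood training.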

Clearly, DenseFlow chose the new version of ImageNet32/64 (https://github.com/matejgrcic/DenseFlow/blob/473220a9c02b262b481fbaa50a947e40bad3f99c/denseflow/data/datasets/image/imagenet32.py), which favors a lower BPD. Therefore, I suggest the author clarify this and remove the BPD result from the rank list (https://paperswithcode.com/paper/densely-connected-normalizing-flows), where the other methods use the old version of ImageNet, which makes the comparison unfair and confusing.

zhengkw18 · 2023-11-29T07:01:28Z

We conducted experiments on both versions of ImageNet32, and found that the new version typically results in about 0.3 lower BPD than the old version: 3.43 (new version, batch size 128, A40 GPU) vs. 3.69 (old version, batch size 512, A100 GPU). So the dataset difference is rather notable.
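For scale, here is a small sketch of what that gap means per image. It uses the standard definition of bits per dimension (negative log-likelihood in nats divided by D ln 2) and the two BPD values quoted above; the function and variable names are illustrative only.

```python
import math

def nats_to_bpd(nll_nats: float, num_dims: int) -> float:
    """Convert a per-image negative log-likelihood in nats to bits per dimension."""
    return nll_nats / (num_dims * math.log(2))

# An ImageNet32 image has 32 * 32 pixels with 3 channels each.
DIMS_IN32 = 32 * 32 * 3

bpd_new = 3.43  # new (anti-aliased) version, as reported above
bpd_old = 3.69  # old version, as reported above

gap_bpd = bpd_old - bpd_new
# The same gap expressed as nats of log-likelihood per image.
gap_nats_per_image = gap_bpd * DIMS_IN32 * math.log(2)

print(f"BPD gap: {gap_bpd:.2f}")
print(f"Gap in nats per image: {gap_nats_per_image:.1f}")
```

Even a few hundredths of a BPD usually separate entries on these leaderboards, so a dataset-induced gap of this size dominates any architectural difference.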

It seems that Efficient-VDVAE on https://paperswithcode.com/sota/image-generation-on-imagenet-64x64 also uses the wrong version of ImageNet, which leads to an unfair comparison.

Under fair comparison, VDM is still the current SOTA likelihood model on CIFAR10/ImageNet32/ImageNet64.

matejgrcic · 2023-12-17T20:29:45Z

Hi, thanks for pointing out the mismatch between the two versions of IN32. As far as I know, this is mostly unknown in the community, and the old version being unavailable doesn't help. I will update the README to make it clearer that we trained on the new version of IN32. Cheers!
