Bad performance on ImageNet variants #15
Hi, thanks for sharing these numbers. Could you also compare them with standard fine-tuning, if you have those numbers as well?
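(For reference, "standard fine-tuning" in this literature usually means cross-entropy training of a classification head on top of the CLIP image encoder, rather than FLYP's contrastive objective. A minimal sketch of that baseline, with all names hypothetical and not the repo's API:)

```python
import torch.nn as nn

class StandardFinetune(nn.Module):
    """Hypothetical cross-entropy baseline: a linear head over the CLIP
    image encoder, trained end-to-end with nn.CrossEntropyLoss."""
    def __init__(self, image_encoder, embed_dim, num_classes):
        super().__init__()
        self.image_encoder = image_encoder  # e.g. the CLIP visual tower
        self.head = nn.Linear(embed_dim, num_classes)

    def forward(self, images):
        return self.head(self.image_encoder(images))
```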
It appears that weight ensembling is not applied in the current code; I cannot see the line where args.alpha is used. As mentioned in the comment, I will also experiment with ViT-B/16.
The results should be much better even without weight ensembling. I am not sure what the baseline for, say, standard cross-entropy fine-tuning would be, but FLYP should still give better OOD accuracies than zero-shot (even without ensembling). I used the ensembling code from https://github.com/mlfoundations/wise-ft. Let me know the ViT-B/16 numbers once you get them and we can debug from there.
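(For anyone reproducing this: WiSE-FT ensembling is just a linear interpolation between the zero-shot and fine-tuned weights. A minimal sketch, assuming `zeroshot_sd` and `finetuned_sd` are state dicts of the same CLIP architecture and `alpha` plays the role of args.alpha:)

```python
import torch

def wise_ft(zeroshot_sd, finetuned_sd, alpha):
    # alpha = 0 recovers the zero-shot model, alpha = 1 the fine-tuned one;
    # WiSE-FT typically sweeps alpha over [0, 1] (e.g. in steps of 0.1).
    assert set(zeroshot_sd) == set(finetuned_sd)
    return {k: (1 - alpha) * zeroshot_sd[k] + alpha * finetuned_sd[k]
            for k in zeroshot_sd}

# usage: model.load_state_dict(wise_ft(zs_sd, ft_sd, alpha=0.5))
```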
Hi, sorry for the late reply. I ran FLYP with CLIP ViT-B/16 without ensembling (i.e., without WiSE-FT) and got an ImageNet top-1 accuracy of 82.4. Robustness is relatively well maintained compared to the ViT-B/32 experiment, but the numbers are lower overall than those reported in the paper (Avg OOD ours: 58.9, reported: 60.2), especially on ImageNet-R, ImageNet-A, and ObjectNet. In particular, the scores I obtained are not much different from the zero-shot OOD performance, which is why robustness is maintained. What should I modify in my experiment to get the reported scores?
Did you use the CLI arguments in the README? Could you please send me your logs?
Sorry for the late reply. Here are the arguments, and below are the logs. The ObjectNet dataset is bigger than the other datasets, so I only evaluate on ObjectNet after 8 epochs.

```
2023-10-23,14:31:17 | INFO | Train Epoch: 0 [ 512/1281167 (0%)] Data (t): 0.000 Batch (t): 5.934, 29.8708/s LR: 0.000000 Loss: 1.7685 (1.7685)
...
2023-10-23,19:08:39 | INFO | Train Epoch: 9 [ 512/1281167 (0%)] Data (t): 0.000 Batch (t): 1.621, 121.220/s LR: 0.000001 Loss: 0.38631 (0.38631)
```
I ran the FLYP code to compare with "Masked Images Are Counterfactual Samples for Robust Fine-tuning" (CVPR 2023), using the ViT-B/32 model.
I expected FLYP to be competitive with other methods, but the OOD performance of the model trained with FLYP is significantly degraded.
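(For context, FLYP fine-tunes with the same contrastive loss as CLIP pre-training, treating prompted class names as captions. A minimal sketch of that objective, assuming an open_clip-style model and tokenizer; the names are illustrative, not the repo's exact code:)

```python
import torch
import torch.nn.functional as F

def flyp_loss(model, tokenizer, images, labels, classnames, device):
    # One caption per image, built from its class label, as FLYP does;
    # `labels` is a LongTensor of class indices.
    texts = tokenizer(
        [f"a photo of a {classnames[y]}" for y in labels.tolist()]).to(device)
    img = F.normalize(model.encode_image(images.to(device)), dim=-1)
    txt = F.normalize(model.encode_text(texts), dim=-1)
    logits = model.logit_scale.exp() * img @ txt.T
    targets = torch.arange(len(labels), device=device)
    # Symmetric InfoNCE over the batch, as in CLIP pre-training.
    # Caveat: duplicate labels in a batch turn some "negatives" into
    # positives; the plain CLIP loss (and this sketch) ignores that.
    return (F.cross_entropy(logits, targets)
            + F.cross_entropy(logits.T, targets)) / 2
```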
Zero-shot CLIP performance using ViT-B/32, compared with FLYP after just one epoch of training:

| Dataset | Zero-shot top-1 | FLYP (1 epoch) top-1 |
| --- | --- | --- |
| ImageNet | 63.4 | 73.3 |
| ImageNet-V2 | 55.9 | 62.6 |
| ImageNet-R | 69.3 | 63.1 |
| ImageNet-Sketch | 42.3 | 40.9 |
| ImageNet-A | 31.4 | 25.9 |
FLYP does not preserve robustness: performance on ImageNet-R, ImageNet-Sketch, and ImageNet-A drops relative to zero-shot CLIP, even after training for just one epoch. I used the same hyperparameters as in the ViT-B/16 experiments.
Can you clarify this phenomenon? Is there anything wrong with my experiment?
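(For completeness, this is roughly how zero-shot numbers like those above are obtained; a minimal sketch using open_clip with an ImageFolder-style copy of each dataset. It uses a single prompt per class, whereas the full CLIP evaluation averages over ~80 templates, and it glosses over the 200-class subsets of ImageNet-R/A, whose classnames must match the folder ordering:)

```python
import torch
import open_clip
from torch.utils.data import DataLoader
from torchvision.datasets import ImageFolder

device = "cuda" if torch.cuda.is_available() else "cpu"
model, _, preprocess = open_clip.create_model_and_transforms(
    "ViT-B-32", pretrained="openai")
tokenizer = open_clip.get_tokenizer("ViT-B-32")
model = model.to(device).eval()

@torch.no_grad()
def build_classifier(classnames):
    # Embed one prompt per class and L2-normalize.
    text = tokenizer([f"a photo of a {c}" for c in classnames]).to(device)
    w = model.encode_text(text)
    return w / w.norm(dim=-1, keepdim=True)

@torch.no_grad()
def zero_shot_top1(data_dir, classnames):
    loader = DataLoader(ImageFolder(data_dir, transform=preprocess),
                        batch_size=256, num_workers=8)
    classifier = build_classifier(classnames)
    correct = total = 0
    for images, labels in loader:
        feats = model.encode_image(images.to(device))
        feats = feats / feats.norm(dim=-1, keepdim=True)
        preds = (feats @ classifier.T).argmax(dim=-1).cpu()
        correct += (preds == labels).sum().item()
        total += labels.numel()
    return 100.0 * correct / total
```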