I ran into a problem evaluating cross-architecture performance. #3

rosean2002 · 2024-10-09T08:13:27Z

Hi, I am running some experiments on this task. I modified the code myself to evaluate the dataset distilled by nfnet on nf-resnet50, but the accuracy was very poor, far below the performance reported in the paper. How is cross-architecture performance evaluated here? Are there any special settings?

silicx · 2024-10-11T06:34:48Z

Hi, thanks for trying our code! What accuracy are you seeing? I ran a quick evaluation today and the results look normal:

image_model_train = nf_resnet50, text_model_train = bert, iteration = ?
  0%|                               | 0/101 [00:00<?, ?it/s]Evaluation time 0:00:04
[Eval_00] Ep0 | Image R@1=0.34 R@5=1.64 R@10=2.68 | Text R@1=2.10 R@5=7.60 R@10=12.40 | Mean=4.46
 10%|██▏                   | 10/101 [00:23<01:53,  1.25s/it]Evaluation time 0:00:03
[Eval_00] Ep10 | Image R@1=2.38 R@5=10.06 R@10=16.80 | Text R@1=6.00 R@5=14.60 R@10=23.30 | Mean=12.19
 20%|████▎                 | 20/101 [00:37<01:32,  1.14s/it]Evaluation time 0:00:03
[Eval_00] Ep20 | Image R@1=1.58 R@5=6.94 R@10=12.92 | Text R@1=5.40 R@5=16.60 R@10=26.10 | Mean=11.59
 30%|██████▌               | 30/101 [00:52<01:21,  1.14s/it]Evaluation time 0:00:03
[Eval_00] Ep30 | Image R@1=2.68 R@5=10.52 R@10=18.20 | Text R@1=6.00 R@5=18.70 R@10=27.50 | Mean=13.93
 40%|████████▋             | 40/101 [01:07<01:09,  1.14s/it]Evaluation time 0:00:03
[Eval_00] Ep40 | Image R@1=2.16 R@5=8.20 R@10=14.46 | Text R@1=6.30 R@5=18.00 R@10=27.70 | Mean=12.80
 50%|██████████▉           | 50/101 [01:21<00:58,  1.15s/it]Evaluation time 0:00:03
[Eval_00] Ep50 | Image R@1=2.08 R@5=6.78 R@10=11.32 | Text R@1=5.80 R@5=16.00 R@10=25.80 | Mean=11.30
 59%|█████████████         | 60/101 [01:36<00:47,  1.17s/it]Evaluation time 0:00:03
[Eval_00] Ep60 | Image R@1=3.26 R@5=11.80 R@10=19.10 | Text R@1=6.20 R@5=18.30 R@10=29.10 | Mean=14.63
 69%|███████████████▏      | 70/101 [01:51<00:35,  1.14s/it]Evaluation time 0:00:03
[Eval_00] Ep70 | Image R@1=2.58 R@5=10.88 R@10=18.26 | Text R@1=7.00 R@5=18.80 R@10=29.60 | Mean=14.52
 79%|█████████████████▍    | 80/101 [02:06<00:24,  1.15s/it]Evaluation time 0:00:03
[Eval_00] Ep80 | Image R@1=3.00 R@5=12.02 R@10=19.70 | Text R@1=6.50 R@5=20.00 R@10=29.90 | Mean=15.19
 89%|███████████████████▌  | 90/101 [02:21<00:12,  1.15s/it]Evaluation time 0:00:03
[Eval_00] Ep90 | Image R@1=3.50 R@5=12.10 R@10=20.10 | Text R@1=6.10 R@5=20.10 R@10=30.10 | Mean=15.33
 99%|████████████████████▊| 100/101 [02:36<00:01,  1.16s/it]Evaluation time 0:00:03
[Eval_00] Ep100 | Image R@1=3.32 R@5=12.20 R@10=19.36 | Text R@1=5.80 R@5=20.20 R@10=30.10 | Mean=15.16
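For anyone comparing numbers against this log: the Mean column appears to be the plain average of the six recall values. At Ep0, (0.34 + 1.64 + 2.68 + 2.10 + 7.60 + 12.40) / 6 = 4.46, and at Ep100, (3.32 + 12.20 + 19.36 + 5.80 + 20.20 + 30.10) / 6 ≈ 15.16, both matching the logged values. Below is a minimal sketch of how Recall@K for image-text retrieval is commonly computed from a similarity matrix. It is not the repository's evaluation code. The function name `recall_at_k`, the input layout (`txt2img`, `img2txt`, which allow several captions per image), and the mapping of the two retrieval directions onto the log's `Image` and `Text` columns are all assumptions.

```python
import numpy as np


def recall_at_k(sims_i2t, txt2img, img2txt, ks=(1, 5, 10)):
    """Hypothetical Recall@K sketch for image-text retrieval.

    sims_i2t: [n_images, n_texts] similarity scores.
    txt2img:  txt2img[t] is the ground-truth image index of text t.
    img2txt:  img2txt[i] lists the ground-truth text indices of image i.
    Returns (image->text recalls, text->image recalls, mean), in percent.
    """
    n_img, n_txt = sims_i2t.shape

    # Image -> text: rank of the best-ranked ground-truth caption per image.
    i2t_ranks = np.empty(n_img)
    for i in range(n_img):
        order = np.argsort(-sims_i2t[i])  # text indices, most similar first
        i2t_ranks[i] = min(np.where(order == t)[0][0] for t in img2txt[i])

    # Text -> image: rank of the single ground-truth image per text.
    t2i_ranks = np.empty(n_txt)
    for t in range(n_txt):
        order = np.argsort(-sims_i2t[:, t])  # image indices, most similar first
        t2i_ranks[t] = np.where(order == txt2img[t])[0][0]

    # A hit at rank r (0-based) counts for every K > r.
    i2t = {k: 100.0 * np.mean(i2t_ranks < k) for k in ks}
    t2i = {k: 100.0 * np.mean(t2i_ranks < k) for k in ks}
    # Mean over all six recalls, matching how the logged Mean column adds up.
    mean = (sum(i2t.values()) + sum(t2i.values())) / (2 * len(ks))
    return i2t, t2i, mean


if __name__ == "__main__":
    # Toy example: 3 images x 3 texts, one caption per image.
    sims = np.array([[0.9, 0.1, 0.95],
                     [0.3, 0.2, 0.8],
                     [0.1, 0.7, 0.4]])
    i2t, t2i, mean = recall_at_k(sims, [0, 2, 1], [[0], [2], [1]])
    # One miss at rank 1 in each direction: R@1 is 66.67, mean is about 88.89.
    print(i2t, t2i, round(mean, 2))
```

Note that with very small test sets, R@5 and R@10 saturate at 100, so only R@1 is informative in the toy example.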

silicx · 2024-10-11T06:39:26Z

I've also uploaded my evaluation code (with the eval command at the beginning) in 8a712af for your reference.
