
yolo-world-l finetune on COCO cannot be successfully reproduced #76

Open
Hudaodao99 opened this issue Feb 26, 2024 · 7 comments
Labels
bug (Something isn't working) · Working on it now!

Comments

@Hudaodao99 commented Feb 26, 2024

Hi! I set up the environment to reproduce the yolo_world_l finetuning result on COCO, but the result I actually get falls between those of the S and M models. I followed the finetuning documentation step by step:

  1. Since the page for the yolo_world_l code with the efficient neck returns a 404 error, this reproduction uses yolo_world_l without the efficient neck.
  2. The paper only states that the finetuned L model was pretrained on O365, GoldG, and CC3M. For an exact reproduction, I changed the checkpoint loaded in the original config from load_from='pretrained_models/yolo_world_l_clip_base_dual_vlpan_2e-3adamw_32xb16_100e_o365_goldg_train_pretrained-0e566235.pth' to load_from='pretrained_models/yolo_world_l_clip_base_dual_vlpan_2e-3adamw_32xb16_100e_o365_goldg_cc3mlite_train_pretrained-7a5eea3b.pth' (see the config sketch after the log below).
  3. Hyperparameters: 8× A800 GPUs, batch size 16 per GPU.
  4. Training log and results:
    02/23 21:54:41 - mmengine - INFO - Epoch(train) [80][800/925] base_lr: 2.0000e-04 lr: 6.9500e-06 eta: 0:01:05 time: 0.5054 data_time: 0.0021 memory: 18624 grad_norm: 886.4797 loss: 350.3436 loss_cls: 97.6750 loss_bbox: 112.2129 loss_dfl: 140.4556
    02/23 21:55:06 - mmengine - INFO - Epoch(train) [80][850/925] base_lr: 2.0000e-04 lr: 6.9500e-06 eta: 0:00:39 time: 0.4935 data_time: 0.0021 memory: 18717 grad_norm: 887.4158 loss: 365.0416 loss_cls: 104.4143 loss_bbox: 117.6048 loss_dfl: 143.0225
    Corrupt JPEG data: premature end of data segment
    02/23 21:55:31 - mmengine - INFO - Epoch(train) [80][900/925] base_lr: 2.0000e-04 lr: 6.9500e-06 eta: 0:00:13 time: 0.5056 data_time: 0.0021 memory: 18637 grad_norm: 881.9796 loss: 357.7269 loss_cls: 102.4538 loss_bbox: 112.8057 loss_dfl: 142.4673
    02/23 21:57:06 - mmengine - INFO - bbox_mAP_copypaste: 0.486 0.656 0.530 0.309 0.534 0.640
    02/23 21:57:06 - mmengine - INFO - Epoch(val) [80][625/625] coco/bbox_mAP: 0.4860 coco/bbox_mAP_50: 0.6560 coco/bbox_mAP_75: 0.5300 coco/bbox_mAP_s: 0.3090 coco/bbox_mAP_m: 0.5340 coco/bbox_mAP_l: 0.6400 data_time: 0.0003 time: 0.0170
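
For reference, the checkpoint swap in step 2 amounts to a one-line override of the finetune config. A minimal sketch (the `_base_` filename here is illustrative; point it at the actual finetune config in configs/finetune_coco):

```python
# Minimal override sketch for the checkpoint swap described in step 2.
# The _base_ path is illustrative -- use the actual finetune config.
_base_ = './yolo_world_l_dual_vlpan_2e-4_80e_8gpus_finetune_coco.py'

# Load the O365 + GoldG + CC3M-Lite pretrained weights instead of the
# O365 + GoldG ones, to match the pretraining data reported in the paper.
load_from = ('pretrained_models/yolo_world_l_clip_base_dual_vlpan_2e-3adamw_'
             '32xb16_100e_o365_goldg_cc3mlite_train_pretrained-7a5eea3b.pth')
```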

Are there any suggestions or guidance on reproducing the finetuning results? Also, looking forward to improvements to the finetuning section and updated weights!

@HGao-cv commented Feb 28, 2024

> (quoting @Hudaodao99's original post above)

Hello, do you use mask-refine when fine-tuning on COCO? If I don't use mask-refine, I can't get normal results. Have you encountered similar problems?

@wondervictor (Collaborator) commented

@Hudaodao99 @HGao-cv I'll re-check the finetune config on my side.

@Sally-lxy commented

@Hudaodao99 Hi, I ran into the same problem. Fine-tuning on COCO with the same config, I cannot reach the accuracy reported in the paper: the mAP50 of the two fine-tuning variants is 4.2 and 3.7 points lower than the paper, respectively.

@Hudaodao99 (Author) commented Feb 29, 2024

> (quoting @Sally-lxy's comment above)

Hi, I just tried finetuning with the config configs/finetune_coco/yolo_world_l_efficient_neck_2e-4_80e_8gpus_mask-refine_finetune_coco.py and reached the paper's numbers. However, the checkpoint that config loads was pretrained without CC3M (a result not mentioned in the paper), yet it still reaches mAP = 53.3, which is the part that puzzles me.
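
For anyone reproducing this, here is a minimal Python sketch of launching that finetune via mmengine's Runner. The thread above used the 8-GPU distributed launcher; this single-process form only shows where the config and work dir plug in, and the work_dir value is an arbitrary choice:

```python
# Single-process launch sketch using mmengine's Runner. The 8-GPU runs in
# this thread would go through the repo's distributed launch script instead.
from mmengine.config import Config
from mmengine.runner import Runner

cfg = Config.fromfile(
    'configs/finetune_coco/'
    'yolo_world_l_efficient_neck_2e-4_80e_8gpus_mask-refine_finetune_coco.py'
)
cfg.work_dir = './work_dirs/yolo_world_l_finetune_coco'  # required by Runner

runner = Runner.from_cfg(cfg)
runner.train()
```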

@Hudaodao99 (Author) commented

> Hello, do you use mask-refine when fine-tuning on COCO? If I don't use mask-refine, I can't get normal results. Have you encountered similar problems?

Yes, when I use mask-refine while fine-tuning on COCO, I get the same results as reported on GitHub. Without it, mAP = 0.486, which is lower.
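
For context, mask-refine is a data-pipeline choice rather than a model change. Below is an illustrative sketch of what the toggle looks like in an MMYOLO-style train pipeline; the exact transform chain and arguments in the YOLO-World configs may differ:

```python
# Illustrative MMYOLO-style pipeline fragment (not copied from the repo).
# The mask-refine variants load instance masks and let the affine transform
# recompute boxes from the warped masks; without masks, boxes are warped
# directly, which changes the effective augmentation.
use_mask2refine = True

pre_transform = [
    dict(type='LoadImageFromFile'),
    # with_mask=True is the key difference in the mask-refine configs
    dict(type='LoadAnnotations', with_bbox=True, with_mask=use_mask2refine),
]

train_pipeline = [
    *pre_transform,
    dict(type='Mosaic', img_scale=(640, 640), pre_transform=pre_transform),
    dict(
        type='YOLOv5RandomAffine',
        scaling_ratio_range=(0.1, 2.0),
        border=(-320, -320),
        border_val=(114, 114, 114),
        min_area_ratio=0.01,
        use_mask_refine=use_mask2refine,  # refine boxes from warped masks
    ),
    # ... MixUp / HSV augmentation / flip / packing transforms omitted ...
]
```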

@wondervictor (Collaborator) commented

@Sally-lxy @Hudaodao99 From what we've seen so far, the problem appears to be a data-augmentation issue introduced by mask-refine. We previously overlooked the experiments without mask-refine. I'll benchmark the performance without mask-refine as soon as possible and re-check the whole series of open-source configs; please bear with us.

@wondervictor added the bug (Something isn't working) and Working on it now! labels on Mar 19, 2024
@wondervictor (Collaborator) commented

Hi all (@Sally-lxy, @Hudaodao99), we have investigated the errors with fine-tuning without mask-refine and fixed this issue preliminarily. With mask-refine, YOLO-World performs significantly better than the paper version. Without mask-refine, YOLO-World still obtains competitive performance, e.g., YOLO-World-L obtains 52.8 AP on COCO.

You can find more details in configs/finetune_coco, especially for the version without mask-refine.
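
A quick way to check which variant a given config actually runs is to load it with mmengine and look for mask annotations in the train pipeline. This assumes the config defines train_pipeline at the top level, as MMYOLO-style configs do; the filename below is the mask-refine config cited earlier in this thread:

```python
# Check whether a finetune config uses the mask-refine pipeline by looking
# for LoadAnnotations(with_mask=True) among the top-level transforms.
from mmengine.config import Config

cfg = Config.fromfile(
    'configs/finetune_coco/'
    'yolo_world_l_efficient_neck_2e-4_80e_8gpus_mask-refine_finetune_coco.py'
)
uses_mask_refine = any(
    t.get('type') == 'LoadAnnotations' and t.get('with_mask', False)
    for t in cfg.train_pipeline
)
print('mask-refine pipeline:', uses_mask_refine)
```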
