OSError: ./data/weneed/mask_r50/epoch_12.pth is not a checkpoint file #19

Open
hhhhhhxl opened this issue Jul 9, 2021 · 2 comments

Comments


hhhhhhxl commented Jul 9, 2021

I followed the steps and tried to train with 2 GPUs, but got the error below.
./tools/dist_train.sh ./configs/bags/gs_mask_rcnn_r50_fpn_1x_lvis.py 2


Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.


[07/09 11:22:20] root WARNING: The model and loaded state dict do not match exactly

unexpected key in source state_dict: fc.weight, fc.bias

Train fc_cls only.
--Dist-train--IS:False--ISout:False
Dist-train --- Not using image sampling.
Train fc_cls only.
--Dist-train--IS:False--ISout:False
Dist-train --- Not using image sampling.
[07/09 11:22:39] mmcv.runner.runner INFO: load checkpoint from ./data/weneed/mask_r50/epoch_12.pth
Traceback (most recent call last):
  File "./tools/train.py", line 169, in <module>
    main()
  File "./tools/train.py", line 165, in main
    logger=logger)
  File "/home/sy/Desktop/xlhuang/BalancedGroupSoftmax-master/mmdet/apis/train.py", line 58, in train_detector
    _dist_train(model, dataset, cfg, validate=validate)
  File "/home/sy/Desktop/xlhuang/BalancedGroupSoftmax-master/mmdet/apis/train.py", line 204, in _dist_train
    runner.load_checkpoint(cfg.load_from)
  File "/home/sy/anaconda3/envs/mmdet/lib/python3.7/site-packages/mmcv/runner/runner.py", line 234, in load_checkpoint
    self.logger)
  File "/home/sy/anaconda3/envs/mmdet/lib/python3.7/site-packages/mmcv/runner/checkpoint.py", line 171, in load_checkpoint
    raise IOError('{} is not a checkpoint file'.format(filename))
OSError: ./data/weneed/mask_r50/epoch_12.pth is not a checkpoint file
[07/09 11:22:39] mmcv.runner.runner INFO: load checkpoint from ./data/weneed/mask_r50/epoch_12.pth
Traceback (most recent call last):
  File "./tools/train.py", line 169, in <module>
    main()
  File "./tools/train.py", line 165, in main
    logger=logger)
  File "/home/sy/Desktop/xlhuang/BalancedGroupSoftmax-master/mmdet/apis/train.py", line 58, in train_detector
    _dist_train(model, dataset, cfg, validate=validate)
  File "/home/sy/Desktop/xlhuang/BalancedGroupSoftmax-master/mmdet/apis/train.py", line 204, in _dist_train
    runner.load_checkpoint(cfg.load_from)
  File "/home/sy/anaconda3/envs/mmdet/lib/python3.7/site-packages/mmcv/runner/runner.py", line 234, in load_checkpoint
    self.logger)
  File "/home/sy/anaconda3/envs/mmdet/lib/python3.7/site-packages/mmcv/runner/checkpoint.py", line 171, in load_checkpoint
    raise IOError('{} is not a checkpoint file'.format(filename))
OSError: ./data/weneed/mask_r50/epoch_12.pth is not a checkpoint file
Traceback (most recent call last):
  File "/home/sy/anaconda3/envs/mmdet/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/home/sy/anaconda3/envs/mmdet/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/sy/anaconda3/envs/mmdet/lib/python3.7/site-packages/torch/distributed/launch.py", line 253, in <module>
    main()
  File "/home/sy/anaconda3/envs/mmdet/lib/python3.7/site-packages/torch/distributed/launch.py", line 249, in main
    cmd=cmd)
subprocess.CalledProcessError: Command '['/home/sy/anaconda3/envs/mmdet/bin/python', '-u', './tools/train.py', '--local_rank=1', './configs/bags/gs_mask_rcnn_r50_fpn_1x_lvis.py', '--launcher', 'pytorch']' returned non-zero exit status 1.

Author

hhhhhhxl commented Jul 9, 2021

It seems the file './data/weneed/mask_r50/epoch_12.pth' is missing.
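
A quick way to confirm this before launching training is to check the path by hand. The snippet below is only a minimal sketch: `describe_checkpoint` is a hypothetical helper, not part of mmcv or this repository. It assumes, based on the traceback, that the "is not a checkpoint file" message is raised when the path does not point to an existing regular file.

```python
import os


def describe_checkpoint(path):
    """Return a short diagnosis for a checkpoint path (hypothetical helper)."""
    if not os.path.exists(path):
        return "missing"
    if not os.path.isfile(path):
        # For example, the path points at a directory instead of a .pth file.
        return "not a regular file"
    if os.path.getsize(path) == 0:
        # A zero-byte file usually means an interrupted download or copy.
        return "empty file"
    return "ok"


if __name__ == "__main__":
    # Path copied from the traceback above; adjust it to your own setup.
    ckpt = "./data/weneed/mask_r50/epoch_12.pth"
    print(ckpt, "->", describe_checkpoint(ckpt))
```

Run it from the repository root, since the path in the log is relative to the working directory. A result other than "ok" suggests the file has to be downloaded or produced by an earlier training stage first.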


xiaohe6 commented Jun 24, 2022


Hello, I can't download the file ./data/download_models/faster_rcnn_r50_fpn_2x_20181010-443129e1.pth right now. May I ask whether you have managed to download it?
