Error when I use resume_from #875

tarv33 · 2021-05-19T01:14:51Z

As soon as I set 'resume_from' in the config, mmaction2 reports the error below.

2021-05-19 01:08:48,176 - mmaction - INFO - Environment info:
------------------------------------------------------------
sys.platform: linux
Python: 3.7.7 (default, May  7 2020, 21:25:33) [GCC 7.3.0]
CUDA available: True
GPU 0,1,2,3: Tesla V100-PCIE-16GB
CUDA_HOME: /usr/local/cuda
NVCC: Cuda compilation tools, release 10.1, V10.1.243
GCC: gcc (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0
PyTorch: 1.6.0
PyTorch compiling details: PyTorch built with:
  - GCC 7.3
  - C++ Version: 201402
  - Intel(R) Math Kernel Library Version 2020.0.1 Product Build 20200208 for Intel(R) 64 architecture applications
  - Intel(R) MKL-DNN v1.5.0 (Git Hash e2ac1fac44c5078ca927cb9b90e1b3066a0b2ed0)
  - OpenMP 201511 (a.k.a. OpenMP 4.5)
  - NNPACK is enabled
  - CPU capability usage: AVX2
  - CUDA Runtime 10.1
  - NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_37,code=compute_37
  - CuDNN 7.6.3
  - Magma 2.5.2
  - Build settings: BLAS=MKL, BUILD_TYPE=Release, CXX_FLAGS= -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp -DNDEBUG -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DUSE_VULKAN_WRAPPER -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-variable -Wno-unused-function -Wno-unused-result -Wno-unused-local-typedefs -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Wno-stringop-overflow, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, USE_CUDA=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON, USE_STATIC_DISPATCH=OFF, 

TorchVision: 0.7.0
OpenCV: 4.5.1
MMCV: 1.3.0
MMCV Compiler: GCC 7.3
MMCV CUDA Compiler: 10.1
MMAction2: 0.13.0+6a252b8
------------------------------------------------------------

2021-05-19 01:08:48,176 - mmaction - INFO - Distributed training: False
2021-05-19 01:08:48,176 - mmaction - INFO - Config: /mmaction2/configs/recognition/tin/tin_r50_1x1x8_40e_sthv2_rgb.py
checkpoint_config = dict(interval=1)
log_config = dict(
    interval=20,
    hooks=[
        dict(type='TextLoggerHook'),
        # dict(type='TensorboardLoggerHook'),
    ])
# runtime settings
dist_params = dict(backend='nccl')
log_level = 'INFO'
load_from = None
resume_from = None
workflow = [('train', 1)]


# model settings
model = dict(
    type='Recognizer2D',
    backbone=dict(
        type='ResNetTIN',
        pretrained='torchvision://resnet50',
        depth=50,
        norm_eval=False,
        shift_div=4),
    cls_head=dict(
        type='TSMHead',
        num_classes=174,
        in_channels=2048,
        spatial_type='avg',
        consensus=dict(type='AvgConsensus', dim=1),
        dropout_ratio=0.8,
        init_std=0.001,
        is_shift=False),
    # model training and testing settings
    train_cfg=None,
    test_cfg=dict(average_clips=None))


# dataset settings
dataset_type = 'RawframeDataset'
data_root = 'data/sthv2/rawframes'
data_root_val = 'data/sthv2/rawframes'
ann_file_train = 'data/sthv2/sthv2_train_list_rawframes.txt'
ann_file_val = 'data/sthv2/sthv2_val_list_rawframes.txt'
ann_file_test = 'data/sthv2/sthv2_val_list_rawframes.txt'
img_norm_cfg = dict(
    mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_bgr=False)
train_pipeline = [
    dict(type='SampleFrames', clip_len=1, frame_interval=1, num_clips=8),
    dict(type='RawFrameDecode'),
    dict(type='Resize', scale=(-1, 256)),
    dict(
        type='MultiScaleCrop',
        input_size=224,
        scales=(1, 0.875, 0.75, 0.66),
        random_crop=False,
        max_wh_scale_gap=1),
    dict(type='Resize', scale=(224, 224), keep_ratio=False),
    dict(type='Normalize', **img_norm_cfg),
    dict(type='FormatShape', input_format='NCHW'),
    dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]),
    dict(type='ToTensor', keys=['imgs', 'label'])
]
val_pipeline = [
    dict(
        type='SampleFrames',
        clip_len=1,
        frame_interval=1,
        num_clips=8,
        test_mode=True),
    dict(type='RawFrameDecode'),
    dict(type='Resize', scale=(-1, 256)),
    dict(type='CenterCrop', crop_size=224),
    dict(type='Normalize', **img_norm_cfg),
    dict(type='FormatShape', input_format='NCHW'),
    dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]),
    dict(type='ToTensor', keys=['imgs'])
]
test_pipeline = [
    dict(
        type='SampleFrames',
        clip_len=1,
        frame_interval=1,
        num_clips=8,
        test_mode=True),
    dict(type='RawFrameDecode'),
    dict(type='Resize', scale=(-1, 256)),
    dict(type='CenterCrop', crop_size=224),
    dict(type='Normalize', **img_norm_cfg),
    dict(type='FormatShape', input_format='NCHW'),
    dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]),
    dict(type='ToTensor', keys=['imgs'])
]
data = dict(
    videos_per_gpu=12, #6,
    workers_per_gpu=4,
    train=dict(
        type=dataset_type,
        ann_file=ann_file_train,
        data_prefix=data_root,
        pipeline=train_pipeline),
    val=dict(
        type=dataset_type,
        ann_file=ann_file_val,
        data_prefix=data_root_val,
        pipeline=val_pipeline),
    test=dict(
        type=dataset_type,
        ann_file=ann_file_test,
        data_prefix=data_root_val,
        pipeline=test_pipeline))
evaluation = dict(
    interval=2, metrics=['top_k_accuracy', 'mean_class_accuracy'])

# optimizer
optimizer = dict(
    type='SGD',
    constructor='TSMOptimizerConstructor',
    paramwise_cfg=dict(fc_lr5=True),
    lr=0.02,  # this lr is used for 8 gpus
    momentum=0.9,
    weight_decay=0.0005)
optimizer_config = dict(grad_clip=dict(max_norm=20, norm_type=2))
# learning policy
lr_config = dict(
    policy='CosineAnnealing',
    by_epoch=False,
    warmup='linear',
    warmup_iters=1,
    warmup_by_epoch=True,
    min_lr=0)
total_epochs = 40

# runtime settings
work_dir = './work_dirs/tin_r50_1x1x8_40e_sthv2_rgb/'
resume_from = './work_dirs/tin_r50_1x1x8_40e_sthv2_rgb/epoch_29.pth'

Use load_from_torchvision loader
2021-05-19 01:08:49,657 - mmaction - INFO - These parameters in pretrained checkpoint are not loaded: {'fc.weight', 'fc.bias'}
2021-05-19 01:08:55,979 - mmaction - INFO - load checkpoint from ./work_dirs/tin_r50_1x1x8_40e_sthv2_rgb/epoch_29.pth
2021-05-19 01:08:55,979 - mmaction - INFO - Use load_from_local loader
Traceback (most recent call last):
  File "/opt/conda/lib/python3.7/site-packages/mmcv/utils/config.py", line 96, in _validate_py_syntax
    ast.parse(content)
  File "/opt/conda/lib/python3.7/ast.py", line 35, in parse
    return compile(source, filename, mode, PyCF_ONLY_AST)
  File "<unknown>", line 1
    /mmaction2/configs/_base_/models/tin_r50.py
    ^
SyntaxError: invalid syntax

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "tools/train.py", line 197, in <module>
    main()
  File "tools/train.py", line 193, in main
    meta=meta)
  File "/mmaction2/mmaction/apis/train.py", line 157, in train_model
    runner.resume(cfg.resume_from)
  File "/opt/conda/lib/python3.7/site-packages/mmcv/runner/base_runner.py", line 347, in resume
    checkpoint['meta']['config'], file_format='.py')
  File "/opt/conda/lib/python3.7/site-packages/mmcv/utils/config.py", line 279, in fromstring
    cfg = Config.fromfile(temp_file.name)
  File "/opt/conda/lib/python3.7/site-packages/mmcv/utils/config.py", line 252, in fromfile
    use_predefined_variables)
  File "/opt/conda/lib/python3.7/site-packages/mmcv/utils/config.py", line 145, in _file2dict
    Config._validate_py_syntax(filename)
  File "/opt/conda/lib/python3.7/site-packages/mmcv/utils/config.py", line 98, in _validate_py_syntax
    raise SyntaxError('There are syntax errors in config '
SyntaxError: There are syntax errors in config file /tmp/tmpkq2pesx0.py: invalid syntax (<unknown>, line 1)
Exception ignored in: <function _TemporaryFileCloser.__del__ at 0x7f1f26907830>
Traceback (most recent call last):
  File "/opt/conda/lib/python3.7/tempfile.py", line 448, in __del__
    self.close()
  File "/opt/conda/lib/python3.7/tempfile.py", line 444, in close
    unlink(self.name)
FileNotFoundError: [Errno 2] No such file or directory: '/tmp/tmpenshl8j0/tmp0er7xxxe.py'
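
Reading the first traceback: the text that runner.resume passes to Config.fromstring (checkpoint['meta']['config']) appears to start with a file path, /mmaction2/configs/_base_/models/tin_r50.py, rather than Python code, so ast.parse rejects it at line 1. The sketch below reproduces only that syntax check with the standard library. The helper name is_valid_python_source is hypothetical, and the second line of stored_text is an assumed stand-in for the rest of the stored config, not the actual checkpoint contents.

```python
import ast


def is_valid_python_source(text: str) -> bool:
    """Return True if `text` parses as Python source.

    This mirrors the ast.parse(content) call shown in the traceback;
    it is not mmcv's own implementation.
    """
    try:
        ast.parse(text)
    except SyntaxError:
        return False
    return True


# Line 1 is the path reported in the traceback; the second line is an
# assumed placeholder for whatever config text follows it.
stored_text = (
    "/mmaction2/configs/_base_/models/tin_r50.py\n"
    "checkpoint_config = dict(interval=1)\n"
)
print(is_valid_python_source(stored_text))  # False: a bare path is not valid Python

# The same config line without the leading path parses cleanly.
print(is_valid_python_source("checkpoint_config = dict(interval=1)\n"))  # True
```

Running the same check on the string stored under checkpoint['meta']['config'] (for example after reading the .pth file with torch.load) should show whether a given checkpoint would trigger this error on resume.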

innerlee · 2021-05-19T01:40:38Z

This is fixed in #820

tarv33 · 2021-05-19T02:41:23Z

Thanks

innerlee added the duplicate label May 19, 2021

tarv33 closed this as completed May 19, 2021
