[Fix] Fix wrong iter number and progress number in the logging during val/test time #914

yezhen17 · 2021-03-30T06:19:34Z

Refer to #903 .

CLAassistant · 2021-03-30T06:19:37Z

All committers have signed the CLA.

codecov · 2021-03-30T07:00:36Z

Codecov Report

Merging #914 (22e89d7) into master (db6b054) will decrease coverage by 0.02%.
The diff coverage is 33.33%.

@@            Coverage Diff             @@
##           master     #914      +/-   ##
==========================================
- Coverage   64.57%   64.55%   -0.03%     
==========================================
  Files         152      152              
  Lines        9792     9800       +8     
  Branches     1779     1781       +2     
==========================================
+ Hits         6323     6326       +3     
- Misses       3141     3145       +4     
- Partials      328      329       +1

Flag	Coverage Δ
unittests	`64.55% <33.33%> (-0.03%)`	⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files	Coverage Δ
mmcv/engine/test.py	`26.04% <0.00%> (-0.85%)`	⬇️
mmcv/runner/hooks/logger/text.py	`71.42% <33.33%> (-1.21%)`	⬇️
mmcv/runner/hooks/evaluation.py	`87.41% <100.00%> (+0.16%)`	⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update db6b054...22e89d7. Read the comment docs.

zhouzaida

LGTM

yezhen17 · 2021-04-03T08:38:39Z

If we keep the [iter] during evaluation, we still cannot use this to indicate the size of the val set (it's actually the size of the val set / batch size), and the tqdm progress bar during evaluation already tells the user the size of the val set.

And, since evaluation hook is not in MMCV but in downstream libraries such as MMDetection, we cannot let [iter] show the exact information we want without modifying downstream libraries. For example, if we want [iter] during evaluation to print the size of the val set / batch size in MMDetection, we have to modify the evaluation hook here by updating the iteration number.

Therefore, maybe discarding this [iter] is a better choice?

yezhen17 · 2021-04-03T08:41:15Z

Btw, tensorboard outputs are correct, not affected by this issue.

zhouzaida · 2021-04-03T08:47:33Z

If we keep the [iter] during evaluation, we still cannot use this to indicate the size of the val set (it's actually the size of the val set / batch size), and the tqdm progress bar during evaluation already tells the user the size of the val set.

And, since evaluation hook is not in MMCV but in downstream libraries such as MMDetection, we cannot let [iter] show the exact information we want without modifying downstream libraries. For example, if we want [iter] during evaluation to print the size of the val set / batch size in MMDetection, we have to modify the evaluation hook here by updating the iteration number.

Therefore, maybe discarding this [iter] is a better choice?

Fortunately, EvalHook will be moved to MMCV in the next version(1.3.1). Related PR #739

yezhen17 · 2021-04-03T08:49:05Z

If we keep the [iter] during evaluation, we still cannot use this to indicate the size of the val set (it's actually the size of the val set / batch size), and the tqdm progress bar during evaluation already tells the user the size of the val set.
And, since evaluation hook is not in MMCV but in downstream libraries such as MMDetection, we cannot let [iter] show the exact information we want without modifying downstream libraries. For example, if we want [iter] during evaluation to print the size of the val set / batch size in MMDetection, we have to modify the evaluation hook here by updating the iteration number.
Therefore, maybe discarding this [iter] is a better choice?

Fortunately, EvalHook will be moved to MMCV in the next version(1.3.1). Related PR #739

Oh that's good news

yezhen17 · 2021-04-12T16:30:52Z

Below is an example of the effect of this pr.

Before this pr:

2021-03-21 15:39:22,605 - mmdet - INFO - Exp name: votenet_8x8_scannet-3d-18class.py
2021-03-21 15:39:22,605 - mmdet - INFO - Epoch(val) [11][94] mAP_0.50: 0.2583, mAR_0.50: 0.4346
2021-03-21 15:39:28,143 - mmdet - INFO - Exp name: votenet_8x8_scannet-3d-18class.py
2021-03-21 15:39:28,143 - mmdet - INFO - Epoch(val) [11][5] vote_loss: 4.4002, objectness_loss: 0.5309, semantic_loss: 0.6248, center_loss: 0.4562, dir_class_loss: 0.0000, dir_res_loss: 0.0000, size_class_loss: 0.6253, size_res_loss: 0.5684, loss: 7.2059

After this pr:

2021-03-21 15:39:22,605 - mmdet - INFO - Exp name: votenet_8x8_scannet-3d-18class.py
2021-03-21 15:39:22,605 - mmdet - INFO - Epoch(val) [11] mAP_0.50: 0.2583, mAR_0.50: 0.4346
2021-03-21 15:39:28,143 - mmdet - INFO - Exp name: votenet_8x8_scannet-3d-18class.py
2021-03-21 15:39:28,143 - mmdet - INFO - Epoch(val) [11] vote_loss: 4.4002, objectness_loss: 0.5309, semantic_loss: 0.6248, center_loss: 0.4562, dir_class_loss: 0.0000, dir_res_loss: 0.0000, size_class_loss: 0.6253, size_res_loss: 0.5684, loss: 7.2059

yezhen17 · 2021-05-05T17:37:22Z

Modified EvalHook to provide another solution, which prints the correct iter number during eval instead of discarding it.

Example:

2021-05-06 00:10:52,335 - mmdet - INFO - Exp name: votenet_16x8_sunrgbd-3d-10class.py
2021-05-06 00:10:52,336 - mmdet - INFO - Epoch(val) [1][40]	bed_AP_0.25: 0.6004, table_AP_0.25: 0.2707, sofa_AP_0.25: 0.3081, chair_AP_0.25: 0.3745, toilet_AP_0.25: 0.5589, desk_AP_0.25: 0.0766, dresser_AP_0.25: 0.0080, night_stand_AP_0.25: 0.0025, bookshelf_AP_0.25: 0.0060, bathtub_AP_0.25: 0.1011, mAP_0.25: 0.2307, bed_rec_0.25: 0.8505, table_rec_0.25: 0.7530, sofa_rec_0.25: 0.7847, chair_rec_0.25: 0.7237, toilet_rec_0.25: 0.9448, desk_rec_0.25: 0.7067, dresser_rec_0.25: 0.5963, night_stand_rec_0.25: 0.5333, bookshelf_rec_0.25: 0.4433, bathtub_rec_0.25: 0.8571, mAR_0.25: 0.7193, bed_AP_0.50: 0.1262, table_AP_0.50: 0.0096, sofa_AP_0.50: 0.0271, chair_AP_0.50: 0.0388, toilet_AP_0.50: 0.1107, desk_AP_0.50: 0.0021, dresser_AP_0.50: 0.0000, night_stand_AP_0.50: 0.0000, bookshelf_AP_0.50: 0.0000, bathtub_AP_0.50: 0.0011, mAP_0.50: 0.0316, bed_rec_0.50: 0.2524, table_rec_0.50: 0.1120, sofa_rec_0.50: 0.1451, chair_rec_0.50: 0.1847, toilet_rec_0.50: 0.2828, desk_rec_0.50: 0.0770, dresser_rec_0.50: 0.0092, night_stand_rec_0.50: 0.0431, bookshelf_rec_0.50: 0.0142, bathtub_rec_0.50: 0.0816, mAR_0.50: 0.1202
2021-05-06 00:11:19,037 - mmdet - INFO - Exp name: votenet_16x8_sunrgbd-3d-10class.py
2021-05-06 00:11:19,039 - mmdet - INFO - Epoch(val) [1][40]	vote_loss: 5.6842, objectness_loss: 0.6418, semantic_loss: 1.0157, center_loss: 0.8679, dir_class_loss: 1.9394, dir_res_loss: 1.6326, size_class_loss: 1.0179, size_res_loss: 0.3710, loss: 13.1704

Also fixed another bug that the number on the progress bar during eval may exceed the true size of val set. This is because under distributed training, sometimes multiplying the batch size on the master gpu with number of gpus does not get the true batch size if it's the end of the dataset.

Before fix:

[>>>>>>>>>>>>>>>>>>>>>>>>>>] 5056/5050, 127.8 task/s, elapsed: 30s, ETA:     0s

After fix:

[>>>>>>>>>>>>>>>>>>>>>>>>>>] 5050/5050, 127.8 task/s, elapsed: 30s, ETA:     0s

ZwwWayne · 2021-05-07T07:43:17Z

LGTM, see if others have any comments.

Remove iter number in logging during val/test time

7c29f1c

Fix typo

4577cd0

zhouzaida approved these changes Mar 30, 2021

View reviewed changes

zhouzaida requested a review from ZwwWayne March 31, 2021 02:33

ZwwWayne approved these changes Mar 31, 2021

View reviewed changes

dreamerlin approved these changes Apr 12, 2021

View reviewed changes

yezhen17 added 2 commits May 2, 2021 17:58

Merge branch 'master_fork' into logging#903

53bd40a

modified EvalHook for eval mode to print the correct iter number

22e89d7

yezhen17 requested review from ZwwWayne and zhouzaida May 7, 2021 07:35

yezhen17 changed the title ~~[Fix] Remove iter number in logging during val/test time~~ [Fix] Fix wrong iter number and progress number in the logging during val/test time May 7, 2021

ZwwWayne requested a review from dreamerlin May 10, 2021 14:29

ZwwWayne mentioned this pull request May 10, 2021

Iteration Plan v1.3.4 - May 2021 #998

Closed

13 tasks

ZwwWayne approved these changes May 13, 2021

View reviewed changes

ZwwWayne merged commit b36c4de into open-mmlab:master May 13, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[Fix] Fix wrong iter number and progress number in the logging during val/test time #914

[Fix] Fix wrong iter number and progress number in the logging during val/test time #914

yezhen17 commented Mar 30, 2021

CLAassistant commented Mar 30, 2021 •

edited

Loading

codecov bot commented Mar 30, 2021 •

edited

Loading

zhouzaida left a comment

yezhen17 commented Apr 3, 2021 •

edited

Loading

yezhen17 commented Apr 3, 2021

zhouzaida commented Apr 3, 2021

yezhen17 commented Apr 3, 2021

yezhen17 commented Apr 12, 2021 •

edited

Loading

yezhen17 commented May 5, 2021

ZwwWayne commented May 7, 2021

[Fix] Fix wrong iter number and progress number in the logging during val/test time #914

[Fix] Fix wrong iter number and progress number in the logging during val/test time #914

Conversation

yezhen17 commented Mar 30, 2021

CLAassistant commented Mar 30, 2021 • edited Loading

codecov bot commented Mar 30, 2021 • edited Loading

Codecov Report

zhouzaida left a comment

Choose a reason for hiding this comment

yezhen17 commented Apr 3, 2021 • edited Loading

yezhen17 commented Apr 3, 2021

zhouzaida commented Apr 3, 2021

yezhen17 commented Apr 3, 2021

yezhen17 commented Apr 12, 2021 • edited Loading

yezhen17 commented May 5, 2021

ZwwWayne commented May 7, 2021

CLAassistant commented Mar 30, 2021 •

edited

Loading

codecov bot commented Mar 30, 2021 •

edited

Loading

yezhen17 commented Apr 3, 2021 •

edited

Loading

yezhen17 commented Apr 12, 2021 •

edited

Loading