Skip to content
New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

[Fix] Fix wrong iter number and progress number in the logging during val/test time #914

Merged
merged 4 commits into from
May 13, 2021

Conversation

yezhen17
Copy link
Contributor

Refer to #903 .

@CLAassistant
Copy link

CLAassistant commented Mar 30, 2021

CLA assistant check
All committers have signed the CLA.

@codecov
Copy link

codecov bot commented Mar 30, 2021

Codecov Report

Merging #914 (22e89d7) into master (db6b054) will decrease coverage by 0.02%.
The diff coverage is 33.33%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #914      +/-   ##
==========================================
- Coverage   64.57%   64.55%   -0.03%     
==========================================
  Files         152      152              
  Lines        9792     9800       +8     
  Branches     1779     1781       +2     
==========================================
+ Hits         6323     6326       +3     
- Misses       3141     3145       +4     
- Partials      328      329       +1     
Flag Coverage Δ
unittests 64.55% <33.33%> (-0.03%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
mmcv/engine/test.py 26.04% <0.00%> (-0.85%) ⬇️
mmcv/runner/hooks/logger/text.py 71.42% <33.33%> (-1.21%) ⬇️
mmcv/runner/hooks/evaluation.py 87.41% <100.00%> (+0.16%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update db6b054...22e89d7. Read the comment docs.

Copy link
Collaborator

@zhouzaida zhouzaida left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@zhouzaida zhouzaida requested a review from ZwwWayne March 31, 2021 02:33
@yezhen17
Copy link
Contributor Author

yezhen17 commented Apr 3, 2021

If we keep the [iter] during evaluation, we still cannot use this to indicate the size of the val set (it's actually the size of the val set / batch size), and the tqdm progress bar during evaluation already tells the user the size of the val set.

And, since evaluation hook is not in MMCV but in downstream libraries such as MMDetection, we cannot let [iter] show the exact information we want without modifying downstream libraries. For example, if we want [iter] during evaluation to print the size of the val set / batch size in MMDetection, we have to modify the evaluation hook here by updating the iteration number.

Therefore, maybe discarding this [iter] is a better choice?

@yezhen17
Copy link
Contributor Author

yezhen17 commented Apr 3, 2021

Btw, tensorboard outputs are correct, not affected by this issue.

@zhouzaida
Copy link
Collaborator

If we keep the [iter] during evaluation, we still cannot use this to indicate the size of the val set (it's actually the size of the val set / batch size), and the tqdm progress bar during evaluation already tells the user the size of the val set.

And, since evaluation hook is not in MMCV but in downstream libraries such as MMDetection, we cannot let [iter] show the exact information we want without modifying downstream libraries. For example, if we want [iter] during evaluation to print the size of the val set / batch size in MMDetection, we have to modify the evaluation hook here by updating the iteration number.

Therefore, maybe discarding this [iter] is a better choice?

Fortunately, EvalHook will be moved to MMCV in the next version(1.3.1). Related PR #739

@yezhen17
Copy link
Contributor Author

yezhen17 commented Apr 3, 2021

If we keep the [iter] during evaluation, we still cannot use this to indicate the size of the val set (it's actually the size of the val set / batch size), and the tqdm progress bar during evaluation already tells the user the size of the val set.
And, since evaluation hook is not in MMCV but in downstream libraries such as MMDetection, we cannot let [iter] show the exact information we want without modifying downstream libraries. For example, if we want [iter] during evaluation to print the size of the val set / batch size in MMDetection, we have to modify the evaluation hook here by updating the iteration number.
Therefore, maybe discarding this [iter] is a better choice?

Fortunately, EvalHook will be moved to MMCV in the next version(1.3.1). Related PR #739

Oh that's good news

@yezhen17
Copy link
Contributor Author

yezhen17 commented Apr 12, 2021

Below is an example of the effect of this pr.

Before this pr:

2021-03-21 15:39:22,605 - mmdet - INFO - Exp name: votenet_8x8_scannet-3d-18class.py
2021-03-21 15:39:22,605 - mmdet - INFO - Epoch(val) [11][94] mAP_0.50: 0.2583, mAR_0.50: 0.4346
2021-03-21 15:39:28,143 - mmdet - INFO - Exp name: votenet_8x8_scannet-3d-18class.py
2021-03-21 15:39:28,143 - mmdet - INFO - Epoch(val) [11][5] vote_loss: 4.4002, objectness_loss: 0.5309, semantic_loss: 0.6248, center_loss: 0.4562, dir_class_loss: 0.0000, dir_res_loss: 0.0000, size_class_loss: 0.6253, size_res_loss: 0.5684, loss: 7.2059

After this pr:

2021-03-21 15:39:22,605 - mmdet - INFO - Exp name: votenet_8x8_scannet-3d-18class.py
2021-03-21 15:39:22,605 - mmdet - INFO - Epoch(val) [11] mAP_0.50: 0.2583, mAR_0.50: 0.4346
2021-03-21 15:39:28,143 - mmdet - INFO - Exp name: votenet_8x8_scannet-3d-18class.py
2021-03-21 15:39:28,143 - mmdet - INFO - Epoch(val) [11] vote_loss: 4.4002, objectness_loss: 0.5309, semantic_loss: 0.6248, center_loss: 0.4562, dir_class_loss: 0.0000, dir_res_loss: 0.0000, size_class_loss: 0.6253, size_res_loss: 0.5684, loss: 7.2059

@yezhen17
Copy link
Contributor Author

yezhen17 commented May 5, 2021

Modified EvalHook to provide another solution, which prints the correct iter number during eval instead of discarding it.

Example:

2021-05-06 00:10:52,335 - mmdet - INFO - Exp name: votenet_16x8_sunrgbd-3d-10class.py
2021-05-06 00:10:52,336 - mmdet - INFO - Epoch(val) [1][40]	bed_AP_0.25: 0.6004, table_AP_0.25: 0.2707, sofa_AP_0.25: 0.3081, chair_AP_0.25: 0.3745, toilet_AP_0.25: 0.5589, desk_AP_0.25: 0.0766, dresser_AP_0.25: 0.0080, night_stand_AP_0.25: 0.0025, bookshelf_AP_0.25: 0.0060, bathtub_AP_0.25: 0.1011, mAP_0.25: 0.2307, bed_rec_0.25: 0.8505, table_rec_0.25: 0.7530, sofa_rec_0.25: 0.7847, chair_rec_0.25: 0.7237, toilet_rec_0.25: 0.9448, desk_rec_0.25: 0.7067, dresser_rec_0.25: 0.5963, night_stand_rec_0.25: 0.5333, bookshelf_rec_0.25: 0.4433, bathtub_rec_0.25: 0.8571, mAR_0.25: 0.7193, bed_AP_0.50: 0.1262, table_AP_0.50: 0.0096, sofa_AP_0.50: 0.0271, chair_AP_0.50: 0.0388, toilet_AP_0.50: 0.1107, desk_AP_0.50: 0.0021, dresser_AP_0.50: 0.0000, night_stand_AP_0.50: 0.0000, bookshelf_AP_0.50: 0.0000, bathtub_AP_0.50: 0.0011, mAP_0.50: 0.0316, bed_rec_0.50: 0.2524, table_rec_0.50: 0.1120, sofa_rec_0.50: 0.1451, chair_rec_0.50: 0.1847, toilet_rec_0.50: 0.2828, desk_rec_0.50: 0.0770, dresser_rec_0.50: 0.0092, night_stand_rec_0.50: 0.0431, bookshelf_rec_0.50: 0.0142, bathtub_rec_0.50: 0.0816, mAR_0.50: 0.1202
2021-05-06 00:11:19,037 - mmdet - INFO - Exp name: votenet_16x8_sunrgbd-3d-10class.py
2021-05-06 00:11:19,039 - mmdet - INFO - Epoch(val) [1][40]	vote_loss: 5.6842, objectness_loss: 0.6418, semantic_loss: 1.0157, center_loss: 0.8679, dir_class_loss: 1.9394, dir_res_loss: 1.6326, size_class_loss: 1.0179, size_res_loss: 0.3710, loss: 13.1704

Also fixed another bug that the number on the progress bar during eval may exceed the true size of val set. This is because under distributed training, sometimes multiplying the batch size on the master gpu with number of gpus does not get the true batch size if it's the end of the dataset.

Before fix:

[>>>>>>>>>>>>>>>>>>>>>>>>>>] 5056/5050, 127.8 task/s, elapsed: 30s, ETA:     0s

After fix:

[>>>>>>>>>>>>>>>>>>>>>>>>>>] 5050/5050, 127.8 task/s, elapsed: 30s, ETA:     0s

@yezhen17 yezhen17 requested review from ZwwWayne and zhouzaida May 7, 2021 07:35
@yezhen17 yezhen17 changed the title [Fix] Remove iter number in logging during val/test time [Fix] Fix wrong iter number and progress number in the logging during val/test time May 7, 2021
@ZwwWayne
Copy link
Collaborator

ZwwWayne commented May 7, 2021

LGTM, see if others have any comments.

@ZwwWayne ZwwWayne requested a review from dreamerlin May 10, 2021 14:29
@ZwwWayne ZwwWayne mentioned this pull request May 10, 2021
13 tasks
@ZwwWayne ZwwWayne merged commit b36c4de into open-mmlab:master May 13, 2021
# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants