[Fix] Fix wrong iter number and progress number in the logging during val/test time #914
Conversation
Codecov Report
@@ Coverage Diff @@
## master #914 +/- ##
==========================================
- Coverage 64.57% 64.55% -0.03%
==========================================
Files 152 152
Lines 9792 9800 +8
Branches 1779 1781 +2
==========================================
+ Hits 6323 6326 +3
- Misses 3141 3145 +4
- Partials 328 329 +1
LGTM
If we keep the [iter] field during evaluation, it still cannot indicate the size of the val set (it actually shows the size of the val set divided by the batch size), and the tqdm progress bar during evaluation already tells the user the val-set size. Moreover, since the evaluation hook is not in MMCV but in downstream libraries such as MMDetection, we cannot make [iter] show the exact information we want without modifying those downstream libraries. For example, if we wanted [iter] during evaluation to print the val-set size divided by the batch size in MMDetection, we would have to modify the evaluation hook here to update the iteration number. Therefore, maybe discarding this [iter] is the better choice?
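As a rough sketch (plain Python, not MMCV's actual code; the function name is hypothetical), the point about [iter] is that with a non-dropping data loader the iteration count during evaluation is the ceiling of the val-set size over the batch size, not the val-set size itself:

```python
import math

def eval_iters(num_val_samples: int, batch_size: int) -> int:
    # With a loader that keeps the last partial batch, the number of eval
    # iterations is ceil(N / batch_size), so printing "[iter]" during
    # evaluation reports N / batch_size rather than the val-set size N.
    return math.ceil(num_val_samples / batch_size)

print(eval_iters(5000, 8))  # 625 iterations for a 5000-sample val set
```

This is why the raw [iter] value is misleading as a "size of val set" indicator.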
By the way, the TensorBoard outputs are correct; they are not affected by this issue.
Fortunately, EvalHook will be moved to MMCV in the next version (1.3.1). Related PR: #739
Oh that's good news |
Below is an example of the effect of this PR. Before this PR:
After this PR:
Modified EvalHook to provide another solution, which prints the correct iter number during eval instead of discarding it. Example:
Also fixed another bug where the number shown on the progress bar during eval could exceed the true size of the val set. This happens because, under distributed training, multiplying the batch size on the master GPU by the number of GPUs does not give the true total batch size when the end of the dataset is reached. Before the fix:
After the fix:
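The tail-of-dataset overshoot described above can be illustrated with a minimal sketch (plain Python with hypothetical names, not the actual MMCV/MMDetection code): each step naively advances the bar by the master's batch size times the world size, which overshoots on the final, partial batch unless clamped to the dataset length.

```python
def completed_samples(done_so_far: int, batch_on_master: int,
                      world_size: int, dataset_len: int) -> int:
    # Naive update: assume every rank processed a full batch this step.
    # On the last step this can exceed dataset_len, so clamp the total.
    return min(done_so_far + batch_on_master * world_size, dataset_len)

# Example: 100-sample val set, 4 GPUs, batch size 8 per GPU.
# After 3 full steps, 96 samples are done; the last step would naively
# report 96 + 8 * 4 = 128, but clamping yields the true count of 100.
print(completed_samples(96, 8, 4, 100))  # 100
```

Clamping to the dataset length is one simple way to keep the displayed count from exceeding the true val-set size; the actual fix in this PR may differ in detail.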
LGTM; let's see if others have any comments.
Refer to #903.