Skip to content
New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

[Fix] Fix multi-node dist test #292

Merged
merged 5 commits into from
Nov 17, 2020
Merged

Conversation

dreamerlin
Copy link
Collaborator

@dreamerlin dreamerlin requested a review from innerlee October 29, 2020 03:42
@codecov
Copy link

codecov bot commented Oct 29, 2020

Codecov Report

Merging #292 (93f481c) into master (d6f76cd) will increase coverage by 1.15%.
The diff coverage is 74.89%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #292      +/-   ##
==========================================
+ Coverage   85.05%   86.20%   +1.15%     
==========================================
  Files          88       98      +10     
  Lines        5994     6924     +930     
  Branches      983     1119     +136     
==========================================
+ Hits         5098     5969     +871     
- Misses        709      736      +27     
- Partials      187      219      +32     
Flag Coverage Δ
unittests 86.19% <74.89%> (+1.15%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
mmaction/apis/inference.py 81.81% <ø> (ø)
mmaction/datasets/audio_dataset.py 73.68% <ø> (-10.10%) ⬇️
mmaction/datasets/audio_feature_dataset.py 73.68% <ø> (-10.10%) ⬇️
mmaction/datasets/video_dataset.py 66.66% <ø> (-15.88%) ⬇️
mmaction/localization/proposal_utils.py 100.00% <ø> (ø)
mmaction/localization/ssn_utils.py 100.00% <ø> (ø)
mmaction/models/backbones/c3d.py 91.89% <ø> (ø)
mmaction/models/backbones/resnet.py 94.83% <ø> (ø)
mmaction/models/backbones/resnet3d_slowfast.py 87.97% <ø> (ø)
mmaction/models/backbones/resnet3d_slowonly.py 100.00% <ø> (ø)
... and 69 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update d6f76cd...5349473. Read the comment docs.

@dreamerlin dreamerlin changed the title [Fix]Fix multi-node dist test [Fix] Fix multi-node dist test Oct 30, 2020
@dreamerlin dreamerlin marked this pull request as draft October 30, 2020 12:06
@dreamerlin dreamerlin marked this pull request as ready for review October 30, 2020 13:28
@innerlee
Copy link
Contributor

can this be tested?

@dreamerlin
Copy link
Collaborator Author

can this be tested?

Since the unittest coverage is down below 85%, we will check the codecov and raise an issue to some assigners to improve the unittest. And for this PR, it has been tested successfully ; )

@innerlee
Copy link
Contributor

does the test fail without this fix?

@dreamerlin
Copy link
Collaborator Author

does the test fail without this fix?

Actually, I haven't encount failure when performing multi-node dist test. This PR is related with open-mmlab/mmdetection#4017

@innerlee
Copy link
Contributor

yeah it's better to add a test, so that we know what we have fixed.

@innerlee
Copy link
Contributor

@kennymckormick could explain more about the original problem, and why this is a partial fix?

@dreamerlin
Copy link
Collaborator Author

Unittest are added in #322

@dreamerlin dreamerlin requested a review from innerlee November 10, 2020 07:15
@innerlee
Copy link
Contributor

I noticed that #322 has been merged, and no errors was thrown. So could u give details on what bug does this pr solve?

@dreamerlin
Copy link
Collaborator Author

I noticed that #322 has been merged, and no errors was thrown. So could u give details on what bug does this pr solve?

The bug will happen when performing distributed testing and gethering the result file under tmp directory. In some cases, the tmp directory will not be shared in multi-machines.

@innerlee
Copy link
Contributor

btw where is the '.dist_test'? under mmaction/ folder?

I recommend save things in their workdir, so that we can make sure that

  • the path is writable
  • different runs wouldn't conflict

@dreamerlin
Copy link
Collaborator Author

btw where is the '.dist_test'? under mmaction/ folder?

I recommend save things in their workdir, so that we can make sure that

  • the path is writable
  • different runs wouldn't conflict

It is under mmaction2 rather than mmaction2/mmaction

@dreamerlin
Copy link
Collaborator Author

dreamerlin commented Nov 17, 2020

btw where is the '.dist_test'? under mmaction/ folder?

I recommend save things in their workdir, so that we can make sure that

  • the path is writable
  • different runs wouldn't conflict

For DistEpochEvalHook, it will specify a tmp_dir using workdir (https://github.com/open-mmlab/mmaction2/blob/master/mmaction/core/evaluation/eval_hooks.py#L261-L265);
For testing, users can specify arg.tmpdir as the tmp_dir (https://github.com/open-mmlab/mmaction2/blob/master/tools/test.py#L176-L177).

@innerlee
Copy link
Contributor

fine

@innerlee innerlee merged commit 0803be1 into open-mmlab:master Nov 17, 2020
@dreamerlin dreamerlin deleted the fix_dist branch November 17, 2020 13:53
# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants