[Improvement] Add a tool to find invalid videos. #907

Merged
merged 13 commits into from
Jun 13, 2021

Conversation

irvingzhang0512
Contributor

@irvingzhang0512 irvingzhang0512 commented Jun 7, 2021

Motivation

Fix #893
Check all samples from a video dataset specified by the configuration file, and save the file paths of invalid (corrupted or missing) videos to an output file.

Modification

Add a script in tools and update the related useful-tools docs.

Use cases (Optional)

  • Use decord to decode all videos from the train set of /path/to/config.py with 5 processes, and save all invalid video paths in invalid_videos.txt.
python tools/check_videos.py /path/to/config.py \
    --decoder decord \
    --split train \
    --output-file invalid_videos.txt \
    --num-processes 5
  • Use opencv to decode all videos from the test set of /path/to/config.py with 10 processes, save all invalid video paths in invalid_videos.txt, and remove all corrupted videos.
python tools/check_videos.py /path/to/config.py \
    --decoder opencv \
    --split test \
    --output-file invalid_videos.txt \
    --remove-corrupted-videos \
    --num-processes 10
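
For readers who want to see how the flags in these two commands fit together, here is a minimal argparse sketch. It is not the code added by this PR: the flag names come from the commands above, but the defaults, the help strings, and the `choices` lists (including the `val` split) are assumptions made for illustration.

```python
import argparse


def parse_args(argv=None):
    """Parse a command line shaped like the use cases above (illustrative only)."""
    parser = argparse.ArgumentParser(description="Find invalid videos in a dataset")
    parser.add_argument("config", help="path to the dataset config file")
    # Only decord and opencv appear in the use cases; other backends are not assumed.
    parser.add_argument("--decoder", default="decord", choices=["decord", "opencv"])
    # "val" is an assumed split name; the use cases only show train and test.
    parser.add_argument("--split", default="train", choices=["train", "val", "test"])
    parser.add_argument("--output-file", default="invalid_videos.txt")
    parser.add_argument("--remove-corrupted-videos", action="store_true")
    parser.add_argument("--num-processes", type=int, default=1)
    return parser.parse_args(argv)


if __name__ == "__main__":
    args = parse_args()
    print(args)
```

argparse converts the dashes in each flag to underscores, so the values are read as `args.output_file`, `args.remove_corrupted_videos`, and `args.num_processes`.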

TODO

  • Read dataset configs from config file.
  • Choose video decoder by --decoder
  • Check each video by opening the file and reading the first, last, and 3 random frames.
  • Multiprocessing
  • Generate a file list of invalid (corrupted or missing) video paths
  • Optionally remove all corrupted videos.
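
The frame check in the list above (first, last, and 3 random frames) could be sketched roughly as follows. This is an illustration, not the script from this PR: `pick_frame_indices`, `is_video_valid`, and the `open_video` callable are hypothetical names, and the reader is assumed to support `len()` and integer indexing.

```python
import os
import random


def pick_frame_indices(num_frames, num_random=3, seed=None):
    """Return the first, last, and up to `num_random` random interior frame indices."""
    if num_frames <= 0:
        return []
    indices = {0, num_frames - 1}
    interior = range(1, num_frames - 1)  # empty for videos with 1 or 2 frames
    rng = random.Random(seed)
    indices.update(rng.sample(interior, min(num_random, len(interior))))
    return sorted(indices)


def is_video_valid(path, open_video, seed=None):
    """Return True if the file exists and every sampled frame can be read.

    `open_video` is a hypothetical callable that maps a path to a reader
    supporting len() and integer indexing.
    """
    if not os.path.exists(path):
        return False  # missing file
    try:
        reader = open_video(path)
        indices = pick_frame_indices(len(reader), seed=seed)
        if not indices:
            return False  # opened, but contains no frames
        for i in indices:
            if reader[i] is None:
                return False
    except Exception:
        return False  # any decoding error counts as corrupted
    return True
```

decord's `VideoReader` exposes `len()` and integer indexing, so it looks like a natural fit for `open_video`, although that pairing is not exercised here. An OpenCV `VideoCapture` would need a small wrapper to provide the same interface.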

@codecov

codecov bot commented Jun 7, 2021

Codecov Report

Merging #907 (26f8ebe) into master (f007661) will decrease coverage by 0.05%.
The diff coverage is n/a.


@@            Coverage Diff             @@
##           master     #907      +/-   ##
==========================================
- Coverage   83.58%   83.53%   -0.06%     
==========================================
  Files         132      132              
  Lines        9977     9977              
  Branches     1720     1720              
==========================================
- Hits         8339     8334       -5     
- Misses       1219     1222       +3     
- Partials      419      421       +2     
Flag Coverage Δ
unittests 83.53% <ø> (-0.06%) ⬇️


Impacted Files Coverage Δ
mmaction/core/evaluation/accuracy.py 92.27% <0.00%> (-0.91%) ⬇️
mmaction/datasets/pipelines/augmentations.py 92.41% <0.00%> (-0.35%) ⬇️


@kennymckormick
Member

Great tool. However, I think it can still be improved in two respects:

  1. Even if you open a video successfully with a backend, it doesn't necessarily mean you can read frames from it properly. You probably need to read at least one frame from the video to validate it.
  2. Sequentially checking each video may be very slow. Can you check the videos in the dataset in parallel?
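
A rough sketch of the parallel approach in point 2, using only the standard library. The names `check_missing`, `find_invalid_videos`, and `write_file_list` are hypothetical, and the per-video check here only tests that the file exists; a real check would also decode frames, as in point 1.

```python
import multiprocessing
import os


def check_missing(path):
    """Placeholder check returning (path, is_valid); only detects missing files."""
    return path, os.path.isfile(path)


def find_invalid_videos(paths, check=check_missing, num_processes=1):
    """Run `check` over all paths and return the sorted list of invalid ones."""
    if num_processes <= 1:
        results = [check(p) for p in paths]
    else:
        # `check` must be a picklable top-level function to run in worker processes.
        with multiprocessing.Pool(num_processes) as pool:
            # Collect results before the pool is torn down on leaving the block.
            results = list(pool.imap_unordered(check, paths))
    return sorted(p for p, ok in results if not ok)


def write_file_list(invalid_paths, output_file):
    """Write one invalid video path per line."""
    with open(output_file, "w") as f:
        for p in invalid_paths:
            f.write(p + "\n")


if __name__ == "__main__":
    invalid = find_invalid_videos(["a.mp4", "b.mp4"], num_processes=2)
    write_file_list(invalid, "invalid_videos.txt")
```

`imap_unordered` returns results as workers finish, so one slow video does not hold up the others; sorting at the end keeps the output file deterministic.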

@innerlee
Contributor

innerlee commented Jun 7, 2021

Yeah try to read the first, last and random three frames

@dreamerlin dreamerlin changed the title [Improvment] Add a tool to find invalid videos. [Improvement] Add a tool to find invalid videos. Jun 9, 2021
@kennymckormick kennymckormick merged commit 4368ef3 into open-mmlab:master Jun 13, 2021
@irvingzhang0512 irvingzhang0512 deleted the check-video-tool branch June 14, 2021 16:50
Development

Successfully merging this pull request may close these issues.

ERROR cannot find video stream with wanted index: -1
3 participants