Highlights
- Support PyTorch 1.9
- Support Pytorchvideo Transforms
- Support PreciseBN
New Features
Improvements
- Remove redundant augmentations in config files (#996)
- Make resource directory to hold common resource pictures (#1011)
- Remove deprecated FrameSelector (#1010)
- Support Concat Dataset (#1000)
- Add `to-mp4` option to resize_videos.py (#1021)
- Add option to keep tail frames (#1050)
- Update MIM support (#1061)
- Calculate Top-K accurate and inaccurate classes (#1047)
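The last item above ranks classes by how often they are predicted correctly. A minimal sketch of that idea, assuming one ground-truth label and one predicted label per sample; `per_class_accuracy` and `top_k_classes` are hypothetical names for illustration, not the MMAction2 API:

```python
from collections import defaultdict


def per_class_accuracy(labels, preds):
    """Return {class_id: accuracy} from ground-truth and predicted labels."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for y, p in zip(labels, preds):
        total[y] += 1
        if y == p:
            correct[y] += 1
    return {c: correct[c] / total[c] for c in total}


def top_k_classes(labels, preds, k=1):
    """Return the k most accurate and k least accurate (class, accuracy) pairs."""
    acc = per_class_accuracy(labels, preds)
    # Ties are broken by class id so the ranking is deterministic.
    most = sorted(acc.items(), key=lambda item: (-item[1], item[0]))[:k]
    least = sorted(acc.items(), key=lambda item: (item[1], item[0]))[:k]
    return most, least


if __name__ == "__main__":
    most, least = top_k_classes([0, 0, 1, 1, 2, 2], [0, 0, 1, 0, 1, 1], k=1)
    print("most accurate:", most)
    print("least accurate:", least)
```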
Bug and Typo Fixes
- Fix bug in PoseC3D demo (#1009)
- Fix some problems in resize_videos.py (#1012)
- Support torch1.9 (#1015)
- Remove redundant code in CI (#1046)
- Fix bug about persistent_workers (#1044)
- Support TimeSformer feature extraction (#1035)
- Fix ColorJitter (#1025)
ModelZoo
- Add TSM-R50 sthv1 models trained by PytorchVideo RandAugment and AugMix (#1008)
- Update SlowOnly SthV1 checkpoints (#1034)
- Add SlowOnly Kinetics400 checkpoints trained with Precise-BN (#1038)
- Add CSN-R50 from scratch checkpoints (#1045)
- Add TPN Kinetics-400 checkpoints trained with the new ColorJitter (#1025)
Documentation
- Add Chinese translation of feature_extraction.md (#1020)
- Fix the code snippet in getting_started.md (#1023)
- Fix TANet config table (#1028)
- Add description to PoseC3D dataset (#1053)
Highlights
- Support using backbone from pytorch-image-models(timm)
- Support PIMS Decoder
- Demo for skeleton-based action recognition
- Support TimeSformer
New Features
- Support using backbones from pytorch-image-models(timm) for TSN (#880)
- Support torchvision transformations in preprocessing pipelines (#972)
- Demo for skeleton-based action recognition (#972)
- Support TimeSformer (#839)
Improvements
- Add a tool to find invalid videos (#907, #950)
- Add an option to specify spectrogram_type (#909)
- Add json output to video demo (#906)
- Add MIM related docs (#918)
- Rename lr to scheduler (#916)
- Support `--cfg-options` for demos (#911)
- Support number counting for flow-wise filename template (#922)
- Add Chinese tutorial (#941)
- Change ResNet3D default values (#939)
- Adjust script structure (#935)
- Add font color to args in long_video_demo (#947)
- Polish code style with Pylint (#908)
- Support PIMS Decoder (#946)
- Improve Metafiles (#956, #979, #966)
- Add links to download Kinetics400 validation (#920)
- Audit the usage of shutil.rmtree (#943)
- Polish localizer related codes (#913)
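The `--cfg-options` item above refers to overriding config values from the command line. A minimal sketch of the idea, assuming overrides arrive as dotted keys mapped to values; `apply_overrides` is a hypothetical helper for illustration, not the MMCV or MMAction2 implementation:

```python
def apply_overrides(cfg, overrides):
    """Apply dotted-key overrides to a nested dict in place and return it.

    Missing intermediate dicts are created. This sketch assumes every
    intermediate node along a dotted path is itself a dict.
    """
    for dotted_key, value in overrides.items():
        *parents, leaf = dotted_key.split(".")
        node = cfg
        for key in parents:
            node = node.setdefault(key, {})
        node[leaf] = value
    return cfg


if __name__ == "__main__":
    # The config keys below are made up for the example.
    cfg = {"model": {"backbone": {"depth": 50}}, "total_epochs": 100}
    apply_overrides(cfg, {"model.backbone.depth": 101, "data.videos_per_gpu": 4})
    print(cfg)
```

Keeping overrides as flat dotted keys lets a single command-line argument reach any nested setting without editing the config file.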
Bug and Typo Fixes
- Fix spatiotemporal detection demo (#899)
- Fix docstring for 3D inflate (#925)
- Fix bug of writing text to video with TextClip (#952)
- Fix mmcv install in CI (#977)
ModelZoo
- Add TSN with Swin Transformer backbone as an example for using pytorch-image-models(timm) backbones (#880)
- Port CSN checkpoints from VMZ (#945)
- Release various checkpoints for UCF101, HMDB51 and Sthv1 (#938)
- Support TimeSformer (#839)
- Update TSM modelzoo (#981)
Highlights
- Support PoseC3D
- Support ACRN
- Support MIM
New Features
- Support PoseC3D (#786, #890)
- Support MIM (#870)
- Support ACRN and Focal Loss (#891)
- Support Jester dataset (#864)
Improvements
- Add `metric_options` for evaluation to docs (#873)
- Support creating a new label map based on custom classes for spatio-temporal demos (#879)
- Improve document about AVA dataset preparation (#878)
- Provide a script to extract clip-level feature (#856)
Bug and Typo Fixes
- Fix issues about resume (#877, #878)
- Correct the key name of `eval_results` dictionary for metric 'mmit_mean_average_precision' (#885)
ModelZoo
Highlights
- Support TRN
- Support Diving48
New Features
- Support TRN (#755)
- Support Diving48 (#835)
- Support Webcam Demo for Spatio-temporal Action Detection Models (#795)
Improvements
- Add softmax option for pytorch2onnx tool (#781)
- Support TRN (#755)
- Test with onnx models and TensorRT engines (#758)
- Speed up AVA Testing (#784)
- Add `self.with_neck` attribute (#796)
- Update installation document (#798)
- Use a random master port (#809)
- Update AVA processing data document (#801)
- Refactor spatio-temporal augmentation (#782)
- Add QR code in CN README (#812)
- Add Alternative way to download Kinetics (#817, #822)
- Refactor Sampler (#790)
- Use EvalHook in MMCV with backward compatibility (#793)
- Use MMCV Model Registry (#843)
Bug and Typo Fixes
- Fix a bug in pytorch2onnx.py when `num_classes <= 4` (#800, #824)
- Fix `demo_spatiotemporal_det.py` error (#803, #805)
- Fix loading config bugs when resuming (#820)
- Make HMDB51 annotation generation more robust (#811)
ModelZoo
Highlights
- Support LFB
- Support using backbone from MMCls/TorchVision
- Add Chinese documentation
New Features
- Support LFB (#553)
- Support using backbones from MMCls for TSN (#679)
- Support using backbones from TorchVision for TSN (#720)
- Support Mixup and Cutmix for recognizers (#681)
- Support Chinese documentation (#665, #680, #689, #701, #702, #703, #706, #716, #717, #731, #733, #735, #736, #737, #738, #739, #740, #742, #752, #759, #761, #772, #775)
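For readers unfamiliar with the Mixup item in the list above, a minimal sketch of the general technique: two samples and their one-hot labels are blended with a weight drawn from a Beta distribution. This is an illustration under stated assumptions (plain Python lists, a hypothetical `mixup_pair` helper, `alpha=0.2` chosen arbitrarily), not the MMAction2 implementation:

```python
import random


def mixup_pair(x1, y1, x2, y2, alpha=0.2, rng=random):
    """Blend two samples and their one-hot labels with lam ~ Beta(alpha, alpha)."""
    lam = rng.betavariate(alpha, alpha)
    x = [lam * a + (1 - lam) * b for a, b in zip(x1, x2)]
    y = [lam * a + (1 - lam) * b for a, b in zip(y1, y2)]
    return x, y, lam


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the example is repeatable
    x, y, lam = mixup_pair([1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0], rng=rng)
    print("lam:", lam)
    print("mixed input:", x)
    print("mixed label:", y)
```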
Improvements
- Add slowfast config/json/log/ckpt for training custom classes of AVA (#678)
- Set RandAugment as Imgaug default transforms (#585)
- Add `--test-last` & `--test-best` for `tools/train.py` to test checkpoints after training (#608)
- Add fcn_testing in TPN (#684)
- Remove redundant recall functions (#741)
- Recursively remove pretrained step for testing (#695)
- Improve demo by limiting inference fps (#668)
Bug and Typo Fixes
- Fix a bug about multi-class in VideoDataset (#723)
- Reverse key-value in anet filelist generation (#686)
- Fix flow norm cfg typo (#693)
ModelZoo
- Add LFB for AVA2.1 (#553)
- Add TSN with ResNeXt-101-32x4d backbone as an example for using MMCls backbones (#679)
- Add TSN with Densenet161 backbone as an example for using TorchVision backbones (#720)
- Add slowonly_nl_embedded_gaussian_r50_4x16x1_150e_kinetics400_rgb (#690)
- Add slowonly_nl_embedded_gaussian_r50_8x8x1_150e_kinetics400_rgb (#704)
- Add slowonly_nl_kinetics_pretrained_r50_4x16x1(8x8x1)_20e_ava_rgb (#730)
Highlights
- Support TSM-MobileNetV2
- Support TANet
- Support GPU Normalize
New Features
- Support TSM-MobileNetV2 (#415)
- Support flip with label mapping (#591)
- Add seed option for sampler (#642)
- Support GPU Normalize (#586)
- Support TANet (#595)
Improvements
- Training custom classes of ava dataset (#555)
- Add CN README in homepage (#592, #594)
- Support soft label for CrossEntropyLoss (#625)
- Refactor config: Specify `train_cfg` and `test_cfg` in `model` (#629)
- Provide an alternative way to download older kinetics annotations (#597)
- Update FAQ for
- Modify default value of `save_best` (#600)
- Use BibTex rather than latex in markdown (#607)
- Add warnings of uninstalling mmdet and supplementary documents (#624)
Bug and Typo Fixes
ModelZoo
Highlights
- Support imgaug
- Support spatial temporal demo
- Refactor EvalHook, config structure, unittest structure
New Features
- Support imgaug for augmentations in the data pipeline (#492)
- Support setting `max_testing_views` for extremely large models to save GPU memory used (#511)
- Add spatial temporal demo (#547, #566)
Improvements
- Refactor EvalHook (#395)
- Refactor AVA hook (#567)
- Add repo citation (#545)
- Add dataset size of Kinetics400 (#503)
- Add lazy operation docs (#504)
- Add class_weight for CrossEntropyLoss and BCELossWithLogits (#509)
- Add some explanation about the resampling in slowfast (#502)
- Modify paper title in README.md (#512)
- Add alternative ways to download Kinetics (#521)
- Add OpenMMLab projects link in README (#530)
- Change default preprocessing to shortedge to 256 (#538)
- Add config tag in dataset README (#540)
- Add solution for markdownlint installation issue (#497)
- Add dataset overview in readthedocs (#548)
- Modify the trigger mode of the warnings of missing mmdet (#583)
- Refactor config structure (#488, #572)
- Refactor unittest structure (#433)
Bug and Typo Fixes
- Fix a bug about ava dataset validation (#527)
- Fix a bug about ResNet pretrain weight initialization (#582)
- Fix a bug in CI due to MMCV index (#495)
- Remove invalid links of MiT and MMiT (#516)
- Fix frame rate bug for AVA preparation (#576)
ModelZoo
Highlights
- Support Spatio-Temporal Action Detection (AVA)
- Support precise BN
New Features
- Support precise BN (#501)
- Support Spatio-Temporal Action Detection (AVA) (#351)
- Support returning feature maps in `inference_recognizer` (#458)
Improvements
- Add arg `stride` to long_video_demo.py, to make inference faster (#468)
- Support training and testing for Spatio-Temporal Action Detection (#351)
- Fix CI due to pip upgrade (#454)
- Add markdown lint in pre-commit hook (#255)
- Speed up confusion matrix calculation (#465)
- Use title case in modelzoo statistics (#456)
- Add FAQ documents for easy troubleshooting. (#413, #420, #439)
- Support Spatio-Temporal Action Detection with context (#471)
- Add class weight for CrossEntropyLoss and BCELossWithLogits (#509)
- Add Lazy OPs docs (#504)
Bug and Typo Fixes
- Fix typo in default argument of BaseHead (#446)
- Fix potential bug about `output_config` overwrite (#463)
ModelZoo
- Add SlowOnly, SlowFast for AVA2.1 (#351)
Highlights
- Support GradCAM utils for recognizers
- Support ResNet Audio model
New Features
- Automatically add modelzoo statistics to readthedocs (#327)
- Support GYM99 (#331, #336)
- Add AudioOnly Pathway from AVSlowFast. (#355)
- Add GradCAM utils for recognizer (#324)
- Add print config script (#345)
- Add online motion vector decoder (#291)
Improvements
- Support PyTorch 1.7 in CI (#312)
- Support to predict different labels in a long video (#274)
- Update docs about test crops (#359)
- Polish code format using pylint manually (#338)
- Update unittest coverage (#358, #322, #325)
- Add random seed for building filelists (#323)
- Update colab tutorial (#367)
- Set default batch_size of evaluation and testing to 1 (#250)
- Rename the preparation docs to `README.md` (#388)
- Move docs about demo to `demo/README.md` (#329)
- Remove redundant code in `tools/test.py` (#310)
- Automatically calculate number of test clips for Recognizer2D (#359)
Bug and Typo Fixes
- Fix rename Kinetics classnames bug (#384)
- Fix a bug in BaseDataset when `data_prefix` is None (#314)
- Fix a bug about `tmp_folder` in `OpenCVInit` (#357)
- Fix `get_thread_id` when not using disk as backend (#354, #357)
- Fix the bug of HVU object `num_classes` from 1679 to 1678 (#307)
- Fix typo in `export_model.md` (#399)
- Fix OmniSource training configs (#321)
- Fix Issue #306: Bug of SampleAVAFrames (#317)
ModelZoo
- Add SlowOnly model for GYM99, both RGB and Flow (#336)
- Add auto modelzoo statistics in readthedocs (#327)
- Add TSN for HMDB51 pretrained on Kinetics400, Moments in Time and ImageNet. (#372)
Highlights
- Support OmniSource
- Support C3D
- Support video recognition with audio modality
- Support HVU
- Support X3D
New Features
- Support AVA dataset preparation (#266)
- Support the training of video recognition dataset with multiple tag categories (#235)
- Support joint training with multiple training datasets of multiple formats, including images, untrimmed videos, etc. (#242)
- Support to specify a start epoch to conduct evaluation (#216)
- Implement X3D models, support testing with model weights converted from SlowFast (#288)
Improvements
- Set default values of 'average_clips' in each config file so that there is no need to set it explicitly during testing in most cases (#232)
- Extend HVU datatools to generate individual file list for each tag category (#258)
- Support data preparation for Kinetics-600 and Kinetics-700 (#254)
- Use `metric_dict` to replace hardcoded arguments in `evaluate` function (#286)
- Add `cfg-options` in arguments to override some settings in the used config for convenience (#212)
- Rename the old evaluating protocol `mean_average_precision` as `mmit_mean_average_precision` since it is only used on MMIT and is not the `mAP` we usually talk about. Add `mean_average_precision`, which is the real `mAP` (#235)
- Add accurate setting (Three crop * 2 clip) and report corresponding performance for TSM model (#241)
- Add citations in each preparing_dataset.md in `tools/data/dataset` (#289)
- Update the performance of audio-visual fusion on Kinetics-400 (#281)
- Support data preparation of OmniSource web datasets, including GoogleImage, InsImage, InsVideo and KineticsRawVideo (#294)
- Use `metric_options` dict to provide metric args in `evaluate` (#286)
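As background for the mAP renaming noted in the list above, a minimal sketch of the common class-wise definition of mean average precision: average precision is computed per class over the ranked samples, then averaged across classes. The function names and the list-of-lists input layout are assumptions for illustration; this is not the library's implementation, and the MMIT-style variant is not shown:

```python
def average_precision(scores, labels):
    """Average precision for one class: mean of precision at each positive hit."""
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    hits = 0
    precisions = []
    for rank, (_, is_positive) in enumerate(ranked, start=1):
        if is_positive:
            hits += 1
            precisions.append(hits / rank)
    return sum(precisions) / len(precisions) if precisions else 0.0


def mean_average_precision(score_matrix, label_matrix):
    """Mean of per-class AP; score_matrix[i][c] is sample i's score for class c."""
    num_classes = len(score_matrix[0])
    aps = []
    for c in range(num_classes):
        labels = [row[c] for row in label_matrix]
        if any(labels):  # skip classes with no positive sample
            scores = [row[c] for row in score_matrix]
            aps.append(average_precision(scores, labels))
    return sum(aps) / len(aps) if aps else 0.0


if __name__ == "__main__":
    scores = [[0.9, 0.1], [0.8, 0.9], [0.7, 0.2], [0.6, 0.8]]
    labels = [[1, 0], [0, 1], [1, 0], [0, 1]]
    print(mean_average_precision(scores, labels))
```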
Bug Fixes
- Register `FrameSelector` in `PIPELINES` (#268)
- Fix the potential bug for default value in dataset_setting (#245)
- Fix multi-node dist test (#292)
- Fix the data preparation bug for `something-something` dataset (#278)
- Fix the invalid config url in slowonly README data benchmark (#249)
- Validate that the performance of models trained with videos has no significant difference compared to that of models trained with rawframes (#256)
- Correct the `img_norm_cfg` used by TSN-3seg-R50 UCF-101 model, improving the Top-1 accuracy by 3% (#273)
ModelZoo
- Add Baselines for Kinetics-600 and Kinetics-700, including TSN-R50-8seg and SlowOnly-R50-8x8 (#259)
- Add OmniSource benchmark on MiniKinetics (#296)
- Add Baselines for HVU, including TSN-R18-8seg on 6 tag categories of HVU (#287)
- Add X3D models ported from SlowFast (#288)
Highlights
- Support TPN
- Support JHMDB, UCF101-24, HVU dataset preparation
- Support onnx model conversion
New Features
- Support the data pre-processing pipeline for the HVU Dataset (#277)
- Support real-time action recognition from web camera (#171)
- Support onnx (#160)
- Support UCF101-24 preparation (#219)
- Support evaluating mAP for ActivityNet with CUHK17_activitynet_pred (#176)
- Add the data pipeline for ActivityNet, including downloading videos, extracting RGB and Flow frames, finetuning TSN and extracting feature (#190)
- Support JHMDB preparation (#220)
ModelZoo
- Add finetuning setting for SlowOnly (#173)
- Add TSN and SlowOnly models trained with OmniSource, which achieve 75.7% Top-1 with TSN-R50-3seg and 80.4% Top-1 with SlowOnly-R101-8x8 (#215)
Improvements
- Support demo with video url (#165)
- Support multi-batch when testing (#184)
- Add tutorial for adding a new learning rate updater (#181)
- Add config name in meta info (#183)
- Remove git hash in `__version__` (#189)
- Check mmcv version (#189)
- Update url with 'https://download.openmmlab.com' (#208)
- Update Docker file to support PyTorch 1.6 and update `install.md` (#209)
- Polish readthedocs display (#217, #229)
Bug Fixes
- Fix the bug when using OpenCV to extract only RGB frames with original shape (#184)
- Fix the bug of sthv2 `num_classes` from 339 to 174 (#174, #207)
Highlights
- Support TIN, CSN, SSN, NonLocal
- Support FP16 training
New Features
- Support NonLocal module and provide ckpt in TSM and I3D (#41)
- Support SSN (#33, #37, #52, #55)
- Support CSN (#87)
- Support TIN (#53)
- Support HMDB51 dataset preparation (#60)
- Support encoding videos from frames (#84)
- Support FP16 training (#25)
- Enhance demo by supporting rawframe inference (#59), output video/gif (#72)
ModelZoo
- Update Slowfast modelzoo (#51)
- Update TSN, TSM video checkpoints (#50)
- Add data benchmark for TSN (#57)
- Add data benchmark for SlowOnly (#77)
- Add BSN/BMN performance results with feature extracted by our codebase (#99)
Improvements
- Polish data preparation codes (#70)
- Improve data preparation scripts (#58)
- Improve unittest coverage and minor fix (#62)
- Support PyTorch 1.6 in CI (#117)
- Support `with_offset` for rawframe dataset (#48)
- Support json annotation files (#119)
- Support `multi-class` in TSMHead (#104)
- Support using `val_step()` to validate data for each `val` workflow (#123)
- Use `xxInit()` method to get `total_frames` and make `total_frames` a required key (#90)
- Add paper introduction in model readme (#140)
- Adjust the directory structure of `tools/` and rename some scripts files (#142)
Bug Fixes
- Fix configs for localization test (#67)
- Fix configs of SlowOnly by fixing lr to 8 gpus (#136)
- Fix the bug in analyze_log (#54)
- Fix the bug of generating HMDB51 class index file (#69)
- Fix the bug of using `load_checkpoint()` in ResNet (#93)
- Fix the bug of `--work-dir` when using slurm training script (#110)
- Correct the sthv1/sthv2 rawframes filelist generate command (#71)
- Fix `CosineAnnealing` typo (#47)
Highlights
- MMAction2 is released
New Features
- Support various datasets: UCF101, Kinetics-400, Something-Something V1&V2, Moments in Time, Multi-Moments in Time, THUMOS14
- Support various action recognition methods: TSN, TSM, R(2+1)D, I3D, SlowOnly, SlowFast, Non-local
- Support various action localization methods: BSN, BMN
- Colab demo for action recognition