This repo holds the code and the models for MUSES, introduced in the paper:
Multi-shot Temporal Event Localization: a Benchmark
Xiaolong Liu, Yao Hu, Song Bai, Fei Ding, Xiang Bai, Philip H.S. Torr
CVPR 2021.
MUSES is a large-scale video dataset designed to spur research on a new task called multi-shot temporal event localization. We present a baseline approach (denoted as MUSES-Net) that achieves SOTA performance on MUSES. It also achieves an mAP of 56.9% on THUMOS14 at IoU=0.5.
The code largely borrows from SSN and P-GCN. Thanks for their great work!
Find more resources (e.g. annotation file, source videos) on our project page.
[2022.3.19] Add support for the MUSES dataset. The proposals, models, source videos of the MUSES dataset are released. Stay tuned for MUSES v2, which includes videos from more countries.
[2021.6.19] Code and the annotation file of MUSES are released. Please find the annotation file on our project page.
The code is based on PyTorch. The following environment is required.
- Python 3
- PyTorch >= 1.3.0
- CUDA >= 9.2
Other minor Python modules can be installed by running
pip install -r requirements.txt
The code relies on CUDA extensions. Build them with the following command:
python setup.py develop
After installing all dependencies, run python demo.py for a quick test.
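Before running the demo, it can help to confirm that the environment meets the requirements listed above. The following is a minimal sketch, not part of this repository: the helper names (parse_version, meets_minimum, check_environment) are hypothetical, and the check only compares version numbers against the minimums stated in this README.

```python
import importlib.util
import re
import sys


def parse_version(text):
    """Turn a string such as '1.10.0+cu113' into a tuple such as (1, 10, 0)."""
    core = text.split("+")[0]  # drop local build suffixes like '+cu113'
    numbers = []
    for piece in core.split(".")[:3]:
        match = re.match(r"\d+", piece)  # keep only the leading digits
        numbers.append(int(match.group()) if match else 0)
    while len(numbers) < 3:  # pad '9.2' to (9, 2, 0)
        numbers.append(0)
    return tuple(numbers)


def meets_minimum(version, minimum):
    """Compare versions numerically, so that '1.10.0' counts as newer than '1.3.0'."""
    return parse_version(version) >= parse_version(minimum)


def check_environment():
    """Return a list of human-readable problems; an empty list means all checks passed."""
    problems = []
    if sys.version_info[0] < 3:
        problems.append("Python 3 is required")
    if importlib.util.find_spec("torch") is None:
        problems.append("PyTorch is not installed")
        return problems
    import torch

    if not meets_minimum(torch.__version__, "1.3.0"):
        problems.append("PyTorch >= 1.3.0 is required, found " + str(torch.__version__))
    cuda_version = torch.version.cuda  # None for CPU-only builds
    if cuda_version is None:
        problems.append("this PyTorch build has no CUDA support")
    elif not meets_minimum(cuda_version, "9.2"):
        problems.append("CUDA >= 9.2 is required, found " + cuda_version)
    return problems
```

Calling check_environment() and printing the returned list gives a quick overview of what is missing. Versions are compared as tuples of integers because a plain string comparison would rank "1.10.0" below "1.3.0".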
We support experimenting with THUMOS14 and MUSES. The video features, the proposals and the reference models are provided on OneDrive.
- THUMOS14: The features and the proposals are the same as those used by P-GCN. Extract the archive thumos_i3d_features.tar and put the features in the data/thumos14 folder. The proposal files are already contained in the repository. We expect the following structure in this folder.
  - data
    - thumos14
      - I3D_RGB
      - I3D_Flow
- MUSES: Extract the archives of features and proposal files.
# The archive does not have a directory structure
# We need to create one
mkdir -p data/muses/muses_i3d_features
tar -xf muses_i3d_features.tar -C data/muses/muses_i3d_features
tar -xf muses_proposals.tar -C data/muses
We expect the following structure in this folder.
  - data
    - muses
      - muses_i3d_features
      - muses_test_proposal_list.txt
      - ...
You can also specify the path to the features/proposals in the config files data/cfgs/*.yml.
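A misplaced feature folder is an easy mistake to make, so a small layout check can save a failed run. The sketch below is not part of this repository: the function name missing_paths and the EXPECTED table are hypothetical, and the table lists only the entries named explicitly in this README (the elided entries under data/muses are not checked). It assumes the default locations rather than custom paths set in the config files.

```python
from pathlib import Path

# Entries named in this README, relative to the repository root.
EXPECTED = {
    "thumos14": [
        "data/thumos14/I3D_RGB",
        "data/thumos14/I3D_Flow",
    ],
    "muses": [
        "data/muses/muses_i3d_features",
        "data/muses/muses_test_proposal_list.txt",
    ],
}


def missing_paths(dataset, root="."):
    """Return the expected entries for `dataset` that do not exist under `root`."""
    root = Path(root)
    return [entry for entry in EXPECTED[dataset] if not (root / entry).exists()]
```

For example, missing_paths("muses") run from the repository root returns an empty list when both the feature folder and the test proposal list are in place.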
Put the reference_models folder in the root directory of this code:
- reference_models
  - muses.pth.tar
  - thumos14_flow.pth.tar
  - thumos14_rgb.pth.tar
You can test the reference models by running a single script
bash scripts/test_reference_models.sh DATASET
Here DATASET should be thumos14 or muses.
Using these models, you should get the following performance
Metric | 0.3 | 0.4 | 0.5 | 0.6 | 0.7 | Average |
---|---|---|---|---|---|---|
mAP | 26.5 | 23.1 | 19.7 | 14.8 | 9.5 | 18.7 |
Note: We re-trained the network on MUSES, and the performance is higher than that reported in the paper.
Modality | 0.3 | 0.4 | 0.5 | 0.6 | 0.7 | Average |
---|---|---|---|---|---|---|
RGB | 60.14 | 54.93 | 46.38 | 34.96 | 21.69 | 43.62 |
Flow | 64.64 | 60.29 | 53.93 | 42.84 | 29.70 | 50.28 |
R+F | 68.93 | 63.99 | 56.85 | 46.25 | 30.97 | 53.40 |
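
The Average column in both tables is consistent with the arithmetic mean of the mAP values at the five IoU thresholds (0.3 to 0.7). The short sketch below reproduces that arithmetic from the numbers in the tables; the function name average_map and the row labels are illustrative only and do not come from this repository's evaluation code.

```python
def average_map(per_threshold):
    """Arithmetic mean of mAP values measured at several IoU thresholds."""
    return sum(per_threshold) / len(per_threshold)


# (mAP at IoU 0.3, 0.4, 0.5, 0.6, 0.7) and the reported average, copied from the tables.
rows = {
    "first table": ([26.5, 23.1, 19.7, 14.8, 9.5], 18.7),
    "RGB": ([60.14, 54.93, 46.38, 34.96, 21.69], 43.62),
    "Flow": ([64.64, 60.29, 53.93, 42.84, 29.70], 50.28),
    "R+F": ([68.93, 63.99, 56.85, 46.25, 30.97], 53.40),
}

for name, (values, reported) in rows.items():
    computed = average_map(values)
    print(f"{name}: computed {computed:.2f}, reported {reported}")
```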
The testing process consists of two steps, detailed below.
- Extract detection scores for all the proposals by running
python test_net.py DATASET CHECKPOINT_PATH RESULT_PICKLE --cfg CFG_PATH
Here, RESULT_PICKLE is the path where we save the detection scores. CFG_PATH is the path of the config file, e.g. data/cfgs/thumos14_flow.yml.
- Evaluate the detection performance by running
python eval.py DATASET RESULT_PICKLE --cfg CFG_PATH
On THUMOS14, we need to fuse the detection scores of the RGB and Flow modalities. This can be done by running
python eval.py DATASET RESULT_PICKLE_RGB RESULT_PICKLE_FLOW --score_weights 1 1.2 --cfg CFG_PATH_RGB
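The scoring, evaluation, and fusion commands above can be assembled programmatically, which is convenient when looping over several checkpoints. This is a minimal sketch that only builds argument lists from the flags shown in this README; the function names scoring_command and eval_command are hypothetical, and nothing is executed until you pass a list to subprocess.run yourself.

```python
import shlex


def scoring_command(dataset, checkpoint, result_pickle, cfg):
    """Step 1: extract detection scores for all proposals with test_net.py."""
    return ["python", "test_net.py", dataset, checkpoint, result_pickle, "--cfg", cfg]


def eval_command(dataset, result_pickles, cfg, score_weights=None):
    """Step 2: evaluate one result file, or fuse several when weights are given."""
    cmd = ["python", "eval.py", dataset, *result_pickles]
    if score_weights is not None:
        cmd += ["--score_weights", *[str(w) for w in score_weights]]
    cmd += ["--cfg", cfg]
    return cmd


def as_shell(cmd):
    """Render an argument list as a shell-safe string for logging."""
    return " ".join(shlex.quote(part) for part in cmd)
```

A fused THUMOS14 evaluation would then be eval_command("thumos14", [rgb_pickle, flow_pickle], rgb_cfg, score_weights=(1, 1.2)), where rgb_pickle, flow_pickle, and rgb_cfg stand for your own paths. Keeping the commands as lists avoids quoting problems when paths contain spaces.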
Train your own models with the following command
python train_net.py DATASET --cfg CFG_PATH --snapshot_pref SNAPSHOT_PREF --epochs MAX_EPOCHS
SNAPSHOT_PREF: the path to save trained models and logs, e.g. outputs/snapshpts/thumos14_rgb/.
We provide a script that finishes all steps, including training, testing, and two-stream fusion. Run the script with the following command
bash scripts/do_all.sh DATASET
Note: The results may vary across runs and differ from those of the reference models. We encourage using the average mAP as the primary metric, since it is more stable than mAP@0.5.
Please cite the following paper if you find MUSES useful to your research
@InProceedings{Liu_2021_CVPR,
author = {Liu, Xiaolong and Hu, Yao and Bai, Song and Ding, Fei and Bai, Xiang and Torr, Philip H. S.},
title = {Multi-Shot Temporal Event Localization: A Benchmark},
booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
month = {June},
year = {2021},
pages = {12596-12606}
}
- TadTR: Efficient temporal action detection (localization) with Transformer.
For questions and suggestions, file an issue or contact Xiaolong Liu at "liuxl at hust dot edu dot cn".